Changes between Version 15 and Version 16 of MolesDiscussion


Timestamp:
01/08/06 16:07:23 (13 years ago)
Author:
lawrence
Comment:

--

  • MolesDiscussion

    v15 v16  
    66 * Modified BNL, 25 July, based on followup discussion between BNL and KON on the 24th ... 
    77 * Modified RKL, KDO, 28 July with inline comments. 
     8 * Modified BNL, 1 August, restructured, clear ticket based actions. This page should no longer be modified, only modify the tickets ... 
    89 
    910==== Issues on the Table ==== 
    1011 
    11  1. The !CoatHanger 
     12 1. The !CoatHanger:  
     13    [[TicketQuery(Keywords~=CoatHanger)]] 
    1214 1. Granules 
     15    [[TicketQuery(Keywords~=GranuleSummary)]] 
    1316 1. Stub-B Schema 
     17    [[TicketQuery(Keywords~=StubB)]] 
    1418 1. ISO19139 
    15  1. UML 
    16  
    17 The aim of this document is to discuss these issues and identify the particular tickets we need to raise towards solving them as soon as we can. 
    18  
    19 ==== Actions ==== 
    20  
    21 Kev to turn all the issues and tasks in this document into trac tickets, so we can make decisions on the issues, and 
    22 get on to the coding. Aiming to have this done by planned meeting on Tuesday.  
     19    [[TicketQuery(Keywords~=ISO19139)]] 
     20 1. General Structure 
     21    [[TicketQuery(Keywords~=MolesGeneral)]] 
     22     
     23The aim of this document is to discuss these issues and identify specific actions. 
    2324 
    2425==== Stub-B ==== 
     
    4041 
    4142 * BNL: Agreed? Then: ''Ticket Needed:Making a descriptionSection '''part''' of each of the major entities, allowing a stub-b to include this information for each of the first order entities in a natural way.''  
    42    * KDO - Not agreed. The description is here to help provide a minimum amount of information where the record is being interpreted outside an environment that acknowledges the metadata record types, by putting these standard fragments in a standard and easily accessible place. It also simplifies the schema by having a single place for this section that all metadata records should have. 
    43      * BNL - ok, the difference of opinion comes down to understanding how MOLES is constructed. In fact, one can think of an Activity as a subclass (specialisation) of !dgMetadata. With that in mind, we see immediately we already have what I wanted ... but it's not quite that simple because we potentially want to add multiple documents. So what I think we want is something like the following (maybe exploiting the online reference type which follows): 
     43 * KDO - Not agreed. The description is here to help provide a minimum amount of information where the record is being interpreted outside an environment that acknowledges the metadata record types, by putting these standard fragments in a standard and easily accessible place. It also simplifies the schema by having a single place for this section that all metadata records should have. 
     44 * BNL - ok, the difference of opinion comes down to understanding how MOLES is constructed. In fact, one can think of an Activity as a subclass (specialisation) of !dgMetadata. With that in mind, we see immediately we already have what I wanted ... but it's not quite that simple because we potentially want to add multiple documents. So what I think we want is something like the following (maybe exploiting the online reference type which follows): 
    4445[[Image(Moles.ProtoUML.01.jpg)]] 
    4546 
     
    5455 
    5556 * KDO - the mechanism I expected to use was to extend the online reference type into a choice between the existing simple reference and an "xlink type structure" (maybe a citation type structure as well?). However, as Bryan points out, another schema fragment and associated enumerations or vocabs is required. 
    56    * BNL - so I think we're good to go on this one! 
     57 * BNL - so I think we're good to go on this one! 
    5758 
    5859'''Issue''' NumSim in particular. Here we expect to wait for an ISO19139 compliant version (ticket:284), which will have clear subcomponents targeted for the deployment and data production tools. 
     
    6667 
    6768 * KDO question - I've always assumed that there would be a level of summarisation in the parameters presented via the DE, but I'm getting the feeling that this might not be so... 
    68    * BNL I think the problem is how this is done with respect to the granules. More of this below. 
     69 * BNL I think the problem is how this is done with respect to the granules. More of this below. 
    6970 
    7071Starting with the data granule: 
     
    8081 
    8182 * KDO - there's some confusion due to history here I think. Given what has been said in the past, my expectation was that the data granule ID was the key needed by the relevant services, so the instance was redundant for the NDG. However, it was intended to provide a hook for data that may be accessed outside the NDG SOA. 
    82    * BNL - ok, I think we're agreed that the answer is that for NDG it ''is'' a csml id (unadorned with a service binding), and we should use something else for non-NDG data. Which brings me to: 
    83       * The only use case I can think of for non-NDG data in MOLES is a vehicle for migrating one harvested discovery format to another ... in which case we probably do want to put somewhere an option for a URL which binds to the data (i.e. including a service binding) ... we might use that for the other use case which is NDG data for which no data granules exist ... huh? needs more discussion. 
     83 * BNL - ok, I think we're agreed that the answer is that for NDG it ''is'' a csml id (unadorned with a service binding), and we should use something else for non-NDG data. Which brings me to: 
     84   * The only use case I can think of for non-NDG data in MOLES is a vehicle for migrating one harvested discovery format to another ... in which case we probably do want to put somewhere an option for a URL which binds to the data (i.e. including a service binding) ... we might use that for the other use case which is NDG data for which no data granules exist ... huh? needs more discussion. 
     85   * OK, I now understand that this is how instance should be used. However, I've raised an issue ticket on how we populate incoming stuff with this ... ticket:463 
    8486 
    8587Note that the granulecoverage is the spatio-temporal bounding box, it doesn't cover 
     
    9294Looking through this we can see the  
    9395 
    94  * IsOutput variable (boolean). ''BNL can't really see the point of this. KON did explain, but this needs revisiting''. ''Decide: In or out?'' 
    95      
    96    ''Siva:'''In''', At BODC, we are considering if IsOutput is True, then that Parameter is visible in data discovery, and is invisible if it is False.'' 
     96IsOutput variable (boolean).  
     97 * ''BNL can't really see the point of this. KON did explain, but this needs revisiting''. ''Decide: In or out?'' 
     98 * ''Siva:'''In''', At BODC, we are considering if IsOutput is True, then that Parameter is visible in data discovery, and is invisible if it is False.'' 
     99 * KDO - whoops, looks like something got lost here... The original intent was to differentiate between fixed parameters, e.g. data taken at a constant height, and non-fixed, aka measured, parameters, such as the temperature at a particular time at the constant height. I expected Siva's case to be dealt with by excluding the parameter from the DE parameter summary, and leaving it to be found at data browse time. 
     100 * BNL: So I think the height of the data parameter is something that belongs in CSML ... and so this should be out, and replaced (for BODC) with something that covers the BODC use case (but exactly what use case supports hiding a parameter from discovery?) 
     101 * KDO - height? Is this visibility? Anyway, this isn't the point of the flag (new name needed?), the concept behind which has been useful in other areas.  
     102 * ''Roy:'' ''Seems to be a misunderstanding here.  I was using IsOutput to hide co-ordinate channels (date/time, depth, CTD pressure) as I thought that was how MOLES was to be used.  Kicking them out altogether is just as good for me.'' 
     103 * KDO - Good, if I understand you aright, then Siva's point now makes sense to me. I'd rather use the word "mark" rather than "hide", but the point is that you might want this parameter there, but you don't want it mistaken for a measured value. Hence I say it stays. However, if Bryan still feels strongly, it could be made optional. This would then require consistent usage within a DP, and if it isn't there then "co-ordinate variables" should be left out. 
    97104 
    98    * KDO - whoops, looks like something got lost here... The original intent was to differentiate between fixed parameters, e.g. data taken at a constant height, and non-fixed, aka measured, parameters, such as the temperature at a particular time at the constant height. I expected Siva's case to be dealt with by excluding the parameter from the DE parameter summary, and leaving it to be found at data browse time. 
    99       * BNL: So I think the height of the data parameter is something that belongs in CSML ... and so this should be out, and replaced (for BODC) with something that covers the BODC use case (but exactly what use case supports hiding a parameter from discovery?) 
     105The next thing is a choice of four items, only one of which should appear for any parameter. Either the value, or the range of values, or an enumeration list of the  value types, or a compound group should appear. ''Yes/No? If so, ticket needed: It needs to be a choice as to whether this thing exists and it needs a name (now in ticket:460).''  
    100106 
    101             * KDO - height? Is this visibility? Anyway, this isn't the point of the flag (new name needed?), the concept behind which has been useful in other areas. 
     107''Also another ticket: Roy to give us a few practical examples of how the parameter group is intended to work '' 
     108 * ''Roy:'' ''The primary reason for this is the way we handle date/time in BODC, which is to carry two parameters (days elapsed since the start of the Gregorian Calendar and time within day), BUT we have now decided that the inclusion of this was down to a misunderstanding (see above) about what was to be done with co-ordinate data channels in MOLES.  The other thing that was in the back of my mind was how we handle data quality information (parameter + flag) but I now see this is more of a CSML issue than a MOLES issue. So, I think parameter groups are dropping off our radar'' 
     109 * KDO - It was also a way to link the GCMD valids to BODC variables in a controlled way, without putting the GCMD terms  into structured keywords. 
     110 * BNL - now recognises this is useful for vectors etc ...  
     111 * ''Siva: Yes, at BODC we are using the following strategy. Go for dgRangeDataParameter and check if HighValue=LowValue, in which case we use dgValueDataParameter. The way we get the HighValue and LowValue is by opening each Series data file (QXF file) and obtaining the min and max value for the required data channel. Once the limits for each Series have been obtained, the extremes may be determined to give the limits for the dataset. We cannot envisage using dgEnumerationParameter.'' 
     112 * ''Dominic: I am concerned that it's not practical to obtain the High and Low values for a parameter when you are dealing with very (very) large datasets e.g. atmospheric model runs. Not practical in the sense that it would increase the processing time to generate CSML by many orders of magnitude.'' 
     113 * BNL: I don't think we ''have'' to use it ... but I think it would be ''very'' cool from a model use perspective. 
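A minimal sketch of the strategy Siva describes above, for illustration only. It assumes each Series has already been read into a plain list of numbers for the required data channel; reading QXF files is not shown. The class and function names (`ValueParameter`, `RangeParameter`, `series_limits`, `summarise_parameter`) are hypothetical stand-ins for dgValueDataParameter and dgRangeDataParameter, not part of the MOLES schema or of any BODC tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class ValueParameter:
    """Hypothetical stand-in for dgValueDataParameter (a single value)."""
    value: float


@dataclass(frozen=True)
class RangeParameter:
    """Hypothetical stand-in for dgRangeDataParameter (LowValue..HighValue)."""
    low: float
    high: float


def series_limits(values: Iterable[float]) -> Tuple[float, float]:
    """Return (min, max) for one Series' data channel."""
    values = list(values)
    if not values:
        raise ValueError("a Series must contain at least one value")
    return min(values), max(values)


def summarise_parameter(
    series_list: Iterable[Iterable[float]],
) -> Union[ValueParameter, RangeParameter]:
    """Combine per-Series limits into dataset limits.

    If the dataset-wide low and high coincide, fall back to a single
    value parameter; otherwise report the range.
    """
    limits: List[Tuple[float, float]] = [series_limits(s) for s in series_list]
    if not limits:
        raise ValueError("at least one Series is required")
    low = min(lo for lo, _ in limits)
    high = max(hi for _, hi in limits)
    if low == high:
        return ValueParameter(low)
    return RangeParameter(low, high)
```

Note that every value in every Series is visited once, which is the cost Dominic is concerned about for very large datasets.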
    102114 
    103 ''Roy:'' ''Seems to be a misunderstanding here.  I was using IsOutput to hide co-ordinate channels (date/time, depth, CTD pressure) as I thought that was how MOLES was to be used.  Kicking them out altogether is just as good for me.'' 
     115The other elements are rather obvious, but ... 
     116 * Note that we would expect to use the dgStdParameterMeasured variable to encode both the phenomenon name and the cell bounds (so we get the averaging information  here). ''Can we promote something useful from the CF cell methods? Ticket Needed'' 
     117 * ''Roy:'' ''This worried me a little at first, but the more I think about it, the more I think it might help as it exposes the two items of information needed to map CF to a BODC PUV term side by side making them a much easier target!''  
     118 * KDO - so you'll explain to me why this isn't a parameter group Bryan? 
     119 * BNL - because it's part of the dgStandardParameterMeasured ... but this will depend on ticket:464 and ticket:465. 
    104120 
    105     * KDO - Good, if I understand you aright, then Siva's point now makes sense to me. I'd rather use the word "mark" rather than "hide", but the point is that you might want this parameter there, but you don't want it mistaken for a measured value. Hence I say it stays. However, if Bryan still feels strongly, it could be made optional. This would then require consistent usage within a DP, and if it isn't there then "co-ordinate variables" should be left out. 
     121I suppose we imagine a granule as consisting of multiple phenomena with multiple feature types, but we would expect any one phenomenon in one granule to have one feature type (''Andrew/Dominic?''). In which case the feature type name and the feature type catalogue from which it is governed should also be encoded per parameter. However, one might argue that the assumption might be violated, and in any case, at this point the user might be pointed to the WFS level. ''It would certainly be simpler, and possibly more useful to generate a list of feature types present in the granule (along with their FTC antecedents).'' ''Yes/No? Ticket Needed?!'' 
     122 * Dominic:''I think that assumption (one phenomenon -- one feature type (for a given granule)) is correct.'' 
     123 * BNL: so the ticket is needed, and we should do this .. now part of ticket:460 
    106124 
    107  * The next thing is a choice of four items, only one of which should appear for any parameter. Either the value, or the range of values, or an enumeration list of the  value types, or a compound group should appear. ''Yes/No? If so, ticket needed: It needs to be a choice as to whether this thing exists and it needs a name.'' ''Also another ticket: Roy to give us a few practical examples of how the parameter group is intended to work '' 
    108  
    109 ''Roy:'' ''The primary reason for this is the way we handle date/time in BODC, which is to carry two parameters (days elapsed since the start of the Gregorian Calendar and time within day), BUT we have now decided that the inclusion of this was down to a misunderstanding (see above) about what was to be done with co-ordinate data channels in MOLES.  The other thing that was in the back of my mind was how we handle data quality information (parameter + flag) but I now see this is more of a CSML issue than a MOLES issue. So, I think parameter groups are dropping off our radar'' 
    110  
    111  * KDO - It was also a way to link the GCMD valids to BODC variables in a controlled way, without putting the GCMD terms  into structured keywords. 
    112  
    113    * ''Siva: Yes, at BODC we are using the following strategy. Go for dgRangeDataParameter and check if HighValue=LowValue, in which case we use dgValueDataParameter. The way we get the HighValue and LowValue is by opening each Series data file (QXF file) and obtaining the min and max value for the required data channel. Once the limits for each Series have been obtained, the extremes may be determined to give the limits for the dataset. We cannot envisage using dgEnumerationParameter.'' 
    114    * ''Dominic: I am concerned that it's not practical to obtain the High and Low values for a parameter when you are dealing with very (very) large datasets e.g. atmospheric model runs. Not practical in the sense that it would increase the processing time to generate CSML by many orders of magnitude.'' 
    115       * BNL: I don't think we ''have'' to use it ... but I think it would be ''very'' cool from a model use perspective. 
    116    * BNL: so the bottom line is that the schema needs to be modified as suggested. 
    117  
    118  * The other elements are rather obvious, but ... 
    119    * Note that we would expect to use the dgStdParameterMeasured variable to encode both the phenomenon name and the cell bounds (so we get the averaging information  here). ''Can we promote something useful from the CF cell methods? Ticket Needed'' 
    120  
    121 ''Roy:'' ''This worried me a little at first, but the more I think about it, the more I think it might help as it exposes the two items of information needed to map CF to a BODC PUV term side by side making them a much easier target!'' 
    122  
    123  * KDO - so you'll explain to me why this isn't a parameter group Bryan? 
    124  
    125  * I suppose we imagine a granule as consisting of multiple phenomena with multiple feature types, but we would expect any one phenomenon in one granule to have one feature type (''Andrew/Dominic?''). In which case the feature type name and the feature type catalogue from which it is governed should also be encoded per parameter. However, one might argue that the assumption might be violated, and in any case, at this point the user might be pointed to the WFS level. ''It would certainly be simpler, and possibly more useful to generate a list of feature types present in the granule (along with their FTC antecedents).'' ''Yes/No? Ticket Needed?!'' 
    126    * Dominic:''I think that assumption (one phenomenon -- one feature type (for a given granule)) is correct.'' 
    127    * BNL: so the ticket is needed, and we should do this. 
    128  
    129 Now we have this information at the granule level, how much of it should be summarised up at the data entity level by the moles creator? (''Ticket: We would need tools to do this!'') 
    130  
    131 BNL: The argument for aggregation is to make it easier to generate the discovery level information which doesn't see the individual granule information. Easier to do at moles creation time than in the xquery for discovery! 
     125Now we have this information at the granule level, how much of it should be summarised up at the data entity level by the moles creator? (ticket:466)  
     126 * BNL: The argument for aggregation is to make it easier to generate the discovery level information which doesn't see the individual granule information. Easier to do at moles creation time than in the xquery for discovery! 
    132127 
    133128The overall material includes the following data summary: 
     
    136131 
    137132It is a moot question as to how much of this needs to be replicated from the granule content. ''Tickets needed on some of the following'' 
    138  * BNL would argue that the spatio-temporal coverage should be the *union* of the granule coverages (''need a tool to produce this''). 
     133 * BNL would argue that the spatio-temporal coverage should be the *union* of the granule coverages (ticket:466). 
    139134 * KDO - would other data providers like to comment on what they want to do for their data?  
     135 * The parameter coverage is a bit more complicated, because now we think we could have, for example, temperature monthly means and temperature annual means in the granules. I think the only thing that makes sense is to aggregate the granule parameter summaries. ''In which case why bother? We can parse the granule content. Remove?'' 
     136 * BNL - no, where the granules are self consistent, then a summary would make sense, and when it doesn't, it doesn't. This will be a problem for the stub-B viewer, and the maintainer, but isn't a conceptual problem for the schema, provided the summaries are optional.  
     137 * ''There ought however to be a consolidated list of feature types present ... as well'' (ticket:451). 
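One possible reading of the "union of the granule coverages" suggested above is the smallest spatio-temporal bounding box that encloses every granule's box. The sketch below illustrates that reading only. The `Coverage` class, its field names and `union_coverage` are hypothetical and do not come from the MOLES schema; the sketch also assumes no box crosses the date line.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Coverage:
    """Hypothetical spatio-temporal bounding box for one granule."""
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date


def union_coverage(granules: Iterable[Coverage]) -> Coverage:
    """Return the smallest box enclosing all granule coverages.

    Assumes west <= east for every granule (no date-line wrapping).
    """
    granules = list(granules)
    if not granules:
        raise ValueError("at least one granule coverage is required")
    return Coverage(
        west=min(g.west for g in granules),
        south=min(g.south for g in granules),
        east=max(g.east for g in granules),
        north=max(g.north for g in granules),
        start=min(g.start for g in granules),
        end=max(g.end for g in granules),
    )
```

An enclosing box can overstate coverage when granules are widely separated in space or time, which is one reason a data provider might prefer to keep the per-granule boxes as well.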
    140138 
    141  * The parameter coverage is a bit more complicated, because now we think we could have, for example, temperature monthly means and temperature annual means in the granules. I think the only thing that makes sense is to aggregate the granule parameter summaries. ''In which case why bother? We can parse the granule content. Remove?'' 
    142  * ''There ought however to be a consolidated list of feature types present ... as well. Add?'' 
    143  * The other elements seem appropriate. 
     139The other elements seem appropriate. 
    144140 
    145141(KDO - ok, at this point I'm going to talk about summarisation: I thought there was a need to actively summarise the data to aid understanding, with the data browse phase dealing with the real detail. Also, this summarisation could take into account the needs of those from other disciplines who may need to access the data.) 
    146142 
    147 (BNL - I've contradicted myself here. Either we should summarise the parameters ''and'' the features, or we should 
    148 summarise neither ... I think both, to simplify that xquery ...) 
     143(BNL - I've contradicted myself here. Either we should summarise the parameters ''and'' the features, or we should summarise neither ... I think both, to simplify that xquery ... which is now what I think we have in our tickets). 
    149144 
    150145Now looking at the other two elements in the data entity which are relevant: