Changes between Version 11 and Version 12 of MolesDiscussion


Timestamp: 25/07/06 16:01:24
Author: lawrence

'''History''':
 * Initial Version, BNL, reporting a discussion between BNL and KON. Based on [source:TI07-MOLES/trunk/v1Schema/Schemae/ndgmetadata1.2.5.xsd@1299 ndgmetadata1.2.5]
 * Modified by Siva, Dom and Kevin with explicit inline comments.
 * Modified BNL, 25 July, based on follow-up discussion between BNL and KON on the 24th ...

==== Issues on the Table ====

 1. The !CoatHanger
 1. Granules
 1. Stub-B Schema
 1. ISO19139
 1. UML

The aim of this document is to discuss these issues and identify the particular tickets we need to raise towards solving them as soon as we can.

==== Actions ====

Kev to turn all the issues and tasks in this document into trac tickets, so we can make decisions on the issues and get on to the coding. Aiming to have this done by the planned meeting on Tuesday.

==== Stub-B ====
     
of the major entities (Activity, Data Production Tool, Observation Station, Deployment, Data Entity). The descriptionSection would seem to be a useful addendum to each of these in their own right (possibly instead of making it part of the overall description, since we see this additional information being additional attribute(s) of the entities).

 * BNL: Agreed? Then: ''Ticket Needed: Making a descriptionSection '''part''' of each of the major entities, allowing a stub-b to include this information for each of the first order entities in a natural way.''
   * KDO - Not agreed. The description is here to help provide a minimum amount of information where the record is being interpreted outside an environment that acknowledges the metadata record types, by putting these standard fragments in a standard and easily accessible place. It also simplifies the schema by having a single place for this section that all metadata records should have.
     * BNL - ok, the difference of opinion comes down to understanding how MOLES is constructed. In fact, one can think of an Activity as a subclass (specialisation) of !dgMetadata. With that in mind, we see immediately we already have what I wanted ... but it's not quite that simple, because we potentially want to add multiple documents. So what I think we want is something like the following (maybe exploiting the online reference type which follows):

[[Image(Moles.ProtoUML.01.jpg)]]

'''Issue:''' The Online Reference type
     
'''Issue:''' ''Ticket Needed: Provide a suggested mechanism of exploiting xlink to do this.'' (A proposal should be a schema fragment which includes a controlled vocabulary for the attributes of the xlink, recognising that we will be on the bleeding edge here and some future changes in our technology may be necessary.)

 * KDO - the mechanism I expected to use was to extend the online reference type into a choice between the existing simple reference and an "xlink type structure" (maybe a citation type structure as well?). However, as Bryan points out, another schema fragment and associated enumerations or vocabs is required.
   * BNL - so I think we're good to go on this one!

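As a rough illustration of the kind of instance the proposed schema fragment might produce, the sketch below builds an xlink-style online reference with the standard library. The element names (`dgOnlineReference`, `dgXlinkReference`) and the `role` value are hypothetical: the real names and the controlled vocabulary are exactly what the ticket asks to be defined.

```python
# Hedged sketch only: element names and the role vocabulary are assumptions,
# not the agreed MOLES schema fragment.
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)

def make_xlink_reference(href: str, role: str, title: str) -> ET.Element:
    """Build an xlink-style online reference element (illustrative only)."""
    ref = ET.Element("dgOnlineReference")
    link = ET.SubElement(ref, "dgXlinkReference")
    link.set(f"{{{XLINK_NS}}}type", "simple")
    link.set(f"{{{XLINK_NS}}}href", href)
    # 'role' would be drawn from the controlled vocabulary the ticket calls for.
    link.set(f"{{{XLINK_NS}}}role", role)
    link.set(f"{{{XLINK_NS}}}title", title)
    return ref

ref = make_xlink_reference("http://example.org/doc.pdf", "documentation",
                           "Instrument handbook")
print(ET.tostring(ref, encoding="unicode"))
```

A real proposal would constrain these attributes in the schema rather than leave them free-text, which is why the vocabulary matters as much as the structure.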
'''Issue:''' NumSim in particular. Here we expect to wait for an ISO19139 compliant version (ticket:284), which will have clear subcomponents targeted for the deployment and data production tools.
     
the data entity, and the information we put in a data granule.

 * KDO question - I've always assumed that there would be a level of summarisation in the parameters presented via the DE, but I'm getting the feeling that this might not be so...
   * BNL: I think the problem is how this is done with respect to the granules. More on this below.

Starting with the data granule:
     
to that instance, e.g. http://badchost/dX?uri=badc.nerc.ac.uk:CSML:blah)

 * KDO - there's some confusion due to history here, I think. Given what has been said in the past, my expectation was that the data granule ID was the key needed by the relevant services, so the instance was redundant for the NDG. However, it was intended to provide a hook for data that may be accessed outside the NDG SOA.
   * BNL - ok, I think we're agreed that the answer is that for NDG it ''is'' a csml id (unadorned with a service binding), and we should use something else for non-NDG data. Which brings me to:
     * The only use case I can think of for non-NDG data in MOLES is as a vehicle for migrating one harvested discovery format for another ... in which case we probably do want to put somewhere an option for a URL which binds to the data (i.e. including a service binding) ... we might use that for the other use case, which is NDG data for which no data granules exist ... huh? needs more discussion.

Note that the granulecoverage is the spatio-temporal bounding box, it doesn't cover
     
 * IsOutput variable (boolean). ''BNL can't really see the point of this. KON did explain, but this needs revisiting.'' ''Decide: In or out?''
   * Siva: '''In'''. ''At BODC, we are considering that if IsOutput is True, then that Parameter is visible in data discovery, and invisible if it is False.''

 * KDO - whoops, looks like something got lost here... The original intent was to differentiate between fixed parameters, e.g. data taken at a constant height, and non-fixed, aka measured, parameters, such as the temperature at a particular time at the constant height. Siva's case I expected to be dealt with by excluding the parameter from the DE parameter summary, and leaving it to be found at data browse time.
   * BNL: So I think the height of the data parameter is something that belongs in CSML ... and so this should be out, and replaced (for BODC) with something that covers the BODC use case (but exactly what use case supports hiding a parameter from discovery?)

 * The next thing is a choice of four items, only one of which should appear for any parameter. Either the value, or the range of values, or an enumeration list of the value types, or a compound group should appear. ''Yes/No? If so, ticket needed: it needs to be a choice as to whether this thing exists, and it needs a name.'' ''Also another ticket: Roy to give us a few practical examples of how the parameter group is intended to work.''
   * Siva: ''Yes, at BODC we are using the following strategy: go for dgRangeDataParameter and check if HighValue=LowValue, in which case we use dgValueDataParameter. The way we get the HighValue and LowValue is by opening each Series data file (QXF file), from which the min and max value for the required data channel is obtained. Once the limits for each Series have been obtained, the extremes may be determined to give the limits for the dataset. We cannot envisage using dgEnumerationParameter.''
   * Dominic: ''I am concerned that it's not practical to obtain the High and Low values for a parameter when you are dealing with very (very) large datasets, e.g. atmospheric model runs. Not practical in the sense that it would increase the processing time to generate CSML by many orders of magnitude.''
     * BNL: I don't think we ''have'' to use it ... but I think it would be ''very'' cool from a model use perspective.
   * BNL: so the bottom line is that the schema needs to be modified as suggested.

 * The other elements are rather obvious, but ...
   * Note that we would expect to use the dgStdParameterMeasured variable to encode both the phenomenon name and the cell bounds (so we get the averaging information here). ''Can we promote something useful from the CF cell methods? Ticket Needed''
 * I suppose we imagine a granule consisting of multiple phenomena with multiple feature types, but we would expect any one phenomenon in one granule to have one feature type (''Andrew/Dominic?''). In which case, the feature type name and the feature type catalogue from which it is governed should also be encoded per parameter. However, one might argue that the assumption might be violated, and in any case, at this point the user might be pointed to the WFS level. ''It would certainly be simpler, and possibly more useful, to generate a list of feature types present in the granule (along with their FTC antecedents).'' ''Yes/No? Ticket Needed?!''
   * Dominic: ''I think that assumption (one phenomenon -- one feature type (for a given granule)) is correct.''
   * BNL: so the ticket is needed, and we should do this.

Now we have this information at the granule level, how much of it should be summarised up at the data entity level by the moles creator? (''Ticket: We would need tools to do this!'')

BNL: The argument for aggregation is to make it easier to generate the discovery-level information, which doesn't see the individual granule information. Easier to do at moles creation time than in the xquery for discovery!
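The aggregation step argued for here amounts to rolling granule-level parameter ranges up into one data-entity summary at moles creation time, so the discovery xquery never has to look inside granules. A minimal sketch, assuming a simple per-granule dictionary layout that is purely illustrative:

```python
# Hedged sketch: the data layout is an assumption, not the MOLES schema.
from collections import defaultdict

def aggregate_granules(granules):
    """granules: list of {parameter_name: (low, high)} dicts, one per granule.
    Returns the entity-level envelope of the granule-level ranges."""
    merged = defaultdict(list)
    for granule in granules:
        for name, (low, high) in granule.items():
            merged[name].append((low, high))
    return {name: (min(lo for lo, _ in spans), max(hi for _, hi in spans))
            for name, spans in merged.items()}

entity = aggregate_granules([
    {"air_temperature": (250.0, 290.0)},
    {"air_temperature": (240.0, 285.0), "relative_humidity": (0.1, 0.9)},
])
print(entity)
```

This is the sort of tool the ticket asks for: run once when the moles record is created, rather than per-query at discovery time.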

The overall material includes the following data summary:
     
(KDO - ok, at this point I'm going to talk about summarisation: I thought there was a need to actively summarise the data to aid understanding, with the data browse phase dealing with the real detail. Also, this summarisation could take into account the needs of those from other disciplines who may need to access the data.)

(BNL - I've contradicted myself here. Either we should summarise the parameters ''and'' the features, or we should summarise neither ... I think both, to simplify that xquery ...)

Now looking at the other two elements in the data entity which are relevant:

     

(KDO - earlier versions of the schema had a comment for dgDataObjectType along the lines of "why isn't this just a term from a vocab to identify the feature type?", with the answer that '''some''' data entity types might have attributes only of interest to discovery, that would rarely be populated for other types, and would just confuse things. Examples are: input data entities/granules for the "derived DEs"; and a notional dgImage, which would have details about the camera used and pixel resolution. Hence, restricting the number of types to only those with such attributes (suggestions wanted), and having a list of CSML feature types involved, is probably a good way to go.)

(BNL - I'm still convinced that dgDataObjectType is covered by the dgGranule content (with feature type added), so it should go.)

==== ISO19139 ====

Given we don't have any schema for IOC ISO19139, and the WMO ISO19139 is a tiny extension and no contraction, we should first look at the example documents and decide how much we think we could get away with by
 * exporting just the same content we have in a DIF, but in ISO19139 (i.e. requiring Kev to construct an appropriate xquery, which could simply be the DIF one minimally changed so the output is in the right places for ISO19139), and
 * importing the WMO via a simple extension to BROWSE (bnl problem)
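To make the first bullet concrete, the sketch below places two DIF fields (Entry_ID, Summary) into the corresponding ISO19139 (gmd) slots. The real mapping would live in Kev's xquery; this skeleton is far from a conformant MD_Metadata record and the field selection is an assumption.

```python
# Hedged sketch of a DIF -> ISO19139 mapping; a conformant record needs many
# more mandatory elements (contact, dateStamp, etc.).
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

def dif_to_iso_skeleton(dif: dict) -> ET.Element:
    """Map DIF Entry_ID and Summary into a minimal gmd:MD_Metadata skeleton."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    ident = ET.SubElement(md, f"{{{GMD}}}fileIdentifier")
    ET.SubElement(ident, f"{{{GCO}}}CharacterString").text = dif["Entry_ID"]
    # The abstract lives under identificationInfo/MD_DataIdentification.
    info = ET.SubElement(md, f"{{{GMD}}}identificationInfo")
    di = ET.SubElement(info, f"{{{GMD}}}MD_DataIdentification")
    abstract = ET.SubElement(di, f"{{{GMD}}}abstract")
    ET.SubElement(abstract, f"{{{GCO}}}CharacterString").text = dif["Summary"]
    return md

md = dif_to_iso_skeleton({"Entry_ID": "badc.nerc.ac.uk:DIF:example",
                          "Summary": "An example dataset."})
print(ET.tostring(md, encoding="unicode"))
```

The exercise suggested above, comparing example documents first, would settle how much beyond this skeleton the export actually needs.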

==== UML ====

It's quite clear that the MOLES data model needs to be in UML. Ideally we'd want to be able to autogenerate the schema via ShapeChange, but that's a ''long'' way away; meanwhile, the docs should make as much as possible clear with UML fragments.
     