
Version 10 (modified by lawrence, 12 years ago)

clarifying the related_url and iso and moles transformation

NDG Service Binding

DIF

Introduction

How should NDG DIF producers make use of the DIF Related_URL?

Examples on the web include the following:

Group: Related_URL
   URL_Content_Type: GET DATA > OPENDAP DIRECTORY (DODS) 
   URL: http://ferret.wrc.noaa.gov/cgi-bin/nc/data/coads_climatology.nc
   Group: Description
      The following dataset is available on this DODS Server:
      COADS Global Ocean Climatology SST, Air Temp and Winds
   End_Group
End_Group 
Group: Related_URL
   URL_Content_Type: GET SERVICE > GET WEB MAP SERVICE (WMS)  
   URL: http://www.kgis.scar.org/cgi-bin/kgis_wms?coads_climatology.nc
   Group: Description
       OGC WMS service for SCAR KGIS data 
   End_Group
End_Group 

The key point to note is that the URL_Content_Type should be taken from a controlled keyword list. Regrettably, this is not a version-controlled list, and it appears volatile, so we may have to take our own copy and control it properly. (Not all members have full definitions, but some are defined on a separate page, which itself is from a previous version of the controlled list!)
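A local, version-controlled copy could be as simple as a pinned set of hierarchical terms against which incoming URL_Content_Type values are checked. This is a minimal sketch, assuming that ` > ` separates hierarchy levels (as in the examples above) and including only the terms mentioned on this page; the version label `ndg-local-0.1` and the function names are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical local copy of the GCMD URL_Content_Type list, pinned to a version
# we control. Only the terms mentioned on this page are included.
VOCAB_VERSION = "ndg-local-0.1"
URL_CONTENT_TYPES: FrozenSet[Tuple[str, ...]] = frozenset({
    ("GET DATA",),
    ("GET DATA", "OPENDAP DIRECTORY (DODS)"),
    ("GET SERVICE", "GET WEB MAP SERVICE (WMS)"),
    ("GET SERVICE", "GET WEB FEATURE SERVICE (WFS)"),
    ("GET SERVICE", "ACCESS WEB SERVICE"),
    ("GET RELATED DATA SET METADATA (DIF)",),
    ("VIEW EXTENDED METADATA",),
    ("VIEW PROJECT HOME PAGE",),
    ("VIEW RELATED INFORMATION",),
})


def parse_content_type(value: str) -> Tuple[str, ...]:
    """Split e.g. 'GET SERVICE > GET WEB MAP SERVICE (WMS)' into its hierarchy levels."""
    return tuple(part.strip() for part in value.split(">"))


def is_known(value: str) -> bool:
    """True if the value (ignoring surrounding whitespace) is in our pinned copy."""
    return parse_content_type(value) in URL_CONTENT_TYPES
```

Storing terms as tuples rather than raw strings makes the check insensitive to the stray spaces seen in harvested DIFs.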

There are two classes of problem. The first is what we want to happen with NDG-produced DIFs: what do we expect to produce in OUR MOLES documents, and where should it go in the DIF? The second is what we should do with third-party DIFs and mini-MOLES.

NDG

From an NDG point of view, five content types are of significant interest, the first two of which are fairly obvious.

  • VIEW EXTENDED METADATA - which should point to an NDG-B browse instance of the corresponding data entity.
  • GET RELATED DATA SET METADATA (DIF) - which could point to a related dataset (DIF).

The next two are less obvious; we have a choice between

  • GET DATA, and
  • GET SERVICE > GET WEB FEATURE SERVICE (WFS)

I think we should begin by using GET DATA pointing at our proto-WFS as it is built, and change over to the second when we have a properly functioning WFS. In any case, I think we should expose all the CSML granule endpoints, which will require populating the description rather carefully: probably one sentence defining what we are pointing at, and one sentence taken from the CSML granule title.
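Generating one Related_URL per granule with that two-sentence description is mechanical. A minimal sketch, assuming the proto-WFS endpoint `http://badc.rl.ac.uk/ndgWFS` and the `repository__schema__identifier` URI pattern used in the example below; `csml_related_url` is a hypothetical helper, not part of any NDG code.

```python
import xml.etree.ElementTree as ET

# Assumed from the example: the proto-WFS endpoint and the
# repository__schema__identifier pattern used in the uri parameter.
WFS_ENDPOINT = "http://badc.rl.ac.uk/ndgWFS"


def csml_related_url(repository: str, granule_id: str, title: str) -> ET.Element:
    """Build one DIF Related_URL exposing a CSML granule via GET DATA (hypothetical helper)."""
    rel = ET.Element("Related_URL")
    ET.SubElement(rel, "URL_Content_Type").text = "GET DATA"
    ET.SubElement(rel, "URL").text = f"{WFS_ENDPOINT}?uri={repository}__CSML__{granule_id}"
    # Sentence 1 says what we point at; sentence 2 carries the CSML granule title.
    ET.SubElement(rel, "Description").text = (
        "CSML: The dataset is available at this URL via the NDG WFS. "
        f'This link is to granule "{title}"'
    )
    return rel


granules = [("granule0123", "Seasonal Mean Model Levels"),
            ("granule0124", "Monthly Mean Model Levels"),
            ("granule0125", "Daily Data Model Levels")]
for granule_id, title in granules:
    print(ET.tostring(csml_related_url("badc.nerc.ac.uk", granule_id, title), encoding="unicode"))
```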

We may also want to use the

  • GET SERVICE > ACCESS WEB SERVICE configured for the DX, with the same comments as above for granules.

Of slightly lesser interest, we may also wish to use

  • VIEW PROJECT HOME PAGE, and
  • VIEW RELATED INFORMATION, which would provide access to 'Extra' metadata held by the data provider (such as plain language documentation on a document server).

So for example, for one dataset we might have:

<Related_URL>
 <URL_Content_Type> GET DATA </URL_Content_Type>
 <URL> http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0123</URL>
 <Description> CSML: The dataset is available at this URL via the NDG WFS. This link
 is to granule "Seasonal Mean Model Levels" </Description>
</Related_URL> 
<Related_URL>
 <URL_Content_Type> GET DATA </URL_Content_Type>
 <URL> http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0124</URL>
 <Description> CSML: The dataset is available at this URL via the NDG WFS. This link
 is to granule "Monthly Mean Model Levels" </Description>
</Related_URL> 
<Related_URL>
 <URL_Content_Type> GET DATA </URL_Content_Type>
 <URL> http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0125</URL>
 <Description> CSML: The dataset is available at this URL via the NDG WFS. This link
 is to granule "Daily Data Model Levels" </Description>
</Related_URL>
<Related_URL>
 <URL_Content_Type> GET DATA </URL_Content_Type>
 <URL> http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__MOLES-B0__datasetA</URL>
 <Description> NDGA: The dataset is available at this URL via the NDG WFS. This link
 is to all data granules within the dataset. </Description>
</Related_URL>
<Related_URL>
 <URL_Content_Type> VIEW EXTENDED METADATA </URL_Content_Type>
 <URL> http://badc.rl.ac.uk/browse?uri=badc.nerc.ac.uk__MOLES-B1__datasetA</URL>
 <Description> NDGB: NDG browse metadata can be used to understand more about the data,
and its relationship to other datasets </Description>
</Related_URL>

Note that this proposal suggests an internal controlled vocabulary for the Description entry, invoked when the first word, before a colon, is one of NDGA, NDGB or CSML.

Why not use the GCMD URL_Content_Type for this? Because we can do it today, without reference to GCMD.

Why not just harvest MOLES? Because we said we wouldn't in NDG2, and our security model doesn't allow us to do so!
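Recognising the signifier is a simple prefix test. A minimal sketch, assuming the signifier is whatever precedes the first colon; `split_signifier` is a hypothetical name. Descriptions without a recognised signifier come back unchanged, so third-party text is never rewritten.

```python
from typing import Optional, Tuple

# The three signifiers proposed above; any other prefix marks a non-NDG URL.
NDG_SIGNIFIERS = frozenset({"NDGA", "NDGB", "CSML"})


def split_signifier(description: str) -> Tuple[Optional[str], str]:
    """Return (signifier, remainder) for 'NDGA:', 'NDGB:' or 'CSML:' descriptions.

    Anything else is returned as (None, description) with the text untouched.
    """
    head, sep, rest = description.strip().partition(":")
    if sep and head.strip() in NDG_SIGNIFIERS:
        return head.strip(), rest.strip()
    return None, description


print(split_signifier("NDGB: NDG browse metadata can be used to understand more about the data"))
# -> ('NDGB', 'NDG browse metadata can be used to understand more about the data')
print(split_signifier("OGC WMS service for SCAR KGIS data"))
# -> (None, 'OGC WMS service for SCAR KGIS data')
```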

These three signifiers tell us that these are NDG-related URLs, which means we can do something a bit more clever with them! For the moment, we only care in detail about the cases where CSML appears. In that case, the first sentence should be preserved into and out of MOLES, and the second should be preserved into and out of MOLES as a granule title (the "into" direction will only occur when producing mini-MOLES).
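For the CSML case, the split into the two preserved pieces might look like this. It assumes the two-sentence convention above, with the granule title in double quotes at the end of the second sentence; the helper name is hypothetical, and the returned comment keeps the `CSML:` prefix, as in the MOLES `instanceComment` below.

```python
import re
from typing import Tuple

# Matches a double-quoted title at the very end of the second sentence.
_QUOTED_TITLE = re.compile(r'"([^"]+)"\s*$')


def split_csml_description(description: str) -> Tuple[str, str]:
    """Split a CSML Description into (comment, granule title).

    The comment is the first sentence with its 'CSML:' prefix kept; the title is
    the quoted text ending the second sentence.
    """
    body = " ".join(description.split())  # collapse line breaks and runs of spaces
    if not body.startswith("CSML:"):
        raise ValueError("not a CSML description")
    first, sep, second = body[len("CSML:"):].strip().partition(". ")
    match = _QUOTED_TITLE.search(second)
    if not sep or match is None:
        raise ValueError("expected two sentences ending in a quoted granule title")
    return f"CSML: {first}.", match.group(1)


text = ('CSML: The dataset is available at this URL via the NDG WFS. This link\n'
        ' is to granule "Seasonal Mean Model Levels" ')
print(split_csml_description(text))
# -> ('CSML: The dataset is available at this URL via the NDG WFS.', 'Seasonal Mean Model Levels')
```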

The appropriate entries will appear in MOLES at:

<dgMetadata>
    <dgMetadatRecord>
        <dgMetadataID>
            <repository>badc.nerc.ac.uk</repository>
            <schema>MOLES-B0</schema>
            <identifier>datasetA</identifier>
        </dgMetadataID>
        <dgMetadataDescription>
            <metadataDescriptionID>?Kev</metadataDescriptionID>
            <metadataDescriptionLastUpdated>...</metadataDescriptionLastUpdated>
            <abstract>..stuff.</abstract>
            <descriptionSection>
                <descriptionOnlineReference>
                    <dgSimpleLink>http://badc.rl.ac.uk/browse?uri=badc.nerc.ac.uk__MOLES-B1__datasetA</dgSimpleLink>
                    <dgReferenceClass>
                        <dgValidTerm>VIEW EXTENDED METADATA</dgValidTerm>
                        <dgValidTermID>
                            <ParentListID>GCMD URL Content Type Keywords</ParentListID>
                            <TermID>?Kev</TermID>
                        </dgValidTermID>
                        <Definition> NDGB: NDG browse metadata can be used to understand more about
                            the data, and its relationship to other datasets</Definition>
                    </dgReferenceClass>
                    <dgReferenceName>Related_URL</dgReferenceName>
                </descriptionOnlineReference>
                <descriptionOnlineReference>
                    <dgSimpleLink>http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__MOLES-B0__datasetA</dgSimpleLink>
                    <dgReferenceClass>
                        <dgValidTerm>GET DATA</dgValidTerm>
                        <dgValidTermID>
                            <ParentListID>GCMD URL Content Type Keywords</ParentListID>
                            <TermID>?Kev</TermID>
                        </dgValidTermID>
                        <Definition> NDGA: The dataset is available at this URL via the NDG WFS.
                            This link is to all data granules within the dataset.</Definition>
                    </dgReferenceClass>
                    <dgReferenceName>Related_URL</dgReferenceName>
                </descriptionOnlineReference>
            </descriptionSection>
        </dgMetadataDescription>
        <dgDataEntity>
            <dgDataSetType/>
            <dgDataGranule>
                <dataModelID>
                    <repository>badc.nerc.ac.uk</repository>
                    <schema>CSML</schema>
                    <identifier>granule0123</identifier>
                </dataModelID>
                <instance>
                    <uri>http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0123</uri>
                    <format>CSML</format>
                    <instanceComment> CSML: The dataset is available at this URL via the NDG
                        WFS. This link is to granule </instanceComment>
                </instance>
                <dgGranuleSummary>
                    <dgGranuleName> Seasonal Mean Model Levels </dgGranuleName>
                    <dgParameterSummary>Stuff</dgParameterSummary>
                </dgGranuleSummary>
            </dgDataGranule>
        </dgDataEntity>
        <dgDataEntity>
            <dgDataSetType/>
            <dgDataGranule>
                <dataModelID>
                    <repository>badc.nerc.ac.uk</repository>
                    <schema>CSML</schema>
                    <identifier>granule0124</identifier>
                </dataModelID>
                <instance>
                    <uri>http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0124</uri>
                    <format>CSML</format>
                    <instanceComment> CSML: The dataset is available at this URL via the NDG
                        WFS. This link is to granule </instanceComment>
                </instance>
                <dgGranuleSummary>
                    <dgGranuleName> Monthly Mean Model Levels </dgGranuleName>
                    <dgParameterSummary>Stuff</dgParameterSummary>
                </dgGranuleSummary>
            </dgDataGranule>
        </dgDataEntity>
        <dgDataEntity>
            <dgDataSetType/>
            <dgDataGranule>
                <dataModelID>
                    <repository>badc.nerc.ac.uk</repository>
                    <schema>CSML</schema>
                    <identifier>granule0125</identifier>
                </dataModelID>
                <instance>
                    <uri>http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__CSML__granule0125</uri>
                    <format>CSML</format>
                    <instanceComment> CSML: The dataset is available at this URL via the NDG
                        WFS. This link is to granule </instanceComment>
                </instance>
                <dgGranuleSummary>
                    <dgGranuleName> Daily Data Model Levels </dgGranuleName>
                    <dgParameterSummary>Stuff</dgParameterSummary>
                </dgGranuleSummary>
            </dgDataGranule>
        </dgDataEntity>
    </dgMetadatRecord>
</dgMetadata>

All other related URLs seen in DIFs should be preserved into and out of MOLES via the dgMetadataDescription, as in the browse example above.

Non-NDG Records

We should simply parse all non-NDG Related_URL entries directly into a MOLES related URL, and parse them back out again with their content preserved identically.
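The requirement that these URLs survive the round trip unchanged can be checked mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming one URL per DIF Related_URL and leaving out the `TermID` (still marked `?Kev` above); `dif_to_moles` and `moles_to_dif` are hypothetical names that mirror the browse example's descriptionOnlineReference layout.

```python
import xml.etree.ElementTree as ET


def dif_to_moles(rel: ET.Element) -> ET.Element:
    """Map one DIF Related_URL onto a MOLES descriptionOnlineReference."""
    ref = ET.Element("descriptionOnlineReference")
    ET.SubElement(ref, "dgSimpleLink").text = rel.findtext("URL")
    cls = ET.SubElement(ref, "dgReferenceClass")
    ET.SubElement(cls, "dgValidTerm").text = rel.findtext("URL_Content_Type")
    term = ET.SubElement(cls, "dgValidTermID")
    ET.SubElement(term, "ParentListID").text = "GCMD URL Content Type Keywords"
    desc = rel.find("Description")
    if desc is not None:  # Description is optional in DIF; copy it only if present
        ET.SubElement(cls, "Definition").text = desc.text
    ET.SubElement(ref, "dgReferenceName").text = "Related_URL"
    return ref


def moles_to_dif(ref: ET.Element) -> ET.Element:
    """Inverse mapping: rebuild the DIF Related_URL in its original element order."""
    rel = ET.Element("Related_URL")
    ET.SubElement(rel, "URL_Content_Type").text = ref.findtext("dgReferenceClass/dgValidTerm")
    ET.SubElement(rel, "URL").text = ref.findtext("dgSimpleLink")
    definition = ref.find("dgReferenceClass/Definition")
    if definition is not None:
        ET.SubElement(rel, "Description").text = definition.text
    return rel
```

A DIF Related_URL carrying several URL elements would need a loop in both directions; that case is left out of the sketch.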

ISO19139

Then the question is: how should we do this in ISO19139?

In terms of the content model (ISO19115), the relevant pieces are:

  • DIF's Entry_ID maps onto MD_Metadata/dataSetURI (not really relevant to ServiceBinding, included for completeness)
  • All metadata records map onto between 0 and 1 MD_Distribution entities.
    • Which has 0 to many MD_DigitalTransferOptions
      • Which has 0 to many CI_OnlineResources
        • Which consists of a compulsory linkage and optional protocol, applicationProfile, name, description and function
          • for the last of these, the default CI_OnlineFunctionCode values include download, information and offlineAccess,
            • defined as "instructions for transferring data", "information about ..." (but note that ISO19139 allows you to use your own controlled vocabulary here).
    • We may want to make use of the MD_Format
      • Which has compulsory name and version CharacterStrings plus an optional specification
  • We could try to use the MD_PortrayalCatalogueReference, which points to a CI_Citation and thus could point to a CI_OnlineResource which we could bind appropriately for visualisation.
  • Further, all metadata records have MD_Identification elements, and these can include
    • Aggregations of MD_ServiceIdentification records (which takes us to ISO19119, and the OGC profile of ISO19115+ISO19119 for CSW2.0). However, these appear to be intended for records that actually describe a service, not for providing a related URL that describes a service pointing at the particular metadata item of interest. (As an aside, I don't think MD_ServiceIdentification actually exists; in practice it is replaced by SV_ServiceIdentification.)
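To make the cardinalities above concrete, here is a partial sketch of the relevant ISO19115 classes as Python dataclasses. Attribute names follow ISO19115, but only the pieces discussed here are modelled, and the example record (the NDGA link from the DIF example) is illustrative rather than an agreed mapping.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CI_OnlineResource:
    linkage: str                                 # compulsory URL
    protocol: Optional[str] = None
    applicationProfile: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    function: Optional[str] = None               # a CI_OnlineFunctionCode value


@dataclass
class MD_Format:
    name: str                                    # compulsory CharacterString
    version: str                                 # compulsory CharacterString
    specification: Optional[str] = None


@dataclass
class MD_DigitalTransferOptions:
    onLine: List[CI_OnlineResource] = field(default_factory=list)        # 0 to many


@dataclass
class MD_Distribution:
    distributionFormat: List[MD_Format] = field(default_factory=list)
    transferOptions: List[MD_DigitalTransferOptions] = field(default_factory=list)  # 0 to many


@dataclass
class MD_Metadata:
    dataSetURI: Optional[str] = None                      # from DIF Entry_ID
    distributionInfo: Optional[MD_Distribution] = None    # 0 or 1


# Illustrative only: the NDGA link from the DIF example, as a "download" resource.
record = MD_Metadata(
    dataSetURI="datasetA",
    distributionInfo=MD_Distribution(transferOptions=[MD_DigitalTransferOptions(onLine=[
        CI_OnlineResource(
            linkage="http://badc.rl.ac.uk/ndgWFS?uri=badc.nerc.ac.uk__MOLES-B0__datasetA",
            description="NDGA: The dataset is available at this URL via the NDG WFS.",
            function="download")])]),
)
```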

The ISO/OGC way of doing the service binding is to have a registry where one polls the registry for datasets of interest, then uses *registry* associations to find the services which act upon the data, rather than rely on entries within the records. We're clearly not doing this in advance of satisfactory registry implementations. We could implement similar functionality within the discovery client (my code) by implementing service metadata, and exploiting what we know about the underlying features, but I don't think this is tenable for NDG2.

This does present us with a bit of a problem in terms of interoperability between incoming DIF records and outgoing ISO19139 versions of them. I think we should define this conversion as NOT ALLOWED (I can implement this within the discovery code). On the other hand, incoming ISO19139 records ought to be renderable as DIF (albeit in a lossy manner), so we will try to support that.

Finally, in terms of export from MOLES proper, we might not attempt to export related URLs in general, but we ought to be able to support the three cases above:

  • NDGA via the MD_DigitalTransferOptions,
  • NDGB via the MD_DigitalTransferOptions, and
  • CSML via the MD_PortrayalCatalogueReference.

In all three cases we need to exploit CI_OnlineResource, and we could probably live with the vanilla CI_OnlineFunctionCodes (download, information and download respectively), but if we do that, we've somewhat obscured the semantics. I'd rather take advantage of the ability to have our own controlled vocabulary in CI_OnlineFunctionCode, and put the DIF keyword vocabulary in there. Once we do that, we have also recovered the ability to take DIF Related_URLs losslessly through into MOLES and out into ISO19139, and vice versa (provided the ISO19139 records use a vocabulary we recognise here).
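A sketch of what such a CI_OnlineResource might look like in ISO19139, carrying the DIF keyword in the function code. It uses the standard gmd/gco namespaces and the ISO19139 element spelling CI_OnLineFunctionCode; the codeList URL is a hypothetical placeholder for wherever our controlled copy of the vocabulary would live, and `VANILLA_FALLBACK` records the plain-ISO alternative mentioned above.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Hypothetical placeholder for our controlled copy of the GCMD URL_Content_Type list.
NDG_CODELIST = "http://example.org/ndg/codelists#URL_Content_Type"

# The vanilla CI_OnLineFunctionCode values we could fall back on instead.
VANILLA_FALLBACK = {"NDGA": "download", "NDGB": "information", "CSML": "download"}


def gmd(tag: str) -> str:
    """Qualify a tag name with the gmd namespace."""
    return f"{{{GMD}}}{tag}"


def online_resource(url: str, content_type: str, description: str) -> ET.Element:
    """Build a gmd:CI_OnlineResource whose function code is a DIF URL_Content_Type."""
    res = ET.Element(gmd("CI_OnlineResource"))
    # Child order follows ISO19139: linkage, ..., description, function.
    ET.SubElement(ET.SubElement(res, gmd("linkage")), gmd("URL")).text = url
    desc = ET.SubElement(res, gmd("description"))
    ET.SubElement(desc, f"{{{GCO}}}CharacterString").text = description
    func = ET.SubElement(res, gmd("function"))
    code = ET.SubElement(func, gmd("CI_OnLineFunctionCode"),
                         codeList=NDG_CODELIST, codeListValue=content_type)
    code.text = content_type
    return res


def content_type_of(res: ET.Element) -> str:
    """Recover the DIF URL_Content_Type, provided the codeList is one we recognise."""
    code = res.find(f"{gmd('function')}/{gmd('CI_OnLineFunctionCode')}")
    if code is None or code.get("codeList") != NDG_CODELIST:
        raise ValueError("function code not from a recognised vocabulary")
    return code.get("codeListValue")
```

Checking the codeList before trusting the value is what keeps the round trip lossless: records using some other vocabulary are rejected rather than misread.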

Attachments

  • Draft Service Binding.xml (4.9 KB) - added by lawrence 12 years ago. This is the MOLES XML used in the example, available as a file to ease editing in an XML editor.