Changes between Version 1 and Version 2 of Identifiers


Timestamp:
05/01/07 15:47:40 (13 years ago)
Author:
lawrence
  • Identifiers

    v1 v2  
    1 === NDG Identifiers ===  
     1== NDG Identifiers ==  
    22 
    33On their definition and usage. 
     4 
     5(Ideally I'll attach Kev's original opus here at some stage). 
     6 
     7=== The Status Quo === 
     8 
     9We've agreed that NDG identifiers should look like: 
     10 
     11{{{ 
     12repository:schema:local_id 
     13}}} 
     14with allowable alternatives of  
     15{{{ 
     16repository/schema/local_id 
     17}}} 
     18 
     19=== A suggestion === 
     20 
     21We also allow, and start to *prefer*: 
     22{{{ 
     23repository__schema__local_id 
     24}}} 
     25(note the TWO underscores between each ''token''). 
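
As a minimal sketch of how the three spellings could be handled in code (the names `NDGIdentifier` and `parse_identifier` are hypothetical, and the sketch assumes none of the three tokens itself contains a separator):

```python
from typing import NamedTuple

# Separators tried in order: the preferred double underscore first,
# then the colon and slash forms.
SEPARATORS = ("__", ":", "/")


class NDGIdentifier(NamedTuple):
    repository: str
    schema: str
    local_id: str

    def __str__(self) -> str:
        # Always write identifiers back out in the preferred form.
        return "__".join(self)


def parse_identifier(text: str) -> NDGIdentifier:
    """Split an identifier written in any of the three forms."""
    for sep in SEPARATORS:
        parts = text.split(sep)
        if len(parts) == 3 and all(parts):
            return NDGIdentifier(*parts)
    raise ValueError(f"not a recognised NDG identifier: {text!r}")
```

Parsing `badc.nerc.ac.uk:csml:abc` and converting the result back with `str()` would then give the double-underscore form.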
     26 
     27=== How we should use these === 
     28 
     29Key entities that NDG knows about are 
     30 
     31 * data granule documents (schema CSML) 
     32 * moles documents (schema MOLES-B0) 
     33 * stub-b documents (in MOLES-B1) WE NEED A SCHEMA KEV!!!! 
     34 * discovery documents (could be in DIF, ISO19139, MDIP, or DC). 
     35 
     36There are also internal identifiers in the CSML documents and I'm going to come back to those. 
     37 
     38==== Creating Metadata Documents ==== 
     39 
     40(Bryan's view of how the BADC should do it) 
     41 
     42In the beginning there is data. (Let's assume the Activity, ObsStation and DPT exist or are created independently; they're not really germane to this discussion.) 
     43 
     44In our case (BADC), the data exists on disk. For netcdf data at least, we would expect to be able 
     45to run cdmlscan and produce cdml documents. These describe the data file contents, and we expect 
     46csml2 to accept a cdml document as a storage descriptor.  
     47 
     48We can then create csml files by running the parser on the cdml document. 
     49 
     50... at this point we need to put an identifier in the csml file. It will be: 
     51{{{ 
     52badc.nerc.ac.uk__csml__SomeRandomString 
     53}}} 
     54(Where do we get the string from? Dominic?) 
     55 
     56We leave the datafiles and cdml file ''in the right place on the directory'' 
     57 
     58We move the csml file to the badc exist repository (why? because then we can build simple WCS etc stuff 
     59based on a document retrieval interface that is agnostic about the location of the files). We put the csml document in the repository using the name for the document which is the *identifier* above! 
     60 
     61BUT, that leaves the problem of linking the csml document to specific cdml documents living on a system, somewhere. 
     62 
     63Let's park that for the moment. 
     64 
     65Then we can run tools on our csml file, which produce some moles snippets, which we upload into the 
     66badc catalogue into a data entity description which HAS A NEW IDENTIFIER (multiple granules can exist 
     67in one data entity). Let's call this one:  
     68{{{  
     69badc.nerc.ac.uk__MOLES-B0__NEWRANDOMSTRING 
     70}}} 
     71We may then want to create some NumSim documents. We load those into the BADC exist too. We can link to those 
     72from within the MOLES descriptions in the badc catalogue. (These too have different identifiers.) 
     73 
     74Sue's moles-RDB to existDB script takes that and populates the BADC existDB. (Sue, when will this actually start to happen?).  
     75 
     76Sue's code runs a MOLES-to-DIF conversion, which puts DIF files into our OAI repository, where their identifiers 
     77are  
     78{{{ 
     79badc.nerc.ac.uk__DIF__NEWRANDOMSTRING 
     80}}} 
     81 (i.e. the same local_id as in the moles repository). 
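
A sketch of that relationship, assuming the double-underscore form throughout (the helper name `with_schema` is hypothetical): the DIF identifier is the MOLES one with only the schema token replaced.

```python
def with_schema(identifier: str, schema: str) -> str:
    """Return the same identifier with a different schema token."""
    repository, _old_schema, local_id = identifier.split("__")
    return "__".join((repository, schema, local_id))


moles_id = "badc.nerc.ac.uk__MOLES-B0__NEWRANDOMSTRING"
dif_id = with_schema(moles_id, "DIF")
```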
     82 
     83My browse code comes along and can pull any of these documents directly from the local existDB by their 
     84identifiers alone! It can also transform the MOLES documents into all the formats Kev supports. Kev: the DC identifier in this world OUGHT TO BE  
     85{{{ 
     86badc.nerc.ac.uk_DC_NEWRANDOMSTRING 
     87}}}  
     88not NEWRANDOMSTRING as it is now. Please fix. 
     89 
     90When we OAI harvest any documents, we are now making sure that we ingest them into the NDG existDB with a filename which will be  
     91{{{ 
     92repository__schema__identifier 
     93}}} 
     94(where in the case of NDG discovery documents we expect to read this direct from the DIF entry_ID, and in the case of documents from elsewhere, we will create it because we know where they came from). We are also converting everything into MOLES, so we can extract all discovery documents back in a variety of formats. 
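
The naming rule for harvested documents might be sketched like this. The function name `harvest_filename` and the test used to recognise an existing NDG identifier are assumptions; the real ingest code may decide differently.

```python
def harvest_filename(repository: str, schema: str, entry_id: str) -> str:
    """Choose the existDB filename for a harvested discovery document.

    If the entry ID is already a full NDG identifier (three non-empty
    tokens joined by double underscores) it is used unchanged; otherwise
    one is built from what we know about where the record came from.
    """
    parts = entry_id.split("__")
    if len(parts) == 3 and all(parts):
        return entry_id
    return "__".join((repository, schema, entry_id))
```

For a record from elsewhere, say a hypothetical `record42` harvested as DC from `example.org`, this would yield `example.org__DC__record42`.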
     95 
     96That way, my discovery code, via Matt's do present, can produce a restful, bookmarkable, shopping-cartable link to all discovery documents, which looks like 
     97{{{ 
     98http://glue.badc.rl.ac.uk/retrieve?uri=badc.nerc.ac.uk_DC_NEWRANDOMSTRING 
     99}}} 
     100(if the user wants DC etc). 
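
Building such a link is a one-liner. The sketch below treats the identifier as an opaque string and only assumes the `retrieve` endpoint and `uri` parameter shown above; the function name is hypothetical.

```python
from urllib.parse import urlencode

RETRIEVE_ENDPOINT = "http://glue.badc.rl.ac.uk/retrieve"


def retrieve_url(identifier: str) -> str:
    """Return a bookmarkable link to one discovery document."""
    return f"{RETRIEVE_ENDPOINT}?{urlencode({'uri': identifier})}"


link = retrieve_url("badc.nerc.ac.uk_DC_NEWRANDOMSTRING")
```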
     101 
     102It also means in the case of NDG documents, I can construct on the fly a browse request by parsing the id alone, 
     103and going from a map of ndg repository identifiers to ndg browse services (yes, I have one of those), so that we have the browse link up as 
     104{{{ 
     105http://localbrowsehostforrepository/retrieve?uri=badc.nerc.ac.uk_MOLES-B0_NEWRANDOMSTRING/format=MOLES-B1 
     106}}} 
     107(actually there is redundancy here that is necessary only because of where we are today, I should be able to just assume that I can do this in effect by substituting the desired schema in the middle). 
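
A sketch of the map-based lookup, assuming double-underscore identifiers. The `BROWSE_HOSTS` table and the function name are hypothetical, and the host is the placeholder from the example above, not a real service.

```python
# Hypothetical map of NDG repository identifiers to browse hosts.
BROWSE_HOSTS = {
    "badc.nerc.ac.uk": "localbrowsehostforrepository",
}


def browse_url(identifier: str, output_format: str) -> str:
    """Build a browse request from nothing but the identifier."""
    repository = identifier.split("__")[0]
    try:
        host = BROWSE_HOSTS[repository]
    except KeyError:
        raise LookupError(f"no browse service known for {repository!r}")
    # URL shape copied from the example above, including '/format='.
    return f"http://{host}/retrieve?uri={identifier}/format={output_format}"
```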
     108 
     109I can also have a map of csml servers (currently dx servers), which I can point to for data services etc. 
     110 
     111However, ideally I don't want a map, what I want you to do is put in your DIF related URL something 
     112that looks like this: 
     113 
     114{{{ 
     115<content_Type> NDG_B_Service </content_Type> 
     116<url>http://localbrowsehostforrepository/browse?uri=badc.nerc.ac.uk_MOLES-B0_NEWRANDOMSTRING/format=MOLES-B1</url> 
     117}}} 
     118 
     119or the equivalent in what replaces DIF. 
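
If a data provider wanted to generate that fragment rather than type it, a sketch could look like the following. The element names are copied from the example above, not checked against the DIF schema, and the function name is hypothetical.

```python
from xml.sax.saxutils import escape


def related_url_fragment(url: str, content_type: str = "NDG_B_Service") -> str:
    """Render the two elements shown above for one service URL."""
    return (
        f"<content_Type> {escape(content_type)} </content_Type>\n"
        f"<url>{escape(url)}</url>"
    )


fragment = related_url_fragment(
    "http://localbrowsehostforrepository/browse"
    "?uri=badc.nerc.ac.uk_MOLES-B0_NEWRANDOMSTRING/format=MOLES-B1"
)
```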
     120 
     121Note that this is different from NDG1 and NDG-Alpha, but it means that our services are consumable by others 
     122than just ourselves. 
     123 
     124Now coming back to the CSML internal identifier issues. That's a matter for the storage descriptor, so I'm going to hand that back to Andrew (for now). 