Changes between Version 1 and Version 2 of Identifiers

05/01/07 15:47:40 (13 years ago)



  • Identifiers

    v1 v2  
    1 === NDG Identifiers ===  
     1== NDG Identifiers ==  
    33On their definition and usage. 
     5(Ideally I'll attach Kev's original opus here at some stage). 
     7=== The Status Quo === 
     9We've agreed that NDG identifiers should look like: 
     14with allowable alternatives of  
     19=== A suggestion === 
     21We also allow, and start to *prefer*: 
     25(note the TWO underscores between each ''token''). 
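As a rough illustration of the double-underscore convention, the sketch below splits an identifier into its tokens and joins them back. It assumes three tokens (a repository identifier, a schema name and a local id), which is inferred from the rest of this page and is not a definitive specification. The names `NdgId`, `parse_ndg_id` and `format_ndg_id` and the example identifier are hypothetical.

```python
from typing import NamedTuple

SEPARATOR = "__"  # TWO underscores between each token


class NdgId(NamedTuple):
    # Token names are assumptions made for this sketch.
    repository: str
    schema: str
    local_id: str


def parse_ndg_id(identifier: str) -> NdgId:
    """Split an identifier into (repository, schema, local_id)."""
    # maxsplit=2 keeps any later double underscore inside the local id.
    tokens = identifier.split(SEPARATOR, 2)
    if len(tokens) != 3 or not all(tokens):
        raise ValueError(f"expected three non-empty tokens: {identifier!r}")
    return NdgId(*tokens)


def format_ndg_id(ndg_id: NdgId) -> str:
    """Join the tokens back into a single identifier string."""
    return SEPARATOR.join(ndg_id)


if __name__ == "__main__":
    # Made-up identifier, for illustration only.
    example = parse_ndg_id("repo.example.org__SCHEMA-X__item_1")
    print(example.repository, example.schema, example.local_id)
    print(format_ndg_id(example))
```

Single underscores inside a token (as in `item_1`) are left alone, which is the point of using two underscores as the separator.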
     27=== How we should use these === 
     29Key entities that NDG knows about are 
     31 * data granule documents (schema CSML) 
     32 * moles documents (schema MOLES-B0) 
     33 * stub-b documents (in MOLES-B1) WE NEED A SCHEMA KEV!!!! 
     34 * discovery documents (could be in DIF, ISO19139, MDIP, or DC). 
     36There are also internal identifiers in the CSML documents and I'm going to come back to those. 
     38==== Creating Metadata Documents ==== 
     40(Bryan's view of how the BADC should do it) 
     42In the beginning there is data. (Let's assume the Activity, ObsStation and DPT exist or are created independently; they're not really germane to this discussion). 
     44In our case (BADC), the data exists on disk. For netcdf data at least, we would expect to be able 
     45to run cdmlscan and produce cdml documents. These describe the data file contents, and we expect 
     46csml2 to accept a cdml document as a storage descriptor.  
     48We can then create csml files by running the parser on the cdml document. 
     50... at this point we need to put an identifier in the csml file. It will be: 
     54(Where do we get the string from? Dominic?) 
     56We leave the datafiles and cdml file ''in the right place on the directory'' 
     58We move the csml file to the badc exist repository (why? because then we can build simple WCS etc stuff 
     59based on a document retrieval interface that is agnostic about the location of the files). We put the csml document in the repository using the name for the document which is the *identifier* above! 
     61BUT, that leaves the problem of linking the csml document to specific cdml documents living on a system, somewhere. 
     63Let's park that for the moment. 
     65Then we can run tools on our csml file, which produce some moles snippets, which we upload into the 
     66badc catalogue into a data entity description which HAS A NEW IDENTIFIER (multiple granules can exist 
     67in one data entity). Let's call this one:  
     71We may then want to create some NumSim documents. We load those into the BADC exist too. We can link to those 
     72from within the MOLES descriptions in the badc catalogue. (these too have different identifiers) 
     74Sue's moles-RDB to existDB script takes that and populates the BADC existDB. (Sue, when will this actually start to happen?).  
     76Sue's code runs a MOLES-to-DIF conversion, which puts DIF files into our OAI repository, where their identifiers 
     81 (i.e. the same local_id as in the moles repository). 
     83My browse code comes along and can pull any of these documents directly from the local existDB by their 
     84identifiers alone! It can also transform the MOLES documents into all the formats Kev supports. Kev: the DC identifier in this world OUGHT TO BE  
     88not NEWRANDOMSTRING as it is now. Please fix. 
     90When we OAI harvest any documents, we are now making sure that we ingest them into the NDG existDB with a filename which will be  
     94(where in the case of NDG discovery documents we expect to read this direct from the DIF entry_ID, and in the case of documents from elsewhere, we will create it, because we know where they came from).  We are also converting everything into MOLES, so we can extract all discovery documents back in a variety of formats. 
     96That way, my discovery code, via Matt's do present, can produce a restful, bookmarkable, shopping-cartable link to all discovery documents, which looks like 
     100(if the user wants DC etc). 
     102It also means in the case of NDG documents, I can construct on the fly a browse request by parsing the id alone, 
     103and going from a map of ndg repository identifiers to ndg browse services (yes, I have one of those), so that we have the browse link up as 
     107(actually there is redundancy here that is necessary only because of where we are today, I should be able to just assume that I can do this in effect by substituting the desired schema in the middle). 
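The two ideas above (looking up a browse service from the repository token, and substituting the desired schema in the middle) can be sketched as follows. This is only a sketch under assumptions: identifiers are taken to be three tokens separated by two underscores, and the map contents, the URL layout, the schema names and the function names are all hypothetical and not the real NDG service addresses or link format.

```python
SEPARATOR = "__"

# Hypothetical map of repository identifiers to browse service base URLs.
BROWSE_SERVICES = {
    "repo.example.org": "http://browse.example.org/view",
}


def with_schema(identifier: str, schema: str) -> str:
    """Return the identifier with its middle (schema) token replaced."""
    repository, _old_schema, local_id = identifier.split(SEPARATOR, 2)
    return SEPARATOR.join((repository, schema, local_id))


def browse_url(identifier: str) -> str:
    """Build a browse link by parsing the id alone (assumed URL layout)."""
    repository = identifier.split(SEPARATOR, 1)[0]
    base = BROWSE_SERVICES[repository]  # KeyError if the repository is unknown
    return f"{base}/{identifier}"


if __name__ == "__main__":
    doc_id = "repo.example.org__SCHEMA-A__item_1"  # made-up identifier
    print(with_schema(doc_id, "SCHEMA-B"))
    print(browse_url(doc_id))
```

With a related URL published in the discovery record itself, the `BROWSE_SERVICES` map would not be needed; only the schema substitution would remain.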
     109I can also have a map of csml servers (currently dx servers), which I can point to for data services etc. 
     111However, ideally I don't want a map, what I want you to do is put in your DIF related URL something 
     112that looks like this: 
     115<content_Type> NDG_B_Service </content_Type> 
     119or the equivalent in what replaces DIF. 
     121Note that this is different from NDG1 and NDG-Alpha, but it means that our services are consumable by others 
     122than just ourselves. 
     124Now coming back to the CSML internal identifier issues. That's a matter for the storage descriptor, so I'm going to hand that back to Andrew (for now). 