Version 21 (modified by selatham, 15 years ago)
Combined NDG Use Case with issues
What DataProviders want from NDG2
- Increase discovery and use of RSDAS data, with usage logged per user.
- Allow visualisation and analysis of RSDAS data using generic tools.
- Expand NDG-enabled datasets to include long time-series and near-real-time data.
- Provide a data discovery system for NOCS data that is “community supported”
- Provide external access (especially visualisation) to NOCS data using “community” tools
- Provide a framework for developing (meta)data systems for NOCS that are compatible with those of the NERC data centres
- Gain support in developing tools for (semi-)automating metadata creation, and benefit from NDG tools for expected future developments (e.g. conversion to the ISO standard)
- Provide an incentive for NOCS scientists to submit metadata and data to data management systems
- Take advantage of NDG developments such as term servers to simplify generation of metadata
- Possibly use NDG as an “internal” tool for data discovery/transfer to BODC!
What Data Users want from NDG
- Ability to access NOCS data using NDG extraction and visualisation tools
- Single-point access to data at all NERC data centres (especially BODC, BADC and NEODC) using NOCS credentials
Use case of interaction with RSDAS data
- User discovers RSDAS data on the NDG portal and thinks: wow, I want to analyse that.
- User logs in to another DP to get NDG credentials for accessing RSDAS data.
- User browses the DIF, MOLES and CSML metadata.
- User visualises a time series of satellite data using GeoSPLAT and compares it with other datasets.
- Peter at PML can see from the log which data this user has accessed.
- A scientist user writes a program using the Client Package (Python/Java?/IDL) to analyse RSDAS data.
- User may contact PML for additional access permissions via the RSDAS application form.
Data provider procedure
Describe dataset metadata (MOLES & DIF creation)
- Decide on scope of each discoverable dataset.
- Assume we are generating MOLES first, then CSML (to be discussed).
- Decide on the granularity of DataGranule objects, i.e. how many CSML documents per MOLES record? (BADC's current datasets need to be discoverable, but BADC may have difficulty creating CSML for them.)
- Ensure all this is recorded somewhere accessible in the DP's back-end metadata; 'somewhere accessible' probably means a database. (Not yet done for BADC: a big job.)
- Ensure all related metadata are recorded somewhere accessible in the DP's back-end metadata, e.g. sensors, units, vocabulary keywords, activities, etc. (Not yet done for BADC: a big job.)
- Write or adapt software to automate output of MOLES from the DP's back-end metadata (database or wherever). Note that this includes all MOLES object types with deployments.
- Place MOLES records in the ndg_B_metadata collection of an eXist database that is accessible to a MOLES Browse web service.
- Automatically/dynamically generate DIF records from MOLES.
- Place DIF records in the Dlese OAI provider.
- Review DIF/MOLES accuracy and iterate, automatically maintaining the OAI record history. (The current Dlese software seems to have problems when records are updated or deleted.)
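As a possible starting point for the "generate DIF records from MOLES" step, here is a minimal sketch using only Python's standard library. It assumes a simplified, hypothetical MOLES-like record whose child elements are named id, title, abstract and access; the real MOLES schema is much richer and these names are placeholders. The output element names follow the DIF 9 style (Entry_ID, Entry_Title, Summary, Access_Constraints), but only a few fields are filled, so the result would not validate as a complete DIF record.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from simplified MOLES-like child elements to DIF 9 style
# element names. Neither side reflects the full schema; extend as needed.
FIELD_MAP = {
    "id": "Entry_ID",
    "title": "Entry_Title",
    "abstract": "Summary",
    "access": "Access_Constraints",
}


def moles_to_dif(moles_xml: str) -> str:
    """Build a minimal DIF-style record from a simplified MOLES-like record."""
    src = ET.fromstring(moles_xml)
    dif = ET.Element("DIF")
    for src_tag, dif_tag in FIELD_MAP.items():
        value = src.findtext(src_tag)
        # Skip missing or blank fields rather than writing empty DIF elements.
        if value and value.strip():
            ET.SubElement(dif, dif_tag).text = value.strip()
    return ET.tostring(dif, encoding="unicode")


if __name__ == "__main__":
    record = (
        "<moles_record><id>EXAMPLE_1</id>"
        "<title>Example SST composite</title>"
        "<abstract>Hypothetical abstract text.</abstract></moles_record>"
    )
    print(moles_to_dif(record))
```

Because the DIF is regenerated from MOLES rather than edited by hand, rerunning the conversion after a MOLES change keeps the two in step; maintaining the OAI record history is still left to the OAI provider.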
Describe dataset data (CSML creation)
- Write CSML scanners/templates for new datasets. (We will potentially all have to do this for non-netCDF formats.)
- For future datasets, consider the original data formats: ideally they can be converted on the fly to a suitable format, e.g. netCDF.
- Generate CSML records dynamically, from netCDF files, database, etc.
- CSML records will need to use standard names. What if none are available? This is a big problem for the BADC back catalogue.
- Connect CSML to MOLES records via the 'S' (summary) metadata, which appears in both. Preferably do this in the DP's back-end metadata.
- Store CSML records somewhere accessible to other Web Services (an eXist database?).
- Test CSML accuracy using the NDG portal data browser.
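The standard-names problem can be surveyed before any CSML is written. The sketch below assumes each file's variable attributes have already been read (e.g. with a netCDF library, not shown) into plain dictionaries, and uses a tiny hard-coded subset of the CF standard name table as a stand-in for the full table or a term server; KNOWN_STANDARD_NAMES and check_standard_names are hypothetical names, not part of any CSML tool.

```python
from typing import Dict, List, Tuple

# Tiny illustrative subset of the CF standard name table. A real check would
# load the full published table or query a term server instead.
KNOWN_STANDARD_NAMES = {
    "sea_surface_temperature",
    "sea_water_salinity",
    "air_temperature",
}


def check_standard_names(
    variables: Dict[str, Dict[str, str]]
) -> Tuple[List[str], List[str]]:
    """Split variable names into (recognised, missing) by their standard_name attribute.

    `variables` maps each variable name to its attributes, e.g. as read from
    a netCDF header. Both output lists are sorted by variable name.
    """
    recognised: List[str] = []
    missing: List[str] = []
    for var_name, attrs in sorted(variables.items()):
        if attrs.get("standard_name") in KNOWN_STANDARD_NAMES:
            recognised.append(var_name)
        else:
            missing.append(var_name)
    return recognised, missing


if __name__ == "__main__":
    example = {
        "sst": {"standard_name": "sea_surface_temperature", "units": "K"},
        "chl": {"long_name": "chlorophyll concentration"},
    }
    print(check_standard_names(example))
```

Variables reported as missing are the ones that would need a name assigned, or a new vocabulary entry requested, before CSML generation; running this over a back catalogue gives a rough measure of the size of the problem.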
NDG Procedure
NDG Discovery
- A new DataProvider tells NDG about its OAI records in an NDG-compliant format (currently DIF, but should be ISO).
- NDG sets them up as an automatic harvest.
- Harvested records are automatically pre-processed to tidy OAI-style filenames and remove any namespaces (which cause problems in eXist/XQuery).
- Pre-processed records are ingested into the 'dif' collection in the NDG Discovery eXist database. (Currently they only go into dev/glue. Should we have a completely separate production OAI and ingest on superglue? Is an editorial process required? Can we check that links work? If not, the Discovery portal must cope with any content.)
- The NDG Discovery Web Service, and possibly the NDG GUI, are used to discover datasets. (The NDG Discovery Web Service is broken! We are currently using a non-WS service to an old database.)
- Access constraints can be seen at this point, if they have been populated correctly. (A common problem is people entering access_constraints = none, treating it like an empty tag.)
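The namespace-stripping part of the pre-processing step can be done with Python's standard xml.etree.ElementTree module, as in the sketch below. It assumes each harvested record is a single well-formed XML document; the OAI-style filename tidying is not shown, and strip_namespaces is a hypothetical helper rather than part of the NDG harvesting software.

```python
import xml.etree.ElementTree as ET


def _local(name: str) -> str:
    """Drop a '{namespace-uri}' prefix from an ElementTree name, if present."""
    return name.split("}", 1)[1] if name.startswith("{") else name


def strip_namespaces(xml_text: str) -> str:
    """Return the record with namespaces removed from element and attribute names.

    ElementTree's default parser discards comments and processing
    instructions, and xmlns declarations are not kept as attributes, so the
    serialised result carries no namespace declarations.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = _local(elem.tag)
        # Rebuild the attributes under their local names; if two attributes
        # share a local name, the one encountered later wins.
        new_attrib = {_local(k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(new_attrib)
    return ET.tostring(root, encoding="unicode")
```

Stripping namespaces is lossy by design: it makes simple XQuery paths work in eXist, but a record that mixes two vocabularies with clashing local names would be silently merged, so it may be worth logging such cases during ingest.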
NDG Browse
- Users select discovered datasets for browsing
- The Browse Web Service retrieves a MOLES stub-B record, possibly displayed by the NDG Browse GUI. (The GUI needs extending.)
- User can browse the links to other MOLES objects (abiding by security constraints), back to DIFs, and to other URLs.
- User can view or download XML documents.
- User can select data granules.
- User history is collected.
- Where does the Browse software need to be installed, and which parts of it?
Security and logging
- Install NDG security software.
- Generate role mappings with other NDG DPs.
- Assign the DP's users to external NDG roles.
- Assign datasets to an appropriate access role, e.g. any NDG user.
- Interface NDG security with the DP's data browser and authentication system.
- Ensure NDG access to the DP's data is logged, e.g. name, date, and data granule id or filename.
- Test access to the DP's data.
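To make the role-mapping and logging steps concrete, the sketch below shows one way a DP might map external NDG roles onto local roles, check a request against the role assigned to a dataset or granule, and log each request with user name, date and granule id. It is not the NDG security software's API: AccessPolicy, the role strings and the log format are all hypothetical, and a real deployment would take the user's roles from their NDG credentials rather than from the caller.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class AccessPolicy:
    """Hypothetical per-DP access policy: role mapping, dataset roles and an access log."""

    # External NDG role (issued by another DP) -> local role it maps onto.
    role_map: Dict[str, str]
    # Dataset or data granule id -> local role required to access it.
    dataset_roles: Dict[str, str]
    # One entry per request: user name, UTC timestamp, granule id, outcome.
    log: List[dict] = field(default_factory=list)

    def check_access(self, user: str, external_roles: Set[str], granule_id: str) -> bool:
        """Return True if a mapped role matches the granule's required role; always log."""
        local_roles = {self.role_map[r] for r in external_roles if r in self.role_map}
        required = self.dataset_roles.get(granule_id)
        # Granules with no assigned role are denied by default.
        granted = required is not None and required in local_roles
        self.log.append(
            {
                "user": user,
                "date": datetime.now(timezone.utc).isoformat(),
                "granule_id": granule_id,
                "granted": granted,
            }
        )
        return granted


if __name__ == "__main__":
    policy = AccessPolicy(
        role_map={"otherdp:ndg_user": "local_ndg_user"},
        dataset_roles={"example_granule_1": "local_ndg_user"},
    )
    print(policy.check_access("jbloggs", {"otherdp:ndg_user"}, "example_granule_1"))
    print(policy.log)
```

Refused requests are logged as well as granted ones, so the provider can see attempted access as well as actual use.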
Data Extractor (Data Browse)
- Where do NDG DX services need to be installed?
Data Delivery
- Where do NDG data delivery services need to be installed?
- Ensure there is a system for real-time access to data held in archives. (Not currently available at NOCS or BODC.)
Visualisation
- Where do NDG data delivery services need to be installed?
- Test delivery of netCDF files from non-netCDF data.
- Test visualisation of DPs data in GeoSPLAT.