Version 26 (modified by selatham, 15 years ago)


Combined NDG Use Case with issues

What DataProviders want from NDG2

  • Increase discovery and usage of RSDAS data, logged per user.
  • Allow visualisation and analysis of RSDAS data using generic tools.
  • Expand NDG-enabled datasets to include long time-series and near-real time data.
  • Provide a data discovery system for NOCS data that is “community supported”
  • Provide external access, especially visualisation, to NOCS data using “community” tools
  • Provide framework for development of (meta)data systems for NOCS that are compatible with NERC data centres
  • Gain access to support in developing tools for (semi-)automating metadata creation, and benefit from NDG tools for expected future developments (conversion to ISO standard, etc.)
  • Provide an incentive for NOCS scientists to submit metadata & data to data management systems
  • Take advantage of NDG developments such as term servers to simplify generation of metadata
  • Possibility to use NDG as an “internal” tool for data discovery / transfer – to BODC!
  • Increased discovery and usage of BODC’s data holdings.
  • Integration of BODC data with data from other sources, e.g. combine CTD profiles from BODC with satellite images from PML and/or model output from NOCS.
  • Visualisation and analysis of BODC data using generic tools.
  • Provide an interoperable metadata catalogue of BODC’s data holdings.
  • Information on data accesses that allows us to:
    • Quantify our performance for NERC’s OPMs
    • Provide feedback on the types of data that people really want to use (logging portal searches might help here).

What Data Users want from NDG

  • Ability to access NOCS data using NDG extraction and visualisation tools
  • Single-point access to data at all NERC data centres (especially BODC, BADC, NEODC) using NOCS credentials

Use case of interaction with RSDAS data

  • User discovers RSDAS data on NDG portal, says "wow, I want to analyse that".
  • User logs in to another DP to get NDG credentials for accessing RSDAS data.
  • User browses around the metadata: DIF, MOLES, CSML.
  • User visualises time-series of satellite data using GeoSPLAT; compares with other datasets.
  • Peter at PML notices what data this user has accessed, as it appears in the log.
  • Scientist user writes program using Client Package (Python/Java?/IDL) to analyse RSDAS data.
  • User may contact PML for additional access permissions, via RSDAS application form.

Data provider procedure

Describe dataset metadata (MOLES & DIF creation)

  • Decide on scope of each discoverable dataset.
  • Assume we are generating MOLES first then CSML... (discuss).
  • Decide on granularity of DataGranule objects, i.e. how many CSML records per MOLES record? (BADC current datasets need to be discoverable, but may have difficulty creating CSML)
  • Ensure all this is recorded somewhere accessible in the DP's back-end metadata. 'Somewhere accessible' probably means a database. (Not yet for BADC - big job)
  • Ensure all related metadata are recorded somewhere accessible in the DP's back-end metadata, e.g. sensors, units, vocabulary keywords, activities, etc. (Not yet for BADC - big job)
  • Write/adapt software for automating output of MOLES from the DP's back-end metadata (DB or wherever). Note this includes all MOLES object types with deployments.
  • Place MOLES records in ndg_B_metadata collection in an eXist db which is accessible to a MOLES Browse web service.
  • Automatically/dynamically generate DIF records from MOLES.
  • Place DIF records in Dlese OAI provider.
  • Review DIF/MOLES accuracy and iterate, automatically maintaining the OAI record history. (Current Dlese software seems to have problems when records are updated or deleted)
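
Once DIF records sit in an OAI provider, a harvester would typically fetch them with an OAI-PMH ListRecords request, using the protocol's `from` argument to pick up only records changed since the last review cycle. The sketch below only builds such a request URL. The base URL and the `dif` metadata prefix are assumptions for illustration; the real values depend on how the provider is configured.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_records_url(base_url: str, metadata_prefix: str,
                           from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url and metadata_prefix are placeholders supplied by the caller;
    from_date (YYYY-MM-DD) limits the harvest to records changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical provider address and prefix, for illustration only.
    print(build_list_records_url("http://example.org/oai", "dif", "2007-01-01"))
```

Keeping the date filter optional allows the same helper to serve both a full first harvest and later incremental ones.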

Describe dataset data (CSML creation)

  • Write CSML scanner/templates for new datasets. (Potentially we will all have to do this for non-netCDF formats.)
  • For future datasets - consider original data formats. Ideally these can be converted to a suitable format on the fly, e.g. netCDF.
  • Generate CSML records dynamically, from netCDF files, database, etc.
  • CSML records will need to use standard names. What if there aren't any available? Big problem for BADC back catalogue.
  • Connect CSML to MOLES records via 'S' summary metadata which appears in both. Preferably do this in the DP's back-end metadata.
  • Store CSML records somewhere accessible to other Web Services. eXist db?
  • Test CSML accuracy using NDG portal data browser.

NDG Procedure

NDG Discovery

  • New DataProvider tells NDG about their OAI records in NDG compliant format (currently DIF, but should be ISO)
  • NDG sets them up as an automatic harvest.
  • Harvested records are automatically pre-processed to tidy them up: OAI-style filenames and any namespaces (which cause problems in eXist/XQuery) are removed.
  • Pre-processed records are ingested into 'dif' collection in NDG Discovery eXist db. (Currently these only go into dev/glue. Should we have completely separate production OAI & ingest on superglue? Is an editorial process required? Can we check links work? If not, Discovery portal must cope with any content.)
  • NDG Discovery Web Service and maybe NDG GUI are used to Discover datasets. (NDG Discovery Web Service is broken! Currently using non-WS service to an old db)
  • Indication of access constraints can be seen at this point. (If access constraints have been populated correctly. It is a common problem that people say access_constraints = none, like an empty tag.)
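
The pre-processing and access-constraint points above can be sketched with the Python standard library. This is a minimal illustration, not the NDG ingest code: it strips namespaces from a harvested record and flags records whose access constraints are missing, empty or "none". The element name `Access_Constraints` is an assumption about the record layout and is passed as a parameter so it can be changed.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text: str) -> str:
    """Return the record with namespaces removed from tags and attributes."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores qualified names as "{uri}local"; keep "local".
        if "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
        for key in list(elem.attrib):
            if "}" in key:
                elem.attrib[key.split("}", 1)[1]] = elem.attrib.pop(key)
    return ET.tostring(root, encoding="unicode")


def has_access_constraints(xml_text: str, tag: str = "Access_Constraints") -> bool:
    """True if at least one `tag` element holds real text (not empty or 'none').

    Expects a record whose namespaces have already been stripped.
    """
    root = ET.fromstring(xml_text)
    return any(
        elem.text is not None and elem.text.strip().lower() not in ("", "none")
        for elem in root.iter(tag)
    )


if __name__ == "__main__":
    # Hypothetical harvested record, for illustration only.
    record = (
        '<dif:DIF xmlns:dif="http://example.org/dif">'
        "<dif:Entry_ID>example</dif:Entry_ID>"
        "<dif:Access_Constraints>none</dif:Access_Constraints>"
        "</dif:DIF>"
    )
    clean = strip_namespaces(record)
    print(clean)
    print("constraints populated:", has_access_constraints(clean))
```

A check like this could run at ingest time so that poorly populated records are reported back to the provider; whether to reject them or ingest them anyway is the editorial question raised above.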

NDG Browse

  • Users select discovered datasets for browsing
  • The Browse Web Service retrieves a MOLES stub-B record. Possibly displayed by NDG Browse GUI. (GUI needs extension)
  • User can browse the links to other MOLES objects (abiding by security constraints), back to DIFs, other URLs.
  • User can view or download XML documents.
  • User can select data granules.
  • User history is collected.
  • Where does Browse software need to be installed, and which components?

Security and logging

  • Install NDG security software.
  • Generate role mappings with other NDG DPs.
  • Assign the DP's users to external NDG roles.
  • Assign datasets to appropriate access role, e.g. any NDG user.
  • Interface NDG security with the DP's data browser and authentication system.
  • Ensure NDG access to the DP's data is logged: e.g. name, date, data granule id or filename.
  • Test access to the DP's data.
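
The role-mapping and logging steps above can be illustrated with a short sketch. It does not use the NDG security software; the provider names, roles, granule ids and logger name are all hypothetical, and a real deployment would read the mappings from the security configuration rather than from in-code tables.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ndg.access")  # hypothetical logger name

# Hypothetical mapping: (remote provider, remote role) -> local role.
ROLE_MAP = {("PML", "rsdas_user"): "ndg_user"}

# Hypothetical assignment of data granules to the role needed to read them.
DATASET_ROLES = {"granule-001": "ndg_user"}


def check_and_log_access(user: str, provider: str, remote_role: str,
                         granule_id: str) -> bool:
    """Map the user's remote role to a local one, decide access, and log it."""
    local_role = ROLE_MAP.get((provider, remote_role))
    required = DATASET_ROLES.get(granule_id)
    granted = required is not None and local_role == required
    # Log name, date and granule id for every attempt, granted or not.
    logger.info(
        "user=%s date=%s granule=%s granted=%s",
        user, datetime.now(timezone.utc).isoformat(), granule_id, granted,
    )
    return granted


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    check_and_log_access("alice", "PML", "rsdas_user", "granule-001")
    check_and_log_access("bob", "PML", "guest", "granule-001")
```

Logging refused attempts as well as granted ones gives the provider the per-user usage figures asked for earlier, and also shows which data people tried to reach but could not.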

Data Extractor (Data Browse)

  • Where do NDG DX services need to be installed?
  • DX gets called with ref to a CSML file. (Is this currently how it happens?)
  • DX accesses security information from CSML (not currently).
  • DX uses NDG Security service to check whether the user can have access to the data.
  • DX displays details of data allowing selections to be made.
  • DX creates a CSML file for this subset.
  • DX passes this to NDG Data delivery or GeoSplat.
  • Is any of this currently happening like this? We don't think so.
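
The intended DX sequence above (security check, selection, subset description, hand-off) can be written down as a small control-flow sketch. Everything here is hypothetical: the `Granule` fields, the dictionary standing in for a subset CSML file, and the `deliver` callback standing in for NDG Data Delivery or GeoSPLAT. It shows the order of the steps only, not how DX is or will be implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Granule:
    csml_ref: str            # reference to the CSML document for the data
    required_role: str       # role needed to read it (assumed to come from CSML)
    times: Tuple[str, ...]   # time steps offered to the user for selection


def extract_subset(granule: Granule, user_roles: Iterable[str],
                   wanted_times: Iterable[str],
                   deliver: Callable[[Dict], object]) -> object:
    """Run the DX steps in order and pass the subset description on."""
    # Step 1: security check before any details are shown.
    if granule.required_role not in set(user_roles):
        raise PermissionError("access denied for " + granule.csml_ref)
    # Step 2: keep only the requested time steps that actually exist.
    wanted = set(wanted_times)
    selected = tuple(t for t in granule.times if t in wanted)
    # Step 3: describe the subset (stand-in for writing a new CSML file).
    subset = {"source": granule.csml_ref, "times": selected}
    # Step 4: hand the description to delivery or visualisation.
    return deliver(subset)


if __name__ == "__main__":
    g = Granule("example.csml", "ndg_user", ("2006-01", "2006-02", "2006-03"))
    print(extract_subset(g, ["ndg_user"], ["2006-02", "2006-03"], lambda s: s))
```

Passing the hand-off target as a callback keeps the extractor independent of whether the subset goes to data delivery or to visualisation.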

Data Delivery

  • Where do NDG data delivery services need to be installed?
  • Ensure there is a system for real time access to data held in archives. (Not currently at NOCS or BODC)
  • How will bbftp link into, or be used by, existing data delivery systems?

Visualisation

  • Where do NDG visualisation services need to be installed?
  • Test delivery of netCDF files from non-netCDF data.
  • Test visualisation of the DP's data in GeoSPLAT.
  • How to do visualisation of files other than CF-compliant netCDF?
  • Does CSML have enough metadata for visualisation? Is some more MOLES level needed?
