
Version 25 (modified by selatham, 13 years ago)

--

Combined NDG Use Case with issues

What DataProviders want from NDG2

  • Increase discovery and usage of RSDAS data, logged per user.
  • Allow visualisation and analysis of RSDAS data using generic tools.
  • Expand NDG-enabled datasets to include long time-series and near-real time data.
  • Provide a data discovery system for NOCS data that is “community supported”.
  • Provide external access, especially visualisation, to NOCS data using “community” tools.
  • Provide a framework for development of (meta)data systems for NOCS that are compatible with NERC data centres.
  • Gain access to support in developing tools for (semi-)automating metadata creation & benefit from NDG tools for expected future developments (conversion to ISO standard etc.).
  • Provide an incentive for NOCS scientists to submit metadata & data to data management systems.
  • Take advantage of NDG developments such as term servers to simplify generation of metadata.
  • Possibility to use NDG as an “internal” tool for data discovery / transfer – to BODC!
  • Increased discovery and usage of BODC’s data holdings.
  • Integration of BODC data with data from other sources, e.g. combine CTD profiles from BODC with satellite images from PML and/or model output from NOCS.
  • Visualisation and analysis of BODC data using generic tools.
  • Provide an interoperable metadata catalogue of BODC’s data holdings.
  • Information on data accesses that allows us to: (a) quantify our performance for NERC’s OPMs; (b) provide feedback on the types of data that people really want to use (logging portal searches might help here).

What Data Users want from NDG

  • Ability to access NOCS data using NDG extraction and visualisation tools.
  • Single point of access to data at all NERC data centres (especially BODC, BADC, NEODC) using NOCS credentials.

Use case of interaction with RSDAS data

  • User discovers RSDAS data on NDG portal, says "wow, I want to analyse that".
  • User logs in to another DP to get NDG credentials for accessing RSDAS data.
  • User browses around metadata DIF, MOLES, CSML.
  • User visualises time-series of satellite data using GeoSPLAT; compares with other datasets.
  • Peter at PML notices what data this user has accessed, as it appears in the log.
  • Scientist user writes program using Client Package (Python/Java?/IDL) to analyse RSDAS data.
  • User may contact PML for additional access permissions, via RSDAS application form.

Data provider procedure

Describe dataset metadata (MOLES & DIF creation)

  • Decide on scope of each discoverable dataset.
  • Assume we are generating MOLES first then CSML... (discuss).
  • Decide on granularity of DataGranule objects, i.e. how many CSML records per MOLES record? (BADC current datasets need to be discoverable, but may have difficulty creating CSML.)
  • Ensure all this is recorded somewhere accessible in the DP's back-end metadata. 'Somewhere accessible' probably means a database. (Not yet for BADC - big job.)
  • Ensure all related metadata are recorded somewhere accessible in the DP's back-end metadata, e.g. sensors, units, vocabulary keywords, activities, etc. (Not yet for BADC - big job.)
  • Write/adapt software for automating output of MOLES from the DP's back-end metadata (DB or wherever). Note this includes all MOLES object types with deployments.
  • Place MOLES records in ndg_B_metadata collection in an eXist db which is accessible to a MOLES Browse web service.
  • Automatically/dynamically generate DIF records from MOLES.
  • Place DIF records in Dlese OAI provider.
  • Review DIF/MOLES accuracy and iterate, automatically maintaining the OAI record history. (Current Dlese software seems to have problems when records are updated or deleted.)

Describe dataset data (CSML creation)

  • Write CSML scanner/templates for new datasets. (Potentially we will all have to do this for non-netCDF formats.)
  • For future datasets - consider original data formats. Ideally can be converted to suitable format on-the-fly, e.g. netCDF.
  • Generate CSML records dynamically, from netCDF files, database, etc.
  • CSML records will need to use standard names. What if there aren't any available? Big problem for BADC back catalogue.
  • Connect CSML to MOLES records via 'S' summary metadata which appears in both. Preferably do this in the DP's back-end metadata.
  • Store CSML records somewhere accessible to other Web Services. eXist db?
  • Test CSML accuracy using NDG portal data browser.

NDG Procedure

NDG Discovery

  • New DataProvider tells NDG about their OAI records in an NDG-compliant format (currently DIF, but should be ISO).
  • NDG sets them up as an automatic harvest.
  • Harvested records are automatically pre-processed to tidy them up, removing OAI-style filenames and any namespaces (which cause problems in eXist/XQuery).
  • Pre-processed records are ingested into 'dif' collection in NDG Discovery eXist db. (currently only go into dev/glue. Should we have completely separate production OAI & ingest on superglue? Is an editorial process required? Can we check links work? If not, Discovery portal must cope with any content.)
  • NDG Discovery Web Service and maybe NDG GUI are used to Discover datasets. (NDG Discovery Web Service is broken! Currently using non-WS service to an old db)
  • Indication of access constraints can be seen at this point. (If access constraints have been populated correctly. A common problem is that people set access_constraints = none, which is as uninformative as an empty tag.)
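
The pre-processing and access-constraint checks above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the actual NDG ingest code: the sample record, its namespace URI and the list of "uninformative" values are assumptions, and only element tags (not attributes) have their namespaces removed.

```python
import xml.etree.ElementTree as ET

# Values treated as carrying no real access information (assumed list).
EMPTY_VALUES = {"", "none", "n/a"}


def strip_namespaces(root):
    """Remove '{uri}' namespace prefixes from every element tag, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


def access_constraints_problem(root):
    """Return a short problem description, or None if the field looks populated."""
    node = root.find(".//Access_Constraints")
    if node is None:
        return "missing Access_Constraints"
    text = (node.text or "").strip()
    if text.lower() in EMPTY_VALUES:
        return "uninformative Access_Constraints: %r" % text
    return None


# Hypothetical harvested record; the namespace URI is a placeholder.
sample = """<DIF xmlns="http://example.org/hypothetical-dif-namespace">
  <Entry_ID>example.org__DATASET_1</Entry_ID>
  <Access_Constraints>none</Access_Constraints>
</DIF>"""

record = strip_namespaces(ET.fromstring(sample))
print(record.tag)
print(access_constraints_problem(record))
```

A check like this could run at ingest time, so that records with unusable access constraints are flagged before they reach the Discovery portal.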

NDG Browse

  • Users select discovered datasets for browsing
  • The Browse Web Service retrieves a MOLES stub-B record. Possibly displayed by NDG Browse GUI. (GUI needs extension.)
  • User can browse the links to other MOLES objects (abiding by security constraints), back to DIFs, other URLs.
  • User can view or download XML documents.
  • User can select data granules.
  • User history is collected.
  • Where does Browse software need to be installed, and which components?

Security and logging

  • Install NDG security software.
  • Generate role mappings with other NDG DPs.
  • Assign the DP's users to external NDG roles.
  • Assign datasets to an appropriate access role, e.g. any NDG user.
  • Interface NDG security with the DP's data browser and authentication system.
  • Ensure NDG access to the DP's data is logged: e.g. name, date, data granule id or filename.
  • Test access to the DP's data.
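
As a rough illustration of the role mapping and access logging steps above, the sketch below builds one log record per data access. It does not use the NDG security software: the role names, the `ROLE_MAP` table and both function names are hypothetical, chosen only to show the fields the list asks for (name, date, data granule id).

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from (remote DP, remote role) to a local role name.
ROLE_MAP = {
    ("BODC", "staff"): "ndg_user",
    ("BADC", "registered"): "ndg_user",
}


def map_role(provider, remote_role):
    """Translate a remote role to a local one; None means no mapping exists."""
    return ROLE_MAP.get((provider, remote_role))


def access_log_entry(user, granule_id, provider, remote_role, when=None):
    """Build one log record with name, date, granule id and access decision."""
    when = when or datetime.now(timezone.utc)
    local_role = map_role(provider, remote_role)
    return {
        "name": user,
        "date": when.isoformat(),
        "granule": granule_id,
        "role": local_role,
        "granted": local_role is not None,
    }


entry = access_log_entry("a.user", "granule-001", "BODC", "staff")
print(json.dumps(entry, sort_keys=True))
```

Logging refused requests as well as granted ones would also show which data people wanted but could not reach.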

Data Extractor (Data Browse)

  • Where do NDG DX services need to be installed?
  • DX gets called with ref to a CSML file. (Is this currently how it happens?)
  • DX accesses security information from CSML (not currently).
  • DX uses NDG Security service to check whether the user can have access to the data.
  • DX displays details of data allowing selections to be made.
  • DX creates a CSML file for this subset.
  • DX passes this to NDG Data delivery or GeoSPLAT.
  • Is any of this currently happening like this? Probably not.
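
The intended DX sequence above can be written down as a small sketch to make the order of steps explicit. Everything here is a placeholder: CSML documents are stood in for by plain dictionaries, and `extract`, `load_csml`, `deliver` and the `required_role` field are hypothetical names, not part of any NDG software.

```python
def extract(csml_ref, user_roles, selection, load_csml, deliver):
    """Run the DX steps in order; every callable passed in is a placeholder."""
    # DX is called with a reference to a CSML file.
    doc = load_csml(csml_ref)
    # Security information is read from the CSML and checked against the user.
    required = doc.get("required_role")
    if required is not None and required not in user_roles:
        raise PermissionError("access denied for %s" % csml_ref)
    # Details of the data are offered so that a selection can be made.
    available = doc["features"]
    # A new (stand-in) CSML document is created for the chosen subset.
    subset = {
        "source": csml_ref,
        "features": [f for f in available if f in selection],
    }
    # The subset is handed on to data delivery or visualisation.
    return deliver(subset)


# Hypothetical catalogue of CSML stand-ins, keyed by reference.
catalogue = {
    "csml/sst.xml": {"required_role": "ndg_user",
                     "features": ["sst", "chlorophyll"]},
}

result = extract("csml/sst.xml", {"ndg_user"}, {"sst"},
                 catalogue.__getitem__, lambda subset: subset)
print(result)
```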

Data Delivery

  • Where do NDG data delivery services need to be installed?
  • Ensure there is a system for real time access to data held in archives. (Not currently at NOCS or BODC)
  • How will bbftp link into existing Data delivery systems, or otherwise be used?

Visualisation

  • Where do NDG visualisation services need to be installed?
  • Test delivery of netCDF files from non-netCDF data.
  • Test visualisation of the DP's data in GeoSPLAT.
  • How to do visualisation of files other than CF-compliant netCDF?
  • Does CSML have enough metadata for visualisation? Is some more MOLES level needed?

Back to CompleteUseCases.