T03 Data Extractor

Given a dataset held at a data provider, the data extractor allows the user to extract a subset and instantiate it as a file in a scratch cache at that provider. This is essentially data-browse functionality. Given a specific data model (in this case CSML) that supports a number of feature types, a dataset consists of one or more feature instances defined by that model. A subset can then be defined by constraints against the parent dataset (exploiting the properties of that dataset, the feature types and the feature-type relationships) so that the resulting subset consists of a single feature instance. The main purpose of the data extractor is to provide a file of data that can be delivered to a user or manipulated by the data manipulation package.
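As an illustration of constraint-based subsetting, here is a minimal Python sketch. It is not the DX implementation: the Feature and Dataset classes, the extract function, the axis names and the sample values are all hypothetical, and a real CSML feature is considerably richer than the flat list of samples assumed here. Constraints are expressed per named axis, so no temporal or spatial axes are assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature instance: a name plus a flat list of samples.

    Each sample is a dict mapping axis names to coordinates, with the
    measured quantity stored under the key "value".
    """
    name: str
    samples: list


@dataclass
class Dataset:
    """Hypothetical dataset: one or more feature instances keyed by name."""
    features: dict = field(default_factory=dict)


def extract(dataset, feature_name, constraints):
    """Return a new, single Feature containing only the matching samples.

    `constraints` maps an axis name to an inclusive (low, high) range.
    A constraint on an axis that a sample lacks raises KeyError.
    """
    parent = dataset.features[feature_name]
    kept = [
        sample for sample in parent.samples
        if all(low <= sample[axis] <= high
               for axis, (low, high) in constraints.items())
    ]
    return Feature(name=f"{parent.name}_subset", samples=kept)


# Made-up example data, for illustration only.
ds = Dataset(features={
    "air_temperature": Feature("air_temperature", [
        {"time": 0, "latitude": 50.0, "value": 280.1},
        {"time": 1, "latitude": 50.0, "value": 281.4},
        {"time": 1, "latitude": 60.0, "value": 275.0},
    ]),
})

subset = extract(ds, "air_temperature",
                 {"time": (1, 1), "latitude": (45.0, 55.0)})
print(subset.name, subset.samples)
```

The parent dataset is left untouched; the subset is a new feature instance that could then be written to a file in the scratch cache.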

The key components are:

  • Web Service backend (DX-WS)
  • GUI front end.

The GUI front end design needs significant thought, but it should build on the existing data extractor along the following lines: for the dataset, display a list with headings and dataset concepts. For example:

[Image: data extractor table]

All cells in the table would be linked to things that provide either information (e.g. from a taxonomy service about the phenomenon) or subsetting options, and the subsetting would be built up by adding constraints, as is done in the LAS and the existing Data Extractor.

Key Integration Milestones

  • 1.UML Use Cases (5 in all) for scope definition (February 8, 2006)
  • 2.UML of DX Architecture (February 14, 2006) ticket:72
  • 3.DX-WS WSDL description (March 1, 2006) ticket:73
  • 4.DX-WS Observations support (April 1, 2006) ticket:74
  • 5.DX-WS integration with NDG2 Security (May 1, 2006) ticket:75
  • 6.DX-WS 1.0 (Web Service back end) Release (June 1, 2006) ticket:76
  • 7.DX-Web Interface 1.0 (GUI) Release (July 1, 2006) ticket:77
  • 8.NDG Alpha (July 2006): Functionality expected: DX-WS 1.0 and DX-Web Interface 1.0
  • 9.DX-WS Generation of CSML output (November 1, 2006) ticket:78
  • 10.DX-WS Merge install facility (November 20, 2006) ticket:79
  • 11.DX-WFS Re-casting of DX as a Web Feature Service (December 1, 2006) ticket:80
  • 12.DX non-python command line client (December 20, 2006) ticket:81
  • 13.DX-WS 2.0 and DX-Web Interface 2.0 Release (January 1, 2007) ticket:82
  • 14.NDG Beta (January 2007): Functionality expected: DX-WS 2.0 and DX-Web Interface 2.0
  • 15.NDG Final: Functionality expected: DX-WS 2.0 and DX-Web Interface 2.0

Integration Dependencies

  • 1.CSML 2: CSML schema (an application schema of GML).
  • 2.CSML 2: CSML Tools (scanner, parser)
  • 3.Security 12: Software package which deploys an Attribute Authority as a web service.
  • 4.Security 12: Software package that provides a simple certificate authority and myproxy server which can be used to produce lightweight certificates.
  • 5.Security 12: Web service package to allow controlled access to a resource given the role protecting the resource and user credentials.
  • 6.Security 12: NDG session manager functionality.
  • 7.Security 12: NDG wallet functionality.
  • 8.Security 12: Logging Web Service (database or file based, with web service interface).

Internal Development Stages

  • 1.DX-WS UML Use Cases (5 in all) for scope definition [V0.5, 3 days, AS/DL/AW, 8 February 2006]
    • a.[Definite] Compile use cases based on known user interactions/specifications that will stretch the capability and functionality of the DX. Note: one use case must be “get 2 files from remote locations and difference them.” This needs the underlying Data Services team use cases to build on.
  • 2.UML of DX Architecture [V0.5, 1 day, AS, 14 February 2006]
    • a.[Definite] Produce a UML class diagram of the DX architecture. Based on static version at present (can be updated as project progresses).
  • 3.DX-WS WSDL description [V0.6, 1 week, AS, 1 March 2006]
    • a.[Definite] Supporting CDML variables. E.g. selectDatasets(d1, d2), getVariableOptions().
    • b.[Definite] Supporting CSML features. [Note: users may be happier seeing the term ‘variable’ even when they are interacting with a feature; the WSDL interface should be clear and provide both possibilities.]
  • 4.DX-WS Observations support [V0.7, 6 weeks, AS/MJ, 1 April 2006]
    • a.[Definite] Re-engineering of internal structures to support arbitrary axes rather than assumed temporal and spatial axes.
    • b.[Definite] Use of CSML PointSeriesFeature features to design and test selection and sub-setting aimed at non-gridded datasets. This may provide feedback to CSML WP02.
    • c.[Definite] Design of new web interface tools for providing selection of observational data.
    • d.[Definite] Background tooling for comparing or converting between gridded and observational datasets.
  • 5.DX-WS integration with NDG2 Security [V0.8, 1 week, AS/PK, 1 May 2006]
    • a.[Definite] Checking of consistency between security implementations.
    • b.[Definite] Modifications to DX-security to use NDG2 security.
  • 6.DX-WS 1.0 (Web Service back end) Release [V1.0, 2 weeks, AS/DL/SP, 1 June 2006]
    • a.[Definite] Python command line client completed and tested on remote system.
    • b.[Definite] Tidying up of code and testing.
    • c.[Definite] Portability/installation testing.
    • d.[Definite] Installation documentation
    • e.[Definite] Administrator documentation (including adding datasets).
    • f.[Definite] User documentation (command-line/scripting client).
  • 7.DX-Web Interface 1.0 (GUI) Release [V1.0, 2 weeks, AS, 1 July 2006]
    • a.[Definite] Testing and polishing of logging, exception handling and reporting (to user and administrator) so that user is never left wondering what is happening to the query.
    • b.[Definite] Completion and testing of web functionality.
  • 8.NDG Alpha (July 2006): Functionality expected: DX-WS 1.0 and DX-Web Interface 1.0
  • 9.DX-WS Generation of CSML output [V1.3, 1 week, AS/AW/SP/DL, 1 November 2006]
    • a.[Definite] Output a CSML file (including data or referencing external files) from DX calls.
    • b.[Maybe] Tidying of interface with GeoSPlAT if necessary when CSML used.
  • 10.DX-WS Merge install facility [V1.4, 3 days, AS/PK, 20 November 2006]
    • a.[Maybe] Following the solution from WP13, implement a method of merging the local implementation and configuration of the DX with a new install. This is needed because the local installation is likely to have been modified by local administrators.
  • 11.DX-WFS Re-casting of DX as a Web Feature Service [V1.5, 8 weeks, ??, 1 December 2006]
    • a.[Maybe] Investigate OGC web services interface, OWS (SOAP vs. REST)
    • b.[Maybe] DX can be officially re-branded as a WFS, with the correct WFS interface.
  • 12.DX non-python command line client [V1.6, 4 weeks, HS/PM/AS, 20 December 2006]
    • a.[Maybe] Based on a known user requirement, we could build a Java, IDL, MATLAB or Perl client. PML (IDL) and NOCS (MATLAB) are keen to try.
    • b.[Maybe] Consider an FTP-like session with simple non-pythonic commands such as “ls ds”, “ls var”, “get” etc.
  • 13.DX-WS 2.0 and DX-Web Interface 2.0 Release [V2.0, 4 weeks, AS, 1 January 2007]
    • a.[Definite] Incorporation of (in-scope) user feedback
    • b.[Definite] Bug-fixes
    • c.[Definite] Documentation update (client(s) and server, admin and user)
    • d.[Definite] Advertising to wider world
  • 14.NDG Beta (January 2007): Functionality expected: DX-WS 2.0 and DX-Web Interface 2.0
  • 15.NDG Final: Functionality expected: DX-WS 2.0 and DX-Web Interface 2.0
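The FTP-like session suggested in stage 12b could be prototyped along the following lines. This is a sketch under stated assumptions rather than a design: the Session class, the in-memory CATALOGUE standing in for the DX-WS back end, and the "use <dataset>" command are all hypothetical. Only "ls ds", "ls var" and "get" come from the plan above, and their exact semantics are guessed here.

```python
import shlex

# Hypothetical in-memory catalogue standing in for the DX-WS back end.
# Dataset names, variable names and values are made up for illustration.
CATALOGUE = {
    "d1": {"temperature": [280.1, 281.4], "pressure": [1012.0, 1009.5]},
    "d2": {"salinity": [35.1, 34.9]},
}


class Session:
    """Toy command interpreter for an FTP-like DX session."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.dataset = None  # name of the currently selected dataset

    def _current(self):
        """Return the variables of the selected dataset, or fail clearly."""
        if self.dataset is None:
            raise ValueError("no dataset selected; run 'use <dataset>' first")
        return self.catalogue[self.dataset]

    def run(self, line):
        """Parse one command line and return its result as a list."""
        parts = shlex.split(line)
        if parts == ["ls", "ds"]:
            return sorted(self.catalogue)
        if len(parts) == 2 and parts[0] == "use":  # assumed command
            if parts[1] not in self.catalogue:
                raise ValueError(f"unknown dataset: {parts[1]}")
            self.dataset = parts[1]
            return [self.dataset]
        if parts == ["ls", "var"]:
            return sorted(self._current())
        if len(parts) == 2 and parts[0] == "get":
            variables = self._current()
            if parts[1] not in variables:
                raise ValueError(f"unknown variable: {parts[1]}")
            return variables[parts[1]]
        raise ValueError(f"unrecognised command: {line!r}")


session = Session(CATALOGUE)
for command in ["ls ds", "use d1", "ls var", "get temperature"]:
    print(command, "->", session.run(command))
```

Keeping the parser separate from the catalogue means the same commands could later be routed to web service calls, and a client in another language would only need to reproduce the small command grammar.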

DX Wiki pages

Visit the top-level living Data Extractor page to find out what the big issues are.


At present, you can start to get a feel for the DX via the manuals below (or attached). These are still being written, so they are not final versions.