Version 8 (modified by mpritcha, 11 years ago)


NERC DataGrid Discovery Web Service

The NERC DataGrid (NDG) Discovery Web Service provides a search interface to metadata records harvested from collaborating data providers and is the backend server to which the NERC Data Discovery Service is a client.


The Discovery Web Service is a presentation-free web service which acts as a search engine on top of the NDG Discovery metadata catalogue. This catalogue is dynamically populated by the harvesting of metadata (by the CEDA group at RAL) from a number of collaborating data providers, who make their metadata available in one of a number of supported formats.

The search capability provided by the service enables full-text and spatio-temporal searches of catalogued metadata records and returns search results in a defined XML structure, so that interested parties can construct search clients for their own purposes. The NERC Data Discovery Service is one such client, as is the Environmental Data Portal, among other examples.



Consumers may access the discovery service via SOAP. Client implementations should be generated from the WSDL at the following URIs:

  • ndg3beta deployment: latest stable development version
  • ndg "live" deployment: production version

XML Data Types

The XML documents used as request and response documents for each of the service operations (methods) are defined in the <xsd:schema> section of the WSDL document. The structure of each of these documents is discussed as part of the operation/method descriptions below.

Discovery Service Operations

The discovery service implements 4 operations, namely:

  • getListNames
  • getList
  • doSearch
  • doPresent

getListNames operation

The discovery web service relies on several lists of valid terms which are specific to the functionality of this service. These lists are exposed through two "helper" operations, getListNames and getList, rather than being encoded as <xs:enumeration> values in the schema part of the WSDL, so that future modifications to the service need not require modification of the WSDL (which can be inconvenient for clients already developed around a particular release of the WSDL). The getListNames operation simply returns the names of these lists, which can then be used in a subsequent call to the getList operation.

The WSDL document defines the getListNamesRequest message as an empty <getListNames> element, so the request message should look like this (omitting the SOAP Envelope & Body parent elements):

<m:getListNames xmlns:m="urn:DiscoveryServiceAPI"/>
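For readers not using a WSDL-generated client, the following Python sketch (standard library only) shows how the request above might be wrapped in a SOAP envelope. It assumes SOAP 1.1 with the request element placed directly inside the Body; the helper names `soap_request` and `get_list_names_request` are hypothetical, and generating the client from the WSDL remains the recommended route.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
API_NS = "urn:DiscoveryServiceAPI"  # namespace used in the examples on this page


def soap_request(body_element: ET.Element) -> bytes:
    """Wrap a request element in a SOAP 1.1 Envelope/Body and serialise it."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(body_element)
    return ET.tostring(envelope, encoding="utf-8")


def get_list_names_request() -> bytes:
    # getListNames takes no arguments: the Body holds a single empty element.
    return soap_request(ET.Element(f"{{{API_NS}}}getListNames"))


if __name__ == "__main__":
    print(get_list_names_request().decode("utf-8"))
```

ElementTree chooses its own namespace prefixes (e.g. ns0, ns1) rather than m:, which is equivalent XML.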

The getListNamesResponse message comprises a <getListNamesReturn> element, with child elements containing the names of the lists available for inspection:

<getListNamesReturn xmlns="urn:DiscoveryServiceAPI">
    (...)
</getListNamesReturn>

getList operation

The contents of each list named by the getListNames operation can be retrieved by calling the getList operation with the name of the list as its single argument, encoded as a getListRequest message as defined in the WSDL:


<getList xmlns="urn:DiscoveryServiceAPI">
    (...)
</getList>


<getListReturn xmlns="urn:DiscoveryServiceAPI">
    <list name="presentFormatList">
        (...)
    </list>
</getListReturn>

An explanation of the presentFormatList list is given later, in the context of the doPresent operation.
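Because the valid terms are fetched at run time rather than fixed in the WSDL, a client will typically parse the getListReturn message once and check request values against it. A minimal sketch, assuming the members of each <list> are simple child elements whose text is the term; their tag name is not shown on this page, so the code reads every child regardless of tag. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

API_NS = "urn:DiscoveryServiceAPI"


def parse_get_list_return(xml_text: str) -> dict[str, list[str]]:
    """Map each <list name="..."> in a getListReturn to the text of its children."""
    root = ET.fromstring(xml_text)
    lists: dict[str, list[str]] = {}
    for lst in root.iter(f"{{{API_NS}}}list"):
        # The member tag name is unknown here, so every child element is accepted.
        lists[lst.get("name", "")] = [(child.text or "").strip() for child in lst]
    return lists


def is_valid_term(lists: dict[str, list[str]], list_name: str, value: str) -> bool:
    """True if value appears in the named list (e.g. termTypeList, scopeList)."""
    return value in lists.get(list_name, [])
```

Keeping the parsed lists in memory lets a client reject an invalid <termType> or <scope> locally instead of waiting for an error from the service.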

doSearch operation

The doSearch operation performs a search against the NDG discovery database. Queries to this database are formulated from the doSearchRequest message and forwarded to the database via private methods (i.e. the consumer of the web service cannot interact with the database directly).

Although it lies outside the scope of the Discovery web service itself, the structure of the NDG Discovery database searched by the service is worth explaining. The database is populated from records harvested via OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) from collaborating data providers. Records are currently harvested in GCMD DIF format, and are tagged at ingest time with one or more "scope" keywords (listed in the scopeList list available from the getList operation). These enable the search to be restricted to particular communities, namely NERC, NERC_DDC (Designated Data Centres) and MDIP (Marine Data Information Partnership). Limited quality control is also applied to records at ingest time, and it is the responsibility of the data provider to ensure that metadata records are of sufficient quality to be visible in the system.

The doSearchRequest message is shown in schema form in fig X.

Choice of search type: <term> and <termType>

The only mandatory elements are <term> and <termType>, as used in the "Quick Start" example above. <termType> selects which of 3 variants of full-text search is invoked, and should be populated with a valid value from the termTypeList list accessible via the getList operation. At present, these are:

  • A full-text search applied to the whole discovery metadata record
  • A full-text search applied only to those sections of the discovery metadata record relating to authorship of the dataset
  • A full-text search applied only to the parameter listing section of the discovery metadata record

<term> should be populated with the search term, which can be a string of one or more words and wildcard characters. The service is currently configured to match XML documents (in the discovery database) only where ALL of the components of the search term are matched (as opposed to ANY), so adding words to a search term narrows the results and can be used to refine a search. Searches are case-insensitive. Examples of fullText search terms are:

  • Matches records with the word "temperature" in any node of a document
  • sea surface temperature: matches documents having the words "sea", "surface" AND "temperature" (in any order)
  • Matches documents containing the string "neodc", even if embedded within a larger string
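The difference between ALL and ANY matching can be illustrated locally. This Python sketch only models the semantics described above (case-insensitive, every word required); it is not how the service evaluates queries, which is done inside the database, and it ignores wildcards and embedded-substring matching. The function names are hypothetical.

```python
import re


def _words(text: str) -> set[str]:
    # Lower-case word tokens, so that matching is case-insensitive.
    return set(re.findall(r"\w+", text.lower()))


def matches_all(term: str, text: str) -> bool:
    """ALL semantics: every word of the search term must occur in the text."""
    return _words(term) <= _words(text)


def matches_any(term: str, text: str) -> bool:
    """ANY semantics, shown for contrast: at least one word must occur."""
    return bool(_words(term) & _words(text))
```

Under ALL semantics, adding a word such as "salinity" to "sea surface" can only remove matches, never add them, which is why longer terms give more specific results.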

Paging : <start> and <howMany>

The optional elements <start> and <howMany> control which records from the result set should be returned (although the total number of hits is always returned as a number to aid with paging in clients). If <start> is omitted, the default value used is 1 (i.e. the first record). If <howMany> is omitted, the default number of records returned is 30.
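Since every response reports the total number of hits, a client can work out the <start> values for successive pages. A minimal sketch; `page_starts` is a hypothetical helper, not part of the service, and its defaults mirror the service defaults described above.

```python
def page_starts(total_hits: int, how_many: int = 30) -> list[int]:
    """1-based <start> values needed to retrieve every hit, <howMany> at a time.

    Records are numbered from 1, and 30 records are returned per call
    when <howMany> is omitted.
    """
    if how_many < 1:
        raise ValueError("howMany must be at least 1")
    return list(range(1, total_hits + 1, how_many))
```

For example, 65 hits at the default page size need three calls, with <start> set to 1, 31 and 61.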

Ordering: <orderBy> and <orderByDirection>

Ordering of the result set can be requested by setting <orderBy> to one of the valid values listed in the orderByFieldList list accessible via the getList operation. Currently these are:

  • A ranking metric based on the relevance of the match to the search term (derived by the PostgreSQL text ranking function).
  • The start date of the date range given for the temporal coverage of the metadata record. Records with no start date defined are treated as if their start date were later than that of the last record with a start date defined. [Note: this is a known bug. With orderByDirection set to "ascending" it is less of a problem, but at present, with orderByDirection set to "descending", these "incomplete" records are the first in the results list. See ticket ???]
  • The name of the data centre supplying the metadata record. For records supplied in DIF format, this is the Data_Centre/Short_Name field; for other metadata formats, the most appropriate equivalent field is used as this index (e.g. "" for MDIP format).

In addition, the direction of ordering (ascending or descending) can be specified. If omitted, the default direction is ascending.
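The start-date behaviour noted above (undated records sort as if later than every dated record) can be reproduced with a two-part sort key, which also shows why such records come first in descending order. The sketch below uses hypothetical record IDs and helper names, and adds a client-side re-sort that keeps undated records last when descending; it is a workaround sketch, not a fix to the service.

```python
from datetime import date
from typing import Optional


def start_date_key(start: Optional[date]) -> tuple[bool, date]:
    # (True, ...) sorts after (False, ...), so undated records go last when ascending.
    return (start is None, start or date.min)


def sort_descending_undated_last(starts: dict[str, Optional[date]]) -> list[str]:
    """Client-side re-sort: newest start date first, undated records at the end."""
    dated = [r for r, d in starts.items() if d is not None]
    dated.sort(key=lambda r: starts[r], reverse=True)
    return dated + [r for r, d in starts.items() if d is None]


starts = {"a": date(2001, 1, 1), "b": None, "c": date(1990, 6, 30)}
ascending = sorted(starts, key=lambda r: start_date_key(starts[r]))
descending = sorted(starts, key=lambda r: start_date_key(starts[r]), reverse=True)
```

Reversing the same key is what moves the undated record "b" from last to first, matching the descending-order problem described in the note.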

Scope of search: <scope>

The optional <scope> element can be used to restrict the search to one or more of the supported NDG Data Provider Groups, defined in the NDG controlled vocabulary. The currently supported values from this vocabulary are given in the scopeList list accessible via the getList operation. Currently these are:

  • Marine Data Information Partnership (organisation now renamed MEDIN)
  • NERC Designated Data Centres
  • NERC (General)
  • Data Portals Project Provider

If <scope> is omitted, the search is not restricted in this way.
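Pulling together the elements described so far, the following sketch builds a doSearch request element with Python's ElementTree, emitting optional elements only when a value is supplied. Several details are assumptions rather than facts from this page: the element name doSearch, child elements qualified in the urn:DiscoveryServiceAPI namespace, the child order, and one <scope> element per requested scope. Check these against the <xsd:schema> section of the WSDL; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

API_NS = "urn:DiscoveryServiceAPI"


def _add(parent: ET.Element, tag: str, text: str) -> None:
    # Assumed: children are qualified in the service namespace.
    ET.SubElement(parent, f"{{{API_NS}}}{tag}").text = text


def do_search_request(term: str, term_type: str,
                      start: Optional[int] = None,
                      how_many: Optional[int] = None,
                      order_by: Optional[str] = None,
                      order_by_direction: Optional[str] = None,
                      scopes: Sequence[str] = ()) -> ET.Element:
    """Build a doSearch request; optional elements appear only when given."""
    req = ET.Element(f"{{{API_NS}}}doSearch")
    _add(req, "term", term)            # mandatory
    _add(req, "termType", term_type)   # mandatory, a value from termTypeList
    if start is not None:
        _add(req, "start", str(start))
    if how_many is not None:
        _add(req, "howMany", str(how_many))
    if order_by is not None:
        _add(req, "orderBy", order_by)
    if order_by_direction is not None:
        _add(req, "orderByDirection", order_by_direction)
    for scope in scopes:               # assumed: one <scope> per requested scope
        _add(req, "scope", scope)
    return req
```

Omitting an optional element, rather than sending it empty, lets the service apply the defaults described above.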

Spatial searching : <spatialOperator> and <boundingBox>

Full-text, author or parameter searches, as described above, may optionally be combined with a further restriction that the spatial coverage described in the metadata record matches the specified spatial <boundingBox> according to the specified <spatialOperator>. <spatialOperator> may be populated with any of the values from the spatialOperatorList list accessible via the getList operation. Currently supported values are:

  • overlaps (default)

If <spatialOperator> is omitted, but a valid <boundingBox> is supplied, the default operator applied is overlaps. Values for <limitNorth>, <limitSouth>, <limitEast> and <limitWest> should be given in decimal degrees latitude and longitude. <limitNorth> and <limitSouth> must be in the range -90.0 to +90.0, with <limitNorth> greater than <limitSouth>. <limitWest> and <limitEast> must be in the range -180.0 to 180.0 and <limitEast> should be greater than <limitWest>. Bounding boxes that span the -180 degree meridian, or the poles, are not currently supported.
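The coordinate rules above translate directly into a client-side check, worth applying before a request is sent. A minimal sketch; the class and field names are hypothetical Python counterparts of <limitNorth>, <limitSouth>, <limitEast> and <limitWest>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    limit_north: float  # decimal degrees latitude
    limit_south: float
    limit_east: float   # decimal degrees longitude
    limit_west: float

    def is_valid(self) -> bool:
        """Apply the range and ordering rules for <boundingBox> given above."""
        lat_ok = -90.0 <= self.limit_south < self.limit_north <= 90.0
        # West must be less than east, so a box spanning the 180 degree
        # meridian (which the service does not support) is rejected.
        lon_ok = -180.0 <= self.limit_west < self.limit_east <= 180.0
        return lat_ok and lon_ok
```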

Spatial searches (as a further restriction of term searches) are currently implemented by obtaining one result set from the term search and another from the spatial search, then returning the intersection of the two.
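The combination step can be sketched as a set intersection. Whether the service preserves the ordering of the term search is not stated on this page; the sketch assumes it does, keeping term-search order and dropping IDs absent from the spatial result set. The function name and document IDs are hypothetical.

```python
def intersect_results(term_hits: list[str], spatial_hits: list[str]) -> list[str]:
    """Term-search hits that also appear in the spatial result set.

    A set gives constant-time membership tests; iterating over term_hits
    keeps the term-search order (an assumption, see above).
    """
    spatial = set(spatial_hits)
    return [doc_id for doc_id in term_hits if doc_id in spatial]
```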

Temporal searching : <DateRange>

Full-text, author or parameter searches may optionally be combined with a further restriction that the temporal coverage of the metadata record overlaps the specified <DateRange>. Both <DateRangeStart> and <DateRangeEnd> must be specified and must be valid dates of the form YYYY-MM-DD. TODO: it is planned to implement a choice of <temporalOperator> in a similar manner to <spatialOperator>.
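A client can validate the date pair before sending it, and the overlap condition itself is a simple interval test. A sketch under assumptions: the page does not say whether a range that only touches a record's coverage at an endpoint counts as overlapping, so `overlaps` treats both intervals as closed, and the start-before-end check is a client-side sanity check rather than a documented service rule. Function names are hypothetical.

```python
import re
from datetime import date

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date_range(start: str, end: str) -> tuple[date, date]:
    """Check a <DateRangeStart>/<DateRangeEnd> pair: both given, both YYYY-MM-DD."""
    for value in (start, end):
        if not _ISO_DATE.fullmatch(value):
            raise ValueError(f"not a YYYY-MM-DD date: {value!r}")
    # fromisoformat also rejects impossible dates such as 2001-02-30.
    d0, d1 = date.fromisoformat(start), date.fromisoformat(end)
    if d0 > d1:  # client-side sanity check, not a documented service rule
        raise ValueError("DateRangeStart is after DateRangeEnd")
    return d0, d1


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end
```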

Search results

The doSearchResponse message is defined in the WSDL as shown below:

The <doSearchReturn> element contains the following top-level elements:

  • true if successful AND number of hits > 0, false otherwise (designed so that a client need only parse the rest of the message if results were successfully returned)
  • Textual information regarding success / failure / errors
  • Reserved for future use
  • TOTAL number of hits
  • Parent element for an array of <document> elements containing the returned document IDs
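The status flag is designed so that a client can stop early. A minimal parsing sketch follows; <statusMessage> appears in the no-hits example on this page, but the names <status>, <documents> and <document> used for doSearchReturn are assumed by analogy with doPresentReturn and should be checked against the WSDL. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"d": "urn:DiscoveryServiceAPI"}


def parse_search_return(xml_text: str) -> tuple[bool, str, list[str]]:
    """Return (status, statusMessage, document IDs) from a doSearchReturn."""
    root = ET.fromstring(xml_text)
    status = root.findtext("d:status", default="false", namespaces=NS).strip().lower() == "true"
    message = root.findtext("d:statusMessage", default="", namespaces=NS).strip()
    if not status:
        # Error or zero hits: nothing else in the message is worth parsing.
        return False, message, []
    ids = [(d.text or "").strip() for d in root.findall("d:documents/d:document", NS)]
    return True, message, ids
```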

A typical search result was shown in the "Quick Start" section. A result where no hits were returned is shown below:

<doSearchReturn xmlns="urn:DiscoveryServiceAPI">
	(...)
	<statusMessage>Search was successful but generated no results.</statusMessage>
</doSearchReturn>

doPresent operation

The doPresent operation provides a means of retrieving (presenting) one or more XML documents from the database. The doPresentRequest message is defined as follows:

One or more <document> elements should each contain the name of a document (in the form returned in the doSearchReturn message) to be retrieved. The optional <format> element should be populated with one of the supported format names listed in the presentFormatList list accessible via the getList operation. All documents returned by a single invocation of the doPresent operation are returned in the same format, i.e. the choice of format applies to the whole doPresent request, not to individual documents. Currently supported formats are:

  • original: documents are returned unaltered, in the format in which they were harvested (via OAI-PMH) from the data provider
  • Dublin Core format
  • GCMD DIF format (version ??)
  • Metadata format used by the Marine Data and Information Partnership
  • ISO19115 (Geographic Information: Metadata) encoded as ISO19139 XML
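Because the format is chosen once per request, a doPresent request carries a list of document names and at most one <format>. A minimal builder sketch; the root element name doPresent comes from the example on this page, but the absence of a wrapper around the <document> elements, the element order and namespace qualification of the children are assumptions, as is the format value "DIF" used in the usage check. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

API_NS = "urn:DiscoveryServiceAPI"


def do_present_request(doc_ids: Iterable[str], fmt: Optional[str] = None) -> ET.Element:
    """One <document> per ID, plus at most one <format> applying to all of them."""
    req = ET.Element(f"{{{API_NS}}}doPresent")
    for doc_id in doc_ids:
        ET.SubElement(req, f"{{{API_NS}}}document").text = doc_id
    if fmt is not None:
        ET.SubElement(req, f"{{{API_NS}}}format").text = fmt
    return req
```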

For all formats except original, the following steps are taken before returning each document:

  • If the document exists in the discovery database in the requested format, it is returned unaltered
  • Otherwise, a conversion XQuery is applied to create a new document in that format on-the-fly
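The two-step rule can be modelled in a few lines. In this sketch, `stored` stands in for the discovery database (keyed by document ID and format) and `converters` for the conversion XQueries; both are hypothetical stand-ins, and feeding the originally harvested document to the converter is an assumption about what the XQuery takes as input.

```python
from typing import Callable, Mapping

Converter = Callable[[str], str]


def present(doc_id: str, fmt: str,
            stored: Mapping[tuple[str, str], str],
            originals: Mapping[str, str],
            converters: Mapping[str, Converter]) -> str:
    """Return doc_id in format fmt following the lookup-then-convert rule."""
    if fmt == "original":
        return originals[doc_id]           # harvested document, unaltered
    if (doc_id, fmt) in stored:
        return stored[(doc_id, fmt)]       # step 1: stored copy, unaltered
    return converters[fmt](originals[doc_id])  # step 2: convert on the fly
```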

doPresent response

The doPresentResponse message is defined in the WSDL as follows:

The <doPresentReturn> element contains the following top-level elements:

  • true if there are any documents returned in the payload, false otherwise.
  • Textual information regarding success / failure / errors.
  • If some documents have been successfully returned, a <documents> element is present and will contain a child <document> element for each document retrieved. In the case where some but not all documents are successfully returned, the <documents> element will contain populated <document> elements for the successfully-retrieved documents, but an empty <document> element for those where retrieval failed. If NO documents are successfully returned, however, then the <status> is set to false and no <documents> element is included in the doPresentResponse message.

The <document> element, if present and populated, contains the retrieved document as an encapsulated string representation of the XML. Depending on the client used to display the payload document, it either appears contained within a <![CDATA[ ... ]]> construct, or as XML with the opening angle brackets "<" escaped as "&lt;". Most XML parsers should successfully parse the string to reconstruct the XML document, but it is returned in this form to avoid namespace issues.
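Because the outer XML parser has already turned "&lt;" (or a CDATA section) back into "<", the text of each <document> is the embedded XML as a plain string and can simply be parsed a second time. A minimal sketch using the <documents>/<document> structure described above; empty <document> elements (failed retrievals) map to None. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"d": "urn:DiscoveryServiceAPI"}


def extract_documents(xml_text: str) -> list[Optional[ET.Element]]:
    """Parse each <document> payload of a doPresentReturn into its own tree."""
    root = ET.fromstring(xml_text)
    docs: list[Optional[ET.Element]] = []
    for d in root.findall("d:documents/d:document", NS):
        text = (d.text or "").strip()
        # Second parse: the payload keeps its own namespaces, independent of
        # the urn:DiscoveryServiceAPI envelope around it.
        docs.append(ET.fromstring(text) if text else None)
    return docs
```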

The following request / response sequence shows a successful doPresent operation:


<m:doPresent xmlns:m="urn:DiscoveryServiceAPI">
    (...)
</m:doPresent>


<doPresentReturn xmlns="urn:DiscoveryServiceAPI">
    (...)
    <documents>
        <document>&lt;DIF xmlns="" xmlns:xsi="">&lt;Entry_ID>(...)&lt;/Entry_ID> (...) &lt;/DIF></document>
        <document>&lt;DIF xmlns="" xmlns:xsi="">&lt;Entry_ID>(...)&lt;/Entry_ID> (...) &lt;/DIF></document>
        <document>&lt;DIF xmlns="" xmlns:xsi="">&lt;Entry_ID>(...)&lt;/Entry_ID> (...) &lt;/DIF></document>
        <document>&lt;DIF xmlns="" xmlns:xsi="">&lt;Entry_ID>(...)&lt;/Entry_ID> (...) &lt;/DIF></document>
    </documents>
</doPresentReturn>

Term Lists

Supported Metadata Formats