Version 4 (modified by sdonegan, 11 years ago)


NERC Discovery Providers Web Service : NERC Revitalisation Improvements August 2010


As part of the SIB Discovery Service revitalisation project it was decided that a more effective data providers portal (DPP) was required. The new DPP area should inherit the 'look and feel' of the new main discovery service (updated based on MEDIN portal code and run at BODC). As part of this project, the providers area was to be split into two sections:

  • Data Providers Portal (DPP): part of the Discovery Service code stack and using the same templates.
  • Data Providers Web Service (DPWS): a SOAP-based web service that offers a number of operations to edit user details and to initiate metadata harvesting and metadata ingestion into the main discovery database.

The DPP would be run at BODC and would act as a client to the DPWS hosted at CEDA. This wiki page describes the operations and usage of the DPWS.

DPWS location

The current development version of the DPWS is running at (as of 23/08/2010)

DPWS operations

The DPWS implements 9 operations:

  • GetListNames: Get names of controlled values used within the service
  • GetList: Get values within named controlled list
  • DoHarvest: Initiate a metadata harvesting operation on a named provider
  • DoIngest: Initiate a metadata ingestion operation into the discovery database for the named provider
  • GetHarvestHistory: Get information on completed metadata harvests for the named provider
  • GetProcessStatus: Get current process status (started/running/finished) for harvest or ingest processes by unique id.
  • GetIngestHistory: Get information on completed metadata ingests for the named provider
  • GetProviderDetails: Get information held on the configuration for an existing provider
  • DoNewUpdateProvider: Create a new provider or update information held on an existing provider

The request and response types for these operations are summarised below.

GetListNames Operation

The DPWS relies on several lists of valid terms which are specific to the functionality of this service. These terms are exposed through two "helper" operations (GetListNames and GetList) rather than encoded as <xs:enumeration> values in the schema part of the WSDL, so that future modifications to the service need not require modification of the WSDL (which can be inconvenient for clients already developed around a particular release of the WSDL). The GetListNames operation simply returns the names of these lists, which can then be used in a subsequent call to the GetList operation.

The WSDL document defines the GetListNamesRequest message as an empty <GetListNames> element, so the request message should look like this (omitting the SOAP Envelope & Body parent elements):

<getListNames xmlns=""/>
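The omitted SOAP Envelope and Body parents can be added by the client before sending. The following is a minimal sketch only: it assumes SOAP 1.1 (this page does not state which SOAP version the DPWS uses), the helper name wrap_in_soap_envelope is hypothetical, and the request element is left without a namespace, as in the fragment above.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace. Assumption: the DPWS accepts SOAP 1.1 messages.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def wrap_in_soap_envelope(body_element: ET.Element) -> bytes:
    """Return a serialised SOAP envelope whose Body contains body_element."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    body.append(body_element)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# The GetListNames request is an empty element; its namespace must be taken
# from the WSDL and is deliberately not filled in here.
request = ET.Element("getListNames")
message = wrap_in_soap_envelope(request)
```

The resulting bytes would then be sent to the service endpoint by whatever HTTP or SOAP client library the DPP uses.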

The getListNamesResponse message comprises a <GetListNamesReturn> element, with child elements containing the names of the lists available for inspection:

<ns1:getListNamesResponse xmlns:ns1="">

GetList operation

The contents of each of the lists named by the getListNames operation are accessible by calling the getList operation, with the name of the list as the single argument, encoded as a getListRequest message, as defined in the WSDL:


<m:GetList xmlns:m="">


<ns1:getListResponse xmlns:ns1="">

DoNewUpdateProvider Operation

This operation will update all information held for that particular provider ID with all information held in the request elements. The operation will return a confirmation status and message. Either the CSWProvider or OAIProvider elements must be provided depending on the type of metadata repository offered by the provider.

Note that in the request you should not enter values in ProviderID or in any of the email/ID elements. These are assigned by the DPWS and returned in other operations using the ProviderDetail type.
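A client such as the DPP could check these two rules before sending a request for a new provider. The sketch below is illustrative only: it represents the request as a plain dictionary keyed by element name (a hypothetical representation, not the WSDL types), it assumes that exactly one of CSWProvider or OAIProvider should be present, and the function name is made up for this example.

```python
def validate_new_provider_request(request: dict) -> list:
    """Return a list of problems found in a new-provider request dictionary."""
    problems = []
    has_csw = bool(request.get("CSWProvider"))
    has_oai = bool(request.get("OAIProvider"))
    # Assumption: a provider offers either a CSW or an OAI repository, not both.
    if has_csw == has_oai:
        problems.append("supply exactly one of CSWProvider or OAIProvider")
    # ProviderID is assigned by the DPWS, so a new-provider request leaves it empty.
    if request.get("ProviderID"):
        problems.append("leave ProviderID empty; the DPWS assigns it")
    return problems
```

An empty list means the request passed both checks; the email/ID elements are not checked here because their element names are not given on this page.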

Example request setting up a new provider:

GetProviderDetails Operation

DoHarvest Operation

The DoHarvest operation initiates a metadata harvest for a provider, using the unique provider ID assigned to that provider when its details were entered into the DPWS database. The DoHarvestRequest requires the ProviderID and, optionally, any number of EmailReportID elements taken from the provider email details. Note that each EmailReportID element must contain the unique ID associated with the recipient in the relevant provider details.

The DoHarvestResponse will return a status value and message as well as a unique process identifier in processID. This assigned identifier is the value that must be used in subsequent GetProcessStatus and DoIngest operations.


A metadata ingest into the Discovery database for a particular provider is achieved by supplying the ProcessID returned in the DoHarvestResponse from a previous DoHarvestRequest for the specified provider. As with the DoHarvestRequest, an EmailReportID can be supplied if the user wishes process completion reports to be sent to that recipient.


Metadata harvesting and ingestion can take a long time to complete, especially when large numbers of records are involved and there is heavy traffic on the hardware systems employed at CEDA. As the metadata harvest needs to complete fully before ingestion takes place, the GetProcessStatus operation allows the client to determine when it is possible to continue with further operations on a specific processID. This includes the GetHarvestHistory and GetIngestHistory operations.

The GetProcessStatusRequest requires simply the processID as input and the GetProcessStatusResponse will return

In order to monitor the current status of an operation, the DPP will need to perform a GetProcessStatus operation repeatedly (polling) until the DPWS returns a status “completed”. At this stage the DPP can issue a GetHarvestHistoryRequest or GetIngestHistoryRequest.
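The status check described above amounts to a polling loop with a delay between calls and an upper time limit. The following is a sketch under stated assumptions: get_status stands in for whatever function the DPP uses to call GetProcessStatus and return the status string, the function name and default intervals are hypothetical, and the terminal status value is a parameter because the exact string returned by the DPWS should be confirmed against the WSDL.

```python
import time
from typing import Callable


def wait_until_completed(
    get_status: Callable[[str], str],
    process_id: str,
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    done_status: str = "completed",
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status(process_id) until it returns done_status or timeout is reached."""
    waited = 0.0
    while True:
        if get_status(process_id) == done_status:
            return
        if waited >= timeout:
            raise TimeoutError(
                f"process {process_id} not {done_status} after {timeout} seconds"
            )
        # Pause between calls so the DPWS is not flooded with status requests.
        sleep(poll_interval)
        waited += poll_interval
```

The sleep function is injectable so that the loop can be exercised in tests without real delays.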


The DPP should perform the GetHarvestHistory operation once a GetProcessStatus call has confirmed that the processID has completed. The DPP should use this operation either to download and synchronise the local SQLite database contents or to render the information directly on the DPP front end.


The DPP should perform the GetIngestHistory operation once a GetProcessStatus call has confirmed that the processID has completed. The DPP should use this operation either to download and synchronise the local SQLite database contents or to render the information directly on the DPP front end.

Operation sequencing

These operations must be called in a certain sequence to allow full operation and rendering of returned information in the DPP. For example, to harvest metadata, ingest it and view the results, the following operations must be used in this order (assuming an existing data provider):
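One ordering consistent with the operation descriptions above is: DoHarvest, GetProcessStatus until complete, GetHarvestHistory, DoIngest, GetProcessStatus until complete, GetIngestHistory. The sketch below expresses that ordering in code. It assumes a hypothetical client object whose methods wrap the corresponding DPWS operations, a wait function that polls GetProcessStatus for a given process ID, and that do_ingest returns the process ID to poll for the ingest (which may be the same ID as the harvest); none of these names come from the WSDL.

```python
def harvest_then_ingest(client, provider_id, wait):
    """Run harvest, wait, fetch history, ingest, wait, fetch history for one provider."""
    # DoHarvest returns the process ID used by the later operations.
    process_id = client.do_harvest(provider_id)
    # The harvest must complete fully before ingestion starts.
    wait(process_id)
    harvest_history = client.get_harvest_history(provider_id)
    # DoIngest takes the process ID returned by the harvest.
    ingest_process_id = client.do_ingest(process_id)
    wait(ingest_process_id)
    ingest_history = client.get_ingest_history(provider_id)
    return harvest_history, ingest_history
```

Passing the waiting strategy in as a function keeps the ordering logic separate from the polling interval and timeout chosen by the DPP.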

List Values