NERC Discovery Providers Web Service: NERC Revitalisation Improvements, March 2011

Introduction

As part of the SIB Discovery Service revitalisation project, it was decided that a more effective data providers portal (DPP) was required. The new DPP area should inherit the same 'look and feel' as the main new discovery service (updated from the MEDIN portal code and run at BODC). As part of this project, the providers area was to be split into 2 sections:

  • Data Providers Portal (DPP): part of the Discovery Service code stack and using the same templates.
  • Data Providers Web Service (DPWS): a SOAP-based web service offering a number of operations to edit user details, initiate metadata harvesting and ingest metadata into the main discovery database.

The DPP would be run at BODC and would act as a client to the DPWS hosted at CEDA. This wiki page describes the operations and usage of the DPWS.

DPWS locations

As of 14 March 2011, the DPWS is running in both a development and a production environment. Please refer to the service WSDL for the latest operation formats.

DPWS operations

The DPWS implements the following operations:

  • getListNames: Get the names of the controlled lists used within the service
  • getList: Get the values within a named controlled list
  • doHarvest: Initiate a metadata harvest for a named provider
  • doIngest: Initiate ingestion of harvested metadata into the discovery database for a named provider
  • getHarvestHistory: Get information on completed metadata harvests for a named provider
  • getStatusProcess: Get the current status (started/running/finished) of a harvest or ingest process by its unique ID
  • getIngestHistory: Get information on completed metadata ingests for a named provider
  • getProviderDetails: Get the configuration information held for an existing provider
  • getProviderStatistic: Return the number of records ingested per provider
  • doNewUpdateProvider: Create a new provider or update the information held on an existing one
  • deleteProvider: Remove a provider from the list of available providers
  • addTimer: Create a periodic harvest/ingest process for a provider
  • deleteTimer: Delete a previously created timer

The request and response types for these operations are summarised below.

getListNames Operation

The DPWS relies on several lists of valid terms that are specific to the functionality of this service. Two "helper" operations, getListNames and getList, expose these lists. They are used instead of encoding the valid terms as <xs:enumeration> values in the schema part of the WSDL so that future changes to the lists need not require modifying the WSDL (which can be inconvenient for clients already developed against a particular release of the WSDL). The getListNames operation simply returns the names of these lists, which can then be used in a subsequent call to the getList operation.

The WSDL defines the getListNames request message as an empty <getListNames> element, so the request message looks like this (omitting the SOAP Envelope and Body parent elements):

<getListNames xmlns="http://ejb.revitalization.services.ndg/"/>

The getListNamesResponse message contains a <return> element holding a <Confirmation> status and a <ListNames> element whose <listItem> children are the names of the lists available for inspection:

<ns1:getListNamesResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
    <return>
        <ns1:Confirmation>
            <ns1:Status>OK</ns1:Status>
        </ns1:Confirmation>
        <ns1:ListNames>
            <ns1:listItem>harvester-provider-type</ns1:listItem>
            <ns1:listItem>metadata-format</ns1:listItem>
            <ns1:listItem>harvest-operation-type</ns1:listItem>
        </ns1:ListNames>
    </return>
</ns1:getListNamesResponse>
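As a minimal sketch of the exchange above, the following Python (standard library only) wraps an empty operation element in a SOAP 1.1 envelope, posts it, and reads the list names back. The endpoint URL and the empty SOAPAction header are assumptions to be checked against the WSDL; the helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Namespaces copied from the request/response examples on this page.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
DPWS_NS = "http://ejb.revitalization.services.ndg/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_envelope(operation: str) -> bytes:
    """Wrap an empty DPWS operation element (e.g. getListNames) in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    ET.SubElement(body, f"{{{DPWS_NS}}}{operation}")
    return ET.tostring(envelope, encoding="utf-8")


def call_dpws(endpoint: str, payload: bytes, timeout: float = 60.0) -> ET.Element:
    """POST a SOAP payload to a DPWS endpoint and return the parsed response root.

    The endpoint and the empty SOAPAction header are assumptions:
    take the real values from the service WSDL.
    """
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return ET.fromstring(resp.read())


def response_status(root: ET.Element) -> str:
    """Return the text of the Confirmation/Status element ("OK" on success)."""
    return root.findtext(f".//{{{DPWS_NS}}}Confirmation/{{{DPWS_NS}}}Status", default="")


def list_items(root: ET.Element) -> List[str]:
    """Return the text of every listItem element, in document order."""
    return [item.text or "" for item in root.iter(f"{{{DPWS_NS}}}listItem")]
```

For example, calling call_dpws(endpoint, build_envelope("getListNames")) and passing the result to list_items should yield list names such as those shown above, where endpoint is the DPWS address taken from the WSDL.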

getList Operation

The contents of each list named by the getListNames operation can be retrieved by calling the getList operation with the name of the list as its single argument, encoded as a getList request message as defined in the WSDL:

Request:

<m:GetList xmlns:m="http://medin.discovery.services.ndg/schema">
      <m:listName>PresentFormatList</m:listName>
</m:GetList>

Response:

<ns1:getListResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
    <return>
        <ns1:Confirmation>
            <ns1:Status>OK</ns1:Status>
        </ns1:Confirmation>
        <ns1:ListNames>
            <ns1:listItem>DIF_9.4</ns1:listItem>
            <ns1:listItem>ANY</ns1:listItem>
            <ns1:listItem>dif</ns1:listItem>
            <ns1:listItem>oai_dc</ns1:listItem>
            <ns1:listItem>MEDIN_2.3</ns1:listItem>
        </ns1:ListNames>
    </return>
</ns1:getListResponse>
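A client such as the DPP may want to validate a provider's chosen format against the getList response before calling doNewUpdateProvider. The following is a minimal sketch; get_list_values and check_format are hypothetical helper names, the response is assumed to have the shape shown above, and matching is treated as exact and case-sensitive (an assumption, motivated by dif and DIF_9.4 being separate entries).

```python
import xml.etree.ElementTree as ET
from typing import List

DPWS_NS = "http://ejb.revitalization.services.ndg/"


def get_list_values(response_xml: str) -> List[str]:
    """Return the listItem values from a getList response, failing on a non-OK status."""
    root = ET.fromstring(response_xml)
    status = root.findtext(f".//{{{DPWS_NS}}}Confirmation/{{{DPWS_NS}}}Status")
    if status != "OK":
        raise RuntimeError(f"getList failed with status {status!r}")
    return [item.text or "" for item in root.iter(f"{{{DPWS_NS}}}listItem")]


def check_format(value: str, allowed: List[str]) -> str:
    """Return value if it is an exact (case-sensitive) entry in allowed, else raise."""
    if value not in allowed:
        raise ValueError(
            f"{value!r} is not in the metadata-format list; choose one of {allowed} "
            "or ask CEDA to add it"
        )
    return value
```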

doNewUpdateProvider Operation

This operation creates a new provider or, for an existing provider, replaces all information held for that provider ID with the information supplied in the request elements. It returns a confirmation status together with the stored provider details. Either the CSWProvider or the OAIProvider element must be supplied, depending on the type of metadata repository the provider offers.

When creating a new provider, do not enter values in ProviderID or in any of the email ID elements: these are assigned by the DPWS and returned by other operations using the ProviderDetail type. The ProviderContacts element is optional; use it for additional email contacts at the data provider. In subsequent operations these email IDs may be specified so that the DPWS can email the corresponding contacts when harvest or ingest operations have completed. The ProviderAdminEmail element is mandatory and should hold the email address of the main provider contact, who has administrative control over this provider entry (and who should be the only provider user allowed to perform this operation). Its Monitor element should be set to True if the admin contact needs to be notified of every operation, and False if not.

Example request setting up a new provider:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <DoNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
                    <OAIProvider>
                        <splitBySet>false</splitBySet>
                        <Format>dif</Format>
                    </OAIProvider>
                    <ProviderCommon>
                        <ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ProviderURL>
                        <ProviderName>badc</ProviderName>
                        <ProviderContacts>
                            <Email>igglepiggle@stfc.ac.uk</Email>
                            <Name>Dr Iggle Piggle</Name>
                        </ProviderContacts>
                        <ProviderAdminEmail>
                            <EmailContact>
                                <Email>upsy@nightgarden.land</Email>
                                <Name>Ms Upsy Daisy</Name>
                            </EmailContact>
                            <Monitor>True</Monitor>
                        </ProviderAdminEmail>
                    </ProviderCommon>
                </DoNewUpdateProvider>
            </request>
        </doNewUpdateProvider>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

An example response is shown below. Its values are illustrative and do not exactly match the request above:

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:doNewUpdateProviderResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:Provider>
                    <ns1:OAIProvider>
                        <ns1:splitBySet>true</ns1:splitBySet>
                        <ns1:Format>dif</ns1:Format>
                    </ns1:OAIProvider>
                    <ns1:ProviderCommon>
                        <ns1:ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ns1:ProviderURL>
                        <ns1:ProviderName>badc sambd</ns1:ProviderName>
                        <ns1:ProviderContacts>
                            <ns1:Email>iscoobydoo@scary.net</ns1:Email>
                            <ns1:Name>Scooby Doo</ns1:Name>
                            <ns1:ID>195</ns1:ID>
                        </ns1:ProviderContacts>
                        <ns1:ProviderContacts>
                            <ns1:Email>upsy@nightgarden.land</ns1:Email>
                            <ns1:Name>Ms Upsy Daisy</ns1:Name>
                            <ns1:ID>196</ns1:ID>
                        </ns1:ProviderContacts>
                    </ns1:ProviderCommon>
                    <ns1:ProviderID>110</ns1:ProviderID>
                </ns1:Provider>
            </return>
        </ns1:doNewUpdateProviderResponse>
    </env:Body>
</env:Envelope>

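The ProviderID element in the response holds the ID assigned to the provider (110 in this example), and the ID element of each ProviderContacts entry holds the email ID assigned to that contact.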
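Since later doHarvest and doIngest calls refer to contacts by their assigned IDs, a client will usually want to capture both the ProviderID and the contact IDs from this response. A minimal sketch, assuming the response shape shown above (parse_provider is a hypothetical helper):

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

DPWS_NS = "http://ejb.revitalization.services.ndg/"
NS = {"d": DPWS_NS}  # prefix map for ElementTree path expressions


def parse_provider(response_xml: str) -> Tuple[int, Dict[str, int]]:
    """Return (provider ID, {contact email: contact ID}) from a provider response."""
    root = ET.fromstring(response_xml)
    if root.findtext(".//d:Confirmation/d:Status", namespaces=NS) != "OK":
        raise RuntimeError("provider operation did not return status OK")
    provider = root.find(".//d:Provider", NS)
    if provider is None:
        raise ValueError("no Provider element in response")
    provider_id = int(provider.findtext("d:ProviderID", namespaces=NS))
    contacts = {}
    for contact in provider.iterfind("d:ProviderCommon/d:ProviderContacts", NS):
        email = contact.findtext("d:Email", namespaces=NS)
        contacts[email] = int(contact.findtext("d:ID", namespaces=NS))
    return provider_id, contacts
```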

To update the details of an existing provider, use the same operation, but fill the ProviderID element with the ProviderID value assigned when the provider was created. Any values present in the request are used to update the provider details, for example:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <DoNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
                    <OAIProvider>
                        <splitBySet>false</splitBySet>
                        <Format>dif</Format>
                    </OAIProvider>
                    <ProviderCommon>
                        <ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ProviderURL>
                        <ProviderName>badc</ProviderName>
                        <ProviderContacts>
                            <Email>iscoobydoo@scary.net</Email>
                            <Name>Scooby Doo</Name>
                        </ProviderContacts>
                        <ProviderAdminEmail>
                            <EmailContact>
                                <Email>upsy@nightgarden.land</Email>
                                <Name>Ms Upsy Daisy</Name>
                            </EmailContact>
                            <Monitor>True</Monitor>
                        </ProviderAdminEmail>
                    </ProviderCommon>
                    <ProviderID>110</ProviderID>
                </DoNewUpdateProvider>
            </request>
        </doNewUpdateProvider>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

getProviderDetails Operation

The getProviderDetails operation retrieves existing provider details from the DPWS database. It can return information on ALL providers in the DPWS, or on just a single provider.

Use this operation in the first instance to discover provider IDs, by calling it with NO ProviderID element:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getProviderDetails xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns=""/>
        </getProviderDetails>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The DPP should call this operation first to extract and render information for all available data providers (including contact persons and related email addresses by ID - see below).

To retrieve details for a specific provider, the request simply specifies the unique ProviderID assigned by the initial doNewUpdateProvider operation:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getProviderDetails xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">110</ProviderID>
            </request>
        </getProviderDetails>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The response has the same structure as the doNewUpdateProvider response:

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:getProviderDetailsResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:Provider>
                    <ns1:OAIProvider>
                        <ns1:splitBySet>true</ns1:splitBySet>
                        <ns1:Format>dif</ns1:Format>
                    </ns1:OAIProvider>
                    <ns1:ProviderCommon>
                        <ns1:ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ns1:ProviderURL>
                        <ns1:ProviderName>badc</ns1:ProviderName>
                    </ns1:ProviderCommon>
                    <ns1:ProviderID>110</ns1:ProviderID>
                </ns1:Provider>
            </return>
        </ns1:getProviderDetailsResponse>
    </env:Body>
</env:Envelope>
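The sketch below shows the two request forms side by side (no ProviderID for all providers, one ProviderID for a single provider) and a parser that flattens each Provider element into a row a portal page could render. It assumes every Provider carries either an OAIProvider or a CSWProvider element, as described for doNewUpdateProvider; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DPWS_NS = "http://ejb.revitalization.services.ndg/"
NS = {"d": DPWS_NS}


def build_get_provider_details(provider_id: Optional[int] = None) -> ET.Element:
    """Build a getProviderDetails request; omit provider_id to ask for ALL providers."""
    op = ET.Element(f"{{{DPWS_NS}}}getProviderDetails")
    request = ET.SubElement(op, "request")  # unqualified, like <request xmlns="">
    if provider_id is not None:
        ET.SubElement(request, f"{{{DPWS_NS}}}ProviderID").text = str(provider_id)
    return op


def summarise_providers(response_xml: str) -> List[Dict[str, str]]:
    """Flatten every Provider element in a response into a dict of display fields."""
    root = ET.fromstring(response_xml)
    rows = []
    for p in root.iterfind(".//d:Provider", NS):
        rows.append({
            "id": p.findtext("d:ProviderID", default="", namespaces=NS),
            "name": p.findtext("d:ProviderCommon/d:ProviderName", default="", namespaces=NS),
            "url": p.findtext("d:ProviderCommon/d:ProviderURL", default="", namespaces=NS),
            # Assumes the only alternative to OAIProvider is CSWProvider.
            "type": "OAI" if p.find("d:OAIProvider", NS) is not None else "CSW",
        })
    return rows
```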

doHarvest Operation

The doHarvest operation initiates a metadata harvest for a provider, identified by the unique provider ID assigned when the provider details were entered into the DPWS database. The doHarvest request requires the ProviderID and, optionally, any number of EmailReportID elements. Each EmailReportID element must contain the unique ID assigned to a recipient in the relevant provider details (the ID element of a ProviderContacts entry). In the sample below, 162 is the ID assigned to Ms Upsy Daisy. If Monitor is set to True for the admin contact, that address is also notified even if no EmailReportID element is supplied.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doHarvest xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <EmailReportID xmlns="http://ejb.revitalization.services.ndg/">162</EmailReportID>
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">101</ProviderID>
            </request>
        </doHarvest>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The doHarvest response returns a confirmation status and a unique process identifier, supplied as the "id" attribute of the processID element. This identifier must be used in subsequent getStatusProcess and doIngest operations.

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:doHarvestResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:processID ns1:id="430"/>
            </return>
        </ns1:doHarvestResponse>
    </env:Body>
</env:Envelope>

doIngest Operation

A metadata ingest into the Discovery database for a particular provider is achieved by supplying the ProcessID returned in the doHarvest response for that provider. As with doHarvest, one or more EmailReportID elements can be supplied so that process completion reports are sent to those contacts, if the user wishes. The doIngest operation must only be called on a process ID once the getStatusProcess operation has verified that the harvest has completed.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doIngest xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <ProcessID xmlns="http://ejb.revitalization.services.ndg/"
                    xmlns:ns1="http://ejb.revitalization.services.ndg/" ns1:id="430"/>
                <EmailReportID xmlns="http://ejb.revitalization.services.ndg/">196</EmailReportID>
            </request>
        </doIngest>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Note that the process ID to ingest is supplied as the value of the "id" attribute of the ProcessID element.

The response is a simple confirmation that the ingest request has been received.

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:doIngestResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
            </return>
        </ns1:doIngestResponse>
    </env:Body>
</env:Envelope>

getStatusProcess Operation

Metadata harvesting and ingestion can take a long time to complete, especially when large numbers of records are involved or the hardware at CEDA is under heavy load. Because a harvest must complete fully before ingestion can start, the getStatusProcess operation lets the client determine when it can continue with further operations on a specific process ID, including the getHarvestHistory and getIngestHistory operations.

The getStatusProcess request requires only the process ID as input; the response returns the ProcessID element with the current status of the process in its "status" attribute:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getStatusProcess xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <ProcessID xmlns="http://ejb.revitalization.services.ndg/">430</ProcessID>
            </request>
        </getStatusProcess>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:getStatusProcessResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:ProcessID ns1:status="end_ingest" ns1:id="430"/>
            </return>
        </ns1:getStatusProcessResponse>
    </env:Body>
</env:Envelope>

There are 3 status values for a harvest process: start_harvest, run_harvest and end_harvest. Likewise, there are 3 status values for an ingest process: start_ingest, run_ingest and end_ingest. These values are supplied in the status attribute of the ProcessID element in the getStatusProcess response. The doIngest operation must only be called once the harvest status is "end_harvest".

To monitor the current status of an operation, the DPP needs to call getStatusProcess repeatedly until the DPWS returns an "end" status (end_harvest or end_ingest). At that point the DPP can issue a getHarvestHistory or getIngestHistory request.
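The polling described above can be sketched as a small loop. To keep it testable, the function that actually calls getStatusProcess is passed in as fetch_status (a hypothetical callable returning the status attribute), and the interval and timeout defaults are arbitrary assumptions rather than values recommended by CEDA.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable

DPWS_NS = "http://ejb.revitalization.services.ndg/"


def process_status(response_xml: str) -> str:
    """Return the status attribute (e.g. "run_harvest") from a getStatusProcess response."""
    root = ET.fromstring(response_xml)
    process = root.find(f".//{{{DPWS_NS}}}ProcessID")
    if process is None:
        raise ValueError("no ProcessID element in getStatusProcess response")
    # ns1:status is namespace-qualified, so look it up in Clark notation.
    return process.get(f"{{{DPWS_NS}}}status", "")


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str,
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns target, sleeping between calls."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"process still at {status!r} after {timeout} s")
        sleep(interval)
```

A typical sequence would be doHarvest, wait_for_status(..., "end_harvest"), doIngest, wait_for_status(..., "end_ingest"), and then getHarvestHistory or getIngestHistory. Injecting sleep lets the loop be exercised without real delays.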

getHarvestHistory Operation

The DPP should perform the getHarvestHistory operation once getStatusProcess has confirmed that the process has completed. The DPP can use this operation either to download and synchronise the contents of its local SQLite database or to render the information directly on the DPP front end. When called with a ProviderID, the operation returns the harvest history for that provider only; if no ProviderID is supplied, the response contains the harvest history for all providers. A date range is optional; if none is supplied, all harvest history records are returned.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getHarvestHistory xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">101</ProviderID>
            </request>
        </getHarvestHistory>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:getHarvestHistoryResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:HarvestingEvent>
                    <ns1:RequestId>400</ns1:RequestId>
                    <ns1:TotalRecords>157</ns1:TotalRecords>
                    <ns1:HarvestStartTime>2010-08-24T10:15:43.393+01:00</ns1:HarvestStartTime>
                    <ns1:HarvestStopTime>2010-08-24T10:15:56.883+01:00</ns1:HarvestStopTime>
                    <ns1:ProviderID>101</ns1:ProviderID>
                </ns1:HarvestingEvent>
                <ns1:HarvestingEvent>
                    <ns1:RequestId>410</ns1:RequestId>
                    <ns1:TotalRecords>157</ns1:TotalRecords>
                    <ns1:HarvestStartTime>2010-08-24T11:47:53.009+01:00</ns1:HarvestStartTime>
                    <ns1:HarvestStopTime>2010-08-24T11:47:57.693+01:00</ns1:HarvestStopTime>
                    <ns1:ProviderID>101</ns1:ProviderID>
                </ns1:HarvestingEvent>
                <ns1:HarvestingEvent>
                    <ns1:RequestId>420</ns1:RequestId>
                    <ns1:TotalRecords>157</ns1:TotalRecords>
                    <ns1:HarvestStartTime>2010-08-24T11:51:32.199+01:00</ns1:HarvestStartTime>
                    <ns1:HarvestStopTime>2010-08-24T11:51:40.124+01:00</ns1:HarvestStopTime>
                    <ns1:ProviderID>101</ns1:ProviderID>
                </ns1:HarvestingEvent>
                <ns1:HarvestingEvent>
                    <ns1:RequestId>421</ns1:RequestId>
                    <ns1:TotalRecords>157</ns1:TotalRecords>
                    <ns1:HarvestStartTime>2010-08-24T11:54:25.699+01:00</ns1:HarvestStartTime>
                    <ns1:HarvestStopTime>2010-08-24T11:54:38.633+01:00</ns1:HarvestStopTime>
                    <ns1:ProviderID>101</ns1:ProviderID>
                </ns1:HarvestingEvent>
                <ns1:HarvestingEvent>
                    <ns1:RequestId>430</ns1:RequestId>
                    <ns1:TotalRecords>157</ns1:TotalRecords>
                    <ns1:HarvestStartTime>2010-08-24T16:49:18.444+01:00</ns1:HarvestStartTime>
                    <ns1:HarvestStopTime>2010-08-24T16:49:28.421+01:00</ns1:HarvestStopTime>
                    <ns1:ProviderID>101</ns1:ProviderID>
                </ns1:HarvestingEvent>
            </return>
        </ns1:getHarvestHistoryResponse>
    </env:Body>
</env:Envelope>
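For the DPP's local synchronisation, the HarvestingEvent elements can be turned into typed records. The sketch below is a minimal example under the assumption that every event carries all five child elements shown above; the HarvestEvent type and parser are hypothetical, and the computed duration is simply stop time minus start time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, NamedTuple

DPWS_NS = "http://ejb.revitalization.services.ndg/"
NS = {"d": DPWS_NS}


class HarvestEvent(NamedTuple):
    request_id: int  # matches the doHarvest process ID in the examples on this page
    provider_id: int
    total_records: int
    start: datetime
    stop: datetime

    @property
    def duration_s(self) -> float:
        """Wall-clock harvest duration in seconds."""
        return (self.stop - self.start).total_seconds()


def _text(event: ET.Element, tag: str) -> str:
    """Return the text of a required child element of a HarvestingEvent."""
    value = event.findtext(f"d:{tag}", namespaces=NS)
    if value is None:
        raise ValueError(f"HarvestingEvent is missing {tag}")
    return value


def parse_harvest_history(response_xml: str) -> List[HarvestEvent]:
    """Parse every HarvestingEvent in a getHarvestHistory response, in document order."""
    root = ET.fromstring(response_xml)
    return [
        HarvestEvent(
            request_id=int(_text(ev, "RequestId")),
            provider_id=int(_text(ev, "ProviderID")),
            total_records=int(_text(ev, "TotalRecords")),
            # fromisoformat (Python 3.7+) accepts the millisecond + UTC-offset form used here.
            start=datetime.fromisoformat(_text(ev, "HarvestStartTime")),
            stop=datetime.fromisoformat(_text(ev, "HarvestStopTime")),
        )
        for ev in root.iterfind(".//d:HarvestingEvent", NS)
    ]
```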

getIngestHistory Operation

The DPP should perform the getIngestHistory operation once getStatusProcess has confirmed that the process has completed. The DPP can use this operation either to download and synchronise the contents of its local SQLite database or to render the information directly on the DPP front end. This operation works in a similar manner to the getHarvestHistory operation: just supply the ProviderID and, if required, a date range. If no date range is supplied, the entire ingest history for that provider is returned. If no ProviderID is supplied, the entire ingest history for all providers is returned.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getIngestHistory xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">                
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">101</ProviderID>
            </request>
        </getIngestHistory>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Response: TBC (the operation currently returns OK, but a small bug still needs fixing).

getProviderStatistic Operation

Each ingested record holds a reference to the provider from which it was harvested. This operation returns the total number of records ingested per provider. If the request contains a ProviderID element, it must hold the ID of a provider; an empty request returns the statistics for all providers.

deleteProvider Operation

Deletes one provider. The ProviderID element must contain the ID of the provider to delete.

addTimer Operation

Creates a periodic harvest/ingest process for the specified provider. A provider can be associated with at most one timer. A timer can also be added when creating or updating a provider with the doNewUpdateProvider operation.

deleteTimer Operation

Deletes a previously created timer.

List Values

harvest-provider-type

This list describes the types of harvesting available. It is currently restricted to 2 values:

  • OAI: The provider publishes records via the OAI-PMH v2 protocol.
  • CSW: The provider publishes using the OGC CSW specification. The DPWS will use the harvest operation to iteratively retrieve records.
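Harvesting from an OAI provider relies on standard OAI-PMH v2 paging: a ListRecords response may end with a resumptionToken, and the harvester keeps requesting with that token until an empty or missing one comes back. The sketch below illustrates that generic protocol mechanism; it is not the DPWS harvester's own code. The list_records name is hypothetical, and the fetch callable stands in for whatever HTTP client is used, so the paging logic can be tried offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records(
    base_url: str, metadata_prefix: str, fetch: Callable[[str], str]
) -> Iterator[ET.Element]:
    """Yield every <record> from an OAI-PMH v2 ListRecords harvest, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urllib.parse.urlencode(params)}"))
        # An <error> response (e.g. noRecordsMatch) has no ListRecords, so nothing is yielded.
        yield from root.iterfind("oai:ListRecords/oai:record", OAI)
        token = (root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI) or "").strip()
        if not token:  # missing or empty token: the list is complete
            return
        # resumptionToken is an exclusive argument, so it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```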

metadata-format

This list gives the format values used on current providers' OAI sites. New providers, or providers updating their details, should use DIF_9.4, MEDIN_2.3 or NERC_1.0; however, as the NERC 1.0 ISO profile is still under discussion, NERC_1.0 is not currently implemented (TBC). When using the DPP/DPWS, the provider should choose a value that matches one of the values below. If the required value is not present, the provider should either change the value on their OAI site or ask CEDA to add it to this list.

  • DIF_9.4
  • ANY
  • dif
  • oai_dc
  • MEDIN_2.3

harvest-operation-type

This list gives the type of OAI harvest to undertake: if ALL is specified, all records are retrieved; if NEW is specified, only new records on the provider's OAI site are retrieved.

  • ALL
  • NEW
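The mapping below is an assumption: this page does not say how NEW is implemented, but OAI-PMH selective harvesting by datestamp (the from argument of ListRecords) is the standard way to fetch only records added or changed since a previous harvest. list_records_url is a hypothetical helper sketching how ALL and NEW could translate into a ListRecords request.

```python
import urllib.parse
from datetime import datetime, timezone
from typing import Optional


def list_records_url(
    base_url: str,
    metadata_prefix: str,
    operation_type: str,
    last_harvest: Optional[datetime] = None,
) -> str:
    """Build an OAI-PMH ListRecords URL for an ALL or NEW harvest (hypothetical mapping)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if operation_type == "NEW":
        if last_harvest is None:
            raise ValueError("a NEW harvest needs the time of the previous harvest")
        # Day granularity (YYYY-MM-DD) is the one every OAI-PMH v2 repository must
        # support; records changed earlier on the same day may be fetched again.
        params["from"] = last_harvest.astimezone(timezone.utc).strftime("%Y-%m-%d")
    elif operation_type != "ALL":
        raise ValueError(f"unknown harvest-operation-type {operation_type!r}")
    return f"{base_url}?{urllib.parse.urlencode(params)}"
```

Using day granularity trades a few duplicate records for compatibility with repositories that do not support second-level datestamps.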
