
Version 9 (modified by sdonegan, 9 years ago)

--

NERC Discovery Providers Web Service: NERC Revitalisation Improvements August 2010

Introduction

As part of the SIB Discovery Service revitalisation project it was decided that a more effective data providers portal (DPP) was required. The new DPP area should inherit the 'look and feel' of the new main discovery service (updated based on MEDIN portal code and run at BODC). As part of this project, the providers area was to be split into two sections:

  • Data Providers Portal (DPP): part of the Discovery Service code stack and using the same templates.
  • Data Providers Web Service (DPWS): A SOAP based web service that offers a number of operations to edit user details, initiate metadata harvesting and metadata ingestion into the main discovery database.

The DPP would be run at BODC and would act as a client to the DPWS hosted at CEDA. This wiki page describes the operations and usage of the DPWS.

DPWS location

The current development version of the DPWS is running at  http://neptune.badc.rl.ac.uk:8180/discovery/dpws?wsdl (as of 23/08/2010)

DPWS operations

The DPWS implements 9 operations:

  • GetListNames: Get names of controlled values used within the service
  • GetList: Get values within named controlled list
  • DoHarvest: Initiate a metadata harvesting operation on a named provider
  • DoIngest: Initiate a metadata ingestion operation into the discovery database for the named provider
  • GetHarvestHistory: Get information on completed metadata harvests for the named provider
  • GetProcessStatus: Get current process status (started/running/finished) for harvest or ingest processes by unique id.
  • GetIngestHistory: Get information on completed metadata ingests for the named provider
  • GetProviderDetails: Get information held on the configuration for an existing provider
  • DoNewUpdateProvider: Create or update information held on an existing provider

The request and response types for these operations are summarised below.

GetListNames Operation

The DPWS relies on several lists of valid terms which are specific to the functionality of this service. These terms are exposed through two "helper" operations (GetListNames and GetList) rather than encoded as <xs:enumeration> in the schema part of the WSDL, so that future modifications to the service need not require modification of the WSDL (which can be inconvenient for clients already developed around a particular release of the WSDL). The GetListNames operation simply returns the names of these lists, which can then be used in a subsequent call to the GetList operation.

The WSDL document defines the GetListNamesRequest message as an empty <getListNames> element, so the request message should look like this (omitting the SOAP Envelope and Body parent elements):

<getListNames xmlns="http://ejb.revitalization.services.ndg/"/>
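
As a rough illustration of the omitted Envelope and Body, the following Python sketch wraps an operation payload in a SOAP 1.1 envelope and prepares (without sending) the HTTP POST. The endpoint URL is an assumption (the development WSDL address with "?wsdl" removed), as are the function names and the empty SOAPAction header; the WSDL binding should be checked for the real values.

```python
import urllib.request

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: the service endpoint is the WSDL address without "?wsdl".
ENDPOINT = "http://neptune.badc.rl.ac.uk:8180/discovery/dpws"


def wrap_in_envelope(payload: str) -> str:
    """Wrap an operation payload in a SOAP 1.1 Envelope, Header and Body."""
    return (
        f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">'
        "<SOAP-ENV:Header/>"
        f"<SOAP-ENV:Body>{payload}</SOAP-ENV:Body>"
        "</SOAP-ENV:Envelope>"
    )


def build_post(payload: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare, but do not send, the HTTP POST carrying the envelope."""
    body = wrap_in_envelope(payload).encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '""',  # assumption: no specific SOAPAction is required
    }
    return urllib.request.Request(endpoint, data=body, headers=headers)


post = build_post('<getListNames xmlns="http://ejb.revitalization.services.ndg/"/>')
# Sending would then be: urllib.request.urlopen(post).read()
```

Supplying a request body is what makes urllib issue a POST rather than a GET.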

The getListNamesResponse message comprises a <return> element holding a Confirmation status and a ListNames element, whose listItem children contain the names of the lists available for inspection:

<ns1:getListNamesResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
    <return>
        <ns1:Confirmation>
            <ns1:Status>OK</ns1:Status>
        </ns1:Confirmation>
        <ns1:ListNames>
            <ns1:listItem>harvester-provider-type</ns1:listItem>
            <ns1:listItem>metadata-format</ns1:listItem>
            <ns1:listItem>harvest-operation-type</ns1:listItem>
        </ns1:ListNames>
    </return>
</ns1:getListNamesResponse>
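
A client might read the list names out of such a response along the following lines. This is a minimal Python sketch using the standard library; the function name parse_list_items and the SAMPLE constant are illustrative only, and the error handling is an assumption about how a client would want to treat a non-OK status.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://ejb.revitalization.services.ndg/"}

SAMPLE = """<ns1:getListNamesResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
  <return>
    <ns1:Confirmation><ns1:Status>OK</ns1:Status></ns1:Confirmation>
    <ns1:ListNames>
      <ns1:listItem>harvester-provider-type</ns1:listItem>
      <ns1:listItem>metadata-format</ns1:listItem>
      <ns1:listItem>harvest-operation-type</ns1:listItem>
    </ns1:ListNames>
  </return>
</ns1:getListNamesResponse>"""


def parse_list_items(xml_text: str) -> list[str]:
    """Return the listItem values, raising if the Confirmation status is not OK."""
    root = ET.fromstring(xml_text)
    # <return> is unqualified while its children are namespaced; ".//" skips over it.
    status = root.findtext(".//d:Confirmation/d:Status", namespaces=NS)
    if status != "OK":
        raise RuntimeError(f"DPWS returned status {status!r}")
    return [item.text for item in root.iterfind(".//d:ListNames/d:listItem", NS)]


names = parse_list_items(SAMPLE)
```

Because the getList response shown later uses the same ListNames/listItem structure, the same function should work for it as well.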

GetList operation

The contents of each list named by the getListNames operation can be retrieved by calling the getList operation with the name of the list as the single argument, encoded as a getListRequest message as defined in the WSDL:

Request:

<m:GetList xmlns:m="http://medin.discovery.services.ndg/schema">
      <m:listName>PresentFormatList</m:listName>
</m:GetList>

Response:

<ns1:getListResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
    <return>
        <ns1:Confirmation>
            <ns1:Status>OK</ns1:Status>
        </ns1:Confirmation>
        <ns1:ListNames>
            <ns1:listItem>DIF_9.4</ns1:listItem>
            <ns1:listItem>ANY</ns1:listItem>
            <ns1:listItem>dif</ns1:listItem>
            <ns1:listItem>oai_dc</ns1:listItem>
            <ns1:listItem>MEDIN_2.3</ns1:listItem>
        </ns1:ListNames>
    </return>
</ns1:getListResponse>

DoNewUpdateProvider Operation

This operation updates the information held for a particular provider ID with the information supplied in the request elements, and returns a confirmation status and message. Either the CSWProvider or the OAIProvider element must be supplied, depending on the type of metadata repository offered by the provider.

Note that when setting up a new provider the request should not contain values in ProviderID or in any of the email ID elements; these are assigned by the DPWS and returned in other operations using the ProviderDetail type. The ProviderContacts element is optional: use it for additional email contacts at the data provider. In subsequent operations the email ID may be specified so that the DPWS can send emails to these contacts when harvest or ingest operations have completed. The ProviderAdminEmail element is mandatory and should hold the email address of the main provider contact who has administrative control over this provider entry (and who should be the only provider user allowed to undertake this particular operation). The Monitor element should be set to True if the admin contact needs notifying of every operation, False if not.

Example request setting up a new provider:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <DoNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
                    <OAIProvider>
                        <splitBySet>false</splitBySet>
                        <Format>dif</Format>
                    </OAIProvider>
                    <ProviderCommon>
                        <ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ProviderURL>
                        <ProviderName>badc</ProviderName>
                        <ProviderContacts>
                            <Email>igglepiggle@stfc.ac.uk</Email>
                            <Name>Dr Iggle Piggle</Name>
                        </ProviderContacts>
                        <ProviderAdminEmail>
                            <EmailContact>
                                <Email>upsy@nightgarden.land</Email>
                                <Name>Ms Upsy Daisy</Name>
                            </EmailContact>
                            <Monitor>True</Monitor>
                        </ProviderAdminEmail>
                    </ProviderCommon>
                </DoNewUpdateProvider>
            </request>
        </doNewUpdateProvider>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
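
The same payload could be generated programmatically. The Python sketch below builds the doNewUpdateProvider element (without the SOAP envelope) for an OAI provider, mirroring the element order and namespaces of the example request. The function and parameter names are hypothetical, CSW providers are not covered, and the serialised output uses a generated namespace prefix rather than default namespaces, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

DPWS_NS = "http://ejb.revitalization.services.ndg/"


def q(tag: str) -> str:
    """Qualify a tag name with the DPWS namespace."""
    return f"{{{DPWS_NS}}}{tag}"


def build_provider_request(url, name, fmt, admin_email, admin_name,
                           monitor=True, split_by_set=False,
                           contacts=(), provider_id=None) -> str:
    """Build a doNewUpdateProvider payload; pass provider_id only when updating."""
    op = ET.Element(q("doNewUpdateProvider"))
    request = ET.SubElement(op, "request")  # unqualified, as in the example
    body = ET.SubElement(request, q("DoNewUpdateProvider"))

    oai = ET.SubElement(body, q("OAIProvider"))
    ET.SubElement(oai, q("splitBySet")).text = "true" if split_by_set else "false"
    ET.SubElement(oai, q("Format")).text = fmt

    common = ET.SubElement(body, q("ProviderCommon"))
    ET.SubElement(common, q("ProviderURL")).text = url
    ET.SubElement(common, q("ProviderName")).text = name
    for email, contact_name in contacts:  # optional additional contacts
        contact = ET.SubElement(common, q("ProviderContacts"))
        ET.SubElement(contact, q("Email")).text = email
        ET.SubElement(contact, q("Name")).text = contact_name

    admin = ET.SubElement(common, q("ProviderAdminEmail"))
    admin_contact = ET.SubElement(admin, q("EmailContact"))
    ET.SubElement(admin_contact, q("Email")).text = admin_email
    ET.SubElement(admin_contact, q("Name")).text = admin_name
    ET.SubElement(admin, q("Monitor")).text = "True" if monitor else "False"

    if provider_id is not None:  # the ID is assigned by the DPWS; omit for new providers
        ET.SubElement(body, q("ProviderID")).text = str(provider_id)
    return ET.tostring(op, encoding="unicode")


xml_new = build_provider_request(
    "http://badc.nerc.ac.uk/badc_oai/provider", "badc", "dif",
    "upsy@nightgarden.land", "Ms Upsy Daisy",
    contacts=[("igglepiggle@stfc.ac.uk", "Dr Iggle Piggle")],
)
```

Leaving provider_id unset produces a request with no ProviderID element, matching the rule that the DPWS assigns the ID for new providers.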

The response type is shown below:

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:doNewUpdateProviderResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:Provider>
                    <ns1:OAIProvider>
                        <ns1:splitBySet>true</ns1:splitBySet>
                        <ns1:Format>dif</ns1:Format>
                    </ns1:OAIProvider>
                    <ns1:ProviderCommon>
                        <ns1:ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ns1:ProviderURL>
                        <ns1:ProviderName>badc sambd</ns1:ProviderName>
                        <ns1:ProviderContacts>
                            <ns1:Email>iscoobydoo@scary.net</ns1:Email>
                            <ns1:Name>Scooby Doo</ns1:Name>
                            <ns1:ID>195</ns1:ID>
                        </ns1:ProviderContacts>
                        <ns1:ProviderContacts>
                            <ns1:Email>upsy@nightgarden.land</ns1:Email>
                            <ns1:Name>Ms Upsy Daisy</ns1:Name>
                            <ns1:ID>196</ns1:ID>
                        </ns1:ProviderContacts>
                    </ns1:ProviderCommon>
                    <ns1:ProviderID>110</ns1:ProviderID>
                </ns1:Provider>
            </return>
        </ns1:doNewUpdateProviderResponse>
    </env:Body>
</env:Envelope>
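
Since later operations need the assigned ProviderID and the contact IDs, a client would typically pull them out of this response. The following Python sketch does so with the standard library; the function name and the abbreviated SAMPLE document are illustrative, and mapping email address to ID is merely one convenient choice.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://ejb.revitalization.services.ndg/"}

SAMPLE = """<ns1:doNewUpdateProviderResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
  <return>
    <ns1:Confirmation><ns1:Status>OK</ns1:Status></ns1:Confirmation>
    <ns1:Provider>
      <ns1:ProviderCommon>
        <ns1:ProviderContacts>
          <ns1:Email>iscoobydoo@scary.net</ns1:Email>
          <ns1:Name>Scooby Doo</ns1:Name>
          <ns1:ID>195</ns1:ID>
        </ns1:ProviderContacts>
        <ns1:ProviderContacts>
          <ns1:Email>upsy@nightgarden.land</ns1:Email>
          <ns1:Name>Ms Upsy Daisy</ns1:Name>
          <ns1:ID>196</ns1:ID>
        </ns1:ProviderContacts>
      </ns1:ProviderCommon>
      <ns1:ProviderID>110</ns1:ProviderID>
    </ns1:Provider>
  </return>
</ns1:doNewUpdateProviderResponse>"""


def parse_provider(xml_text: str) -> tuple[int, dict[str, int]]:
    """Return (ProviderID, {contact email: contact ID}) from a provider response."""
    root = ET.fromstring(xml_text)
    provider = root.find(".//d:Provider", NS)
    if provider is None:
        raise ValueError("no Provider element in response")
    provider_id = int(provider.findtext("d:ProviderID", namespaces=NS))
    contacts = {
        c.findtext("d:Email", namespaces=NS): int(c.findtext("d:ID", namespaces=NS))
        for c in provider.iterfind("d:ProviderCommon/d:ProviderContacts", NS)
    }
    return provider_id, contacts


provider_id, contact_ids = parse_provider(SAMPLE)
```

The contact IDs recovered here are the values a client would later pass as EmailReportID.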

The response should return the provider ID assigned to this provider.

To update the details of an existing provider, the same operation should be used, but the ProviderID element must contain the assigned ProviderID value. Any values present in the request will be used to update the provider details, for example:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <DoNewUpdateProvider xmlns="http://ejb.revitalization.services.ndg/">
                    <OAIProvider>
                        <splitBySet>false</splitBySet>
                        <Format>dif</Format>
                    </OAIProvider>
                    <ProviderCommon>
                        <ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ProviderURL>
                        <ProviderName>badc</ProviderName>
                        <ProviderContacts>
                            <Email>iscoobydoo@scary.net</Email>
                            <Name>Scooby Doo</Name>
                        </ProviderContacts>
                        <ProviderAdminEmail>
                            <EmailContact>
                                <Email>upsy@nightgarden.land</Email>
                                <Name>Ms Upsy Daisy</Name>
                            </EmailContact>
                            <Monitor>True</Monitor>
                        </ProviderAdminEmail>
                    </ProviderCommon>
                    <ProviderID>110</ProviderID>
                </DoNewUpdateProvider>
            </request>
        </doNewUpdateProvider>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

GetProviderDetails Operation

The getProviderDetails operation should be used to retrieve the details held for an existing provider in the DPWS database. The request simply requires the unique ProviderID assigned in the initial doNewUpdateProvider operation:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <getProviderDetails xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">110</ProviderID>
            </request>
        </getProviderDetails>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

The return has the same structure as the return in the doNewUpdateProvider operation:

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:getProviderDetailsResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:Provider>
                    <ns1:OAIProvider>
                        <ns1:splitBySet>true</ns1:splitBySet>
                        <ns1:Format>dif</ns1:Format>
                    </ns1:OAIProvider>
                    <ns1:ProviderCommon>
                        <ns1:ProviderURL>http://badc.nerc.ac.uk/badc_oai/provider</ns1:ProviderURL>
                        <ns1:ProviderName>badc</ns1:ProviderName>
                    </ns1:ProviderCommon>
                    <ns1:ProviderID>110</ns1:ProviderID>
                </ns1:Provider>
            </return>
        </ns1:getProviderDetailsResponse>
    </env:Body>
</env:Envelope>

DoHarvest Operation

The DoHarvest operation initiates a metadata harvest for a provider, using the unique provider ID assigned to that provider when its details were entered into the DPWS database. The DoHarvestRequest requires the ProviderID and, optionally, any number of EmailReportID elements taken from the provider email details. Note that each EmailReportID element must contain the unique ID of the recipient as given in the relevant provider details. In the sample below, 162 is the ID assigned to Ms Upsy Daisy. If Monitor is set to true, the admin contact will also be notified even if no EmailReportID element is supplied.

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Header/>
    <SOAP-ENV:Body>
        <doHarvest xmlns="http://ejb.revitalization.services.ndg/">
            <request xmlns="">
                <EmailReportID xmlns="http://ejb.revitalization.services.ndg/">162</EmailReportID>
                <ProviderID xmlns="http://ejb.revitalization.services.ndg/">101</ProviderID>
            </request>
        </doHarvest>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
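
A request of this shape, with any number of EmailReportID elements, could be produced as in the Python sketch below. The function name is hypothetical, the SOAP envelope is left out, and placing EmailReportID before ProviderID simply follows the sample above rather than a confirmed schema requirement.

```python
import xml.etree.ElementTree as ET

DPWS_NS = "http://ejb.revitalization.services.ndg/"


def build_do_harvest(provider_id: int, email_report_ids=()) -> str:
    """Build a doHarvest payload for one provider and zero or more report recipients."""
    op = ET.Element(f"{{{DPWS_NS}}}doHarvest")
    request = ET.SubElement(op, "request")  # unqualified, as in the sample
    for email_id in email_report_ids:
        ET.SubElement(request, f"{{{DPWS_NS}}}EmailReportID").text = str(email_id)
    ET.SubElement(request, f"{{{DPWS_NS}}}ProviderID").text = str(provider_id)
    return ET.tostring(op, encoding="unicode")


harvest_xml = build_do_harvest(101, email_report_ids=[162])
```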

The DoHarvestResponse returns a status value and message, together with a unique process identifier carried as the "id" attribute of the processID element. This identifier is the value that must be used in subsequent GetProcessStatus and DoIngest operations.

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns1:doHarvestResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
            <return>
                <ns1:Confirmation>
                    <ns1:Status>OK</ns1:Status>
                </ns1:Confirmation>
                <ns1:processID ns1:id="430"/>
            </return>
        </ns1:doHarvestResponse>
    </env:Body>
</env:Envelope>
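
Note that in this sample the id attribute is itself namespace-qualified (ns1:id), which affects how a client looks it up. A minimal Python sketch, with a hypothetical function name and an abbreviated sample response:

```python
import xml.etree.ElementTree as ET

DPWS_NS = "http://ejb.revitalization.services.ndg/"
NS = {"d": DPWS_NS}

SAMPLE = """<ns1:doHarvestResponse xmlns:ns1="http://ejb.revitalization.services.ndg/">
  <return>
    <ns1:Confirmation><ns1:Status>OK</ns1:Status></ns1:Confirmation>
    <ns1:processID ns1:id="430"/>
  </return>
</ns1:doHarvestResponse>"""


def parse_process_id(xml_text: str) -> int:
    """Return the process identifier from a doHarvest response."""
    root = ET.fromstring(xml_text)
    element = root.find(".//d:processID", NS)
    if element is None:
        raise ValueError("no processID element in response")
    # The attribute is written ns1:id, so its key is the expanded {namespace}id form;
    # a plain element.get("id") would return None here.
    return int(element.attrib[f"{{{DPWS_NS}}}id"])


process_id = parse_process_id(SAMPLE)
```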

DoIngest

A metadata ingest into the Discovery database for a particular provider is achieved by supplying the ProcessID returned in the DoHarvestResponse from a previous DoHarvestRequest for that provider. As with the DoHarvestRequest, an EmailReportID can be supplied so that process completion reports are sent to that contact.

GetProcessStatus

Metadata harvesting and ingestion can take a long time to complete, especially when large numbers of records are involved or there is heavy traffic on the hardware systems employed at CEDA. As the metadata harvest needs to complete fully before ingestion takes place, the GetProcessStatus operation allows the client to determine when it is possible to continue with further operations on a specific processID. This includes the GetHarvestHistory and GetIngestHistory operations.

The GetProcessStatusRequest simply requires the processID as input, and the GetProcessStatusResponse will return the current status of that process.

In order to monitor the current status of an operation, the DPP will need to call the GetProcessStatus operation repeatedly until the DPWS returns a status of “completed”. At this stage the DPP can issue a GetHarvestHistoryRequest or GetIngestHistoryRequest.
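
This status checking can be written as a simple polling loop. In the Python sketch below, get_status is a hypothetical callable standing in for whatever the client uses to issue GetProcessStatus and extract the status string. The polling interval, timeout and the set of terminal status values are assumptions; this page mentions both "finished" and "completed", so the sketch accepts either.

```python
import time


def wait_until_done(get_status, process_id, interval=30.0, timeout=3600.0,
                    done_statuses=("completed", "finished"),
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(process_id) until it reports a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(process_id)
        if status in done_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"process {process_id} still {status!r} after {timeout}s")
        sleep(interval)  # wait before asking the DPWS again
```

Injecting sleep and clock keeps the loop testable without real delays. For example, wait_until_done(fetch_status, 430) would block until the harvest with process ID 430 has finished, where fetch_status is the client's own function.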

GetHarvestHistory

The DPP should perform the GetHarvestHistory operation once a GetProcessStatus call has confirmed that the process has completed. The DPP should use this operation either to download the results and synchronise the contents of its local SQLite database, or to render the information directly on the DPP front end.

GetIngestHistory

The DPP should perform the GetIngestHistory operation once a GetProcessStatus call has confirmed that the process has completed. The DPP should use this operation either to download the results and synchronise the contents of its local SQLite database, or to render the information directly on the DPP front end.

Operation sequencing

These operations must be called in a particular sequence to allow full operation and rendering of returned information in the DPP. For example, to harvest metadata, ingest it and view the results, the following operations must be used in this order (assuming an existing data provider):
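
One plausible ordering, inferred from the operation descriptions on this page rather than taken from a definitive list, is sketched below in Python. The client object and all of its methods are hypothetical wrappers around the corresponding DPWS operations. It is also an assumption that DoIngest returns a process ID that can be polled in the same way as a harvest.

```python
def harvest_then_ingest(client, provider_id, email_report_ids=()):
    """Run harvest, wait, fetch history, ingest, wait, fetch history, in that order."""
    # 1. DoHarvest returns the process ID used by the following steps.
    harvest_id = client.do_harvest(provider_id, email_report_ids)
    # 2. GetProcessStatus is polled until the harvest has completed.
    client.wait_until_done(harvest_id)
    # 3. GetHarvestHistory reports on the completed harvest.
    harvest_history = client.get_harvest_history(provider_id)
    # 4. DoIngest takes the process ID returned by the harvest.
    ingest_id = client.do_ingest(harvest_id, email_report_ids)
    # 5. GetProcessStatus is polled again, this time for the ingest.
    client.wait_until_done(ingest_id)
    # 6. GetIngestHistory reports on the completed ingest.
    ingest_history = client.get_ingest_history(provider_id)
    return harvest_history, ingest_history
```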

List Values
