Changes between Version 1 and Version 2 of Discovery/DiscoveryWebService


Timestamp:
20/07/09 14:33:59 (10 years ago)
Author:
mpritcha
Comment:

--

  • Discovery/DiscoveryWebService

    v1 v2  
    1 = NERC DataGrid Discovery Web Service = 
     1= NERC !DataGrid Discovery Web Service = 
     2 
     3The NERC !DataGrid (NDG) Discovery Web Service provides a search interface to metadata records harvested from collaborating data providers and is the backend server to which the NERC Data Discovery Service is a client. 
     4 
     5  * [#Introduction Introduction] 
     6  * [#Releases Releases] 
     7  * [#Connectivity Connectivity] 
     8  * [#XMLDataTypes XML Data Types] 
     9  * [#DiscoveryServiceOperations Discovery Service Operations] 
     10  * [#TermLists Term Lists] 
     11  * [#SupportedMetadataFormats Supported Metadata Formats] 
     12 
     13== Introduction == 
     14 
     15The Discovery Web Service is a presentation-free web service which acts as a search engine on top of the NDG Discovery metadata catalogue. This catalogue is dynamically populated by the [#harvesting harvesting] of metadata (by the [http://www.ceda.ac.uk CEDA] group at RAL) from a number of collaborating data providers, who make their metadata available in one of a number of [#SupportedMetadataFormats supported formats]. 
     16 
     17The search capability provided by the service enables full-text and spatio-temporal searches of catalogued metadata records and returns search results in a defined XML structure, enabling search clients to be constructed by interested parties for their own purposes. The [http://ndg.nerc.ac.uk/discovery NERC Data Discovery Service] is one such client, as is the [http://www.edp.nerc.ac.uk Environmental Data Portal], among other examples. 
     18 
     19== Releases == 
     20 
     21== Connectivity == 
     22 
     23Consumers may access the discovery service via SOAP. Client implementations should be generated from the WSDL at the following URIs: 
     24 
     25[http://ndg3beta.badc.rl.ac.uk/axis2/services/DiscoveryService?wsdl] (ndg3beta deployment : latest stable development version) 
     26 
     27[http://proglue.badc.rl.ac.uk/axis2/services/DiscoveryService?wsdl] (ndg "live" deployment : production version) 
     28 
     29== XML Data Types == 
     30 
     31The XML documents used as request and response messages for each of the service operations (methods) are defined in the <xsd:schema> section of the WSDL document. The structure of each of these documents is discussed as part of the [#DiscoveryServiceOperations operation/method descriptions] below. 
     32 
     33== Discovery Service Operations == 
     34 
     35The discovery service implements 4 operations, namely: 
     36  * getListNames 
     37  * getList 
     38  * doSearch 
     39  * doPresent 
     40 
     41=== getListNames operation === 
     42 
     43The discovery web service relies on several lists of valid terms which are specific to the functionality of this service. These lists are exposed through 2 "helper" operations, getListNames and getList, rather than being encoded as <xs:enumeration> values in the schema part of the WSDL, so that future modifications to the service need not require modification of the WSDL (which can be inconvenient for clients already developed around a particular release of the WSDL). The getListNames operation simply returns the names of these lists, which can then be used in a subsequent call to the getList operation. 
     44 
     45The WSDL document defines the getListNamesRequest message as an empty <getListNames> element, so the request message should look like this (omitting the SOAP Envelope & Body parent elements): 
     46{{{ 
     47<m:getListNames xmlns:m="urn:DiscoveryServiceAPI"/> 
     48}}} 
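When the request is actually sent, the payload has to be wrapped in the SOAP Envelope and Body elements that the example above omits. The following is a minimal sketch using only the Python standard library. It assumes the SOAP 1.1 envelope namespace (the WSDL binding determines which SOAP version a given deployment expects), and the helper name `wrap_in_soap_envelope` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace. Check the WSDL binding for the
# SOAP version that the deployed service actually expects.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
API_NS = "urn:DiscoveryServiceAPI"


def wrap_in_soap_envelope(payload):
    """Return a serialised SOAP Envelope whose Body contains `payload`."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


# The empty <getListNames> request element described above.
request = ET.Element(f"{{{API_NS}}}getListNames")
message = wrap_in_soap_envelope(request)
print(message.decode("utf-8"))
```

In practice a client generated from the WSDL would normally build this wrapper itself; the sketch only makes the omitted parent elements visible.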
     49The getListNamesResponse message comprises a <getListNamesReturn> element, with child elements containing the names of the lists available for inspection: 
     50{{{ 
     51<getListNamesReturn xmlns="urn:DiscoveryServiceAPI"> 
     52        <listNames> 
     53                <listName>presentFormatList</listName> 
     54                <listName>orderByFieldList</listName> 
     55                <listName>scopeList</listName> 
     56                <listName>termTypeList</listName> 
     57                <listName>spatialOperatorList</listName> 
     58        </listNames> 
     59</getListNamesReturn> 
     60}}} 
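A client typically needs only the list names from this response. As a minimal sketch (not part of the service itself), the payload shown above can be parsed with the Python standard library; the function name `parse_list_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the example messages in this section.
NS = {"d": "urn:DiscoveryServiceAPI"}

# Response payload copied from the example above (SOAP wrapper omitted).
RESPONSE = """
<getListNamesReturn xmlns="urn:DiscoveryServiceAPI">
    <listNames>
        <listName>presentFormatList</listName>
        <listName>orderByFieldList</listName>
        <listName>scopeList</listName>
        <listName>termTypeList</listName>
        <listName>spatialOperatorList</listName>
    </listNames>
</getListNamesReturn>
"""


def parse_list_names(xml_text):
    """Return the list names found in a <getListNamesReturn> payload."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall("d:listNames/d:listName", NS)]


list_names = parse_list_names(RESPONSE)
print(list_names)
```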
     61 
     62=== getList operation === 
     63 
     64The contents of each of the lists named by the getListNames operation can be retrieved by calling the getList operation with the name of the list as the single argument, encoded as a getListRequest message as defined in the WSDL: 
     65 
     66Request: 
     67{{{ 
     68<getList xmlns="urn:DiscoveryServiceAPI"> 
     69    <listName>presentFormatList</listName> 
     70</getList> 
     71}}} 
     72 
     73Response: 
     74{{{ 
     75<getListReturn xmlns="urn:DiscoveryServiceAPI"> 
     76    <list name="presentFormatList"> 
     77        <listMember>original</listMember> 
     78        <listMember>DC</listMember> 
     79        <listMember>DIF</listMember> 
     80        <listMember>MDIP</listMember> 
     81        <listMember>ISO19115</listMember> 
     82    </list> 
     83</getListReturn> 
     84}}} 
     85An explanation of the presentFormatList list is given later, in the context of the doPresent operation.  
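The getList request and response shown above can be built and parsed along the following lines. This is a sketch using the Python standard library; the function names are hypothetical, and the element names are taken from the example messages.

```python
import xml.etree.ElementTree as ET

API_NS = "urn:DiscoveryServiceAPI"
NS = {"d": API_NS}


def build_get_list_request(list_name):
    """Build the <getList> payload for the named list (SOAP wrapper omitted)."""
    request = ET.Element(f"{{{API_NS}}}getList")
    ET.SubElement(request, f"{{{API_NS}}}listName").text = list_name
    return request


def parse_get_list_return(xml_text):
    """Return (list name, list members) from a <getListReturn> payload."""
    root = ET.fromstring(xml_text.strip())
    list_el = root.find("d:list", NS)
    members = [m.text for m in list_el.findall("d:listMember", NS)]
    return list_el.get("name"), members


# Response payload copied from the example above.
RESPONSE = """
<getListReturn xmlns="urn:DiscoveryServiceAPI">
    <list name="presentFormatList">
        <listMember>original</listMember>
        <listMember>DC</listMember>
        <listMember>DIF</listMember>
        <listMember>MDIP</listMember>
        <listMember>ISO19115</listMember>
    </list>
</getListReturn>
"""

request = build_get_list_request("presentFormatList")
name, members = parse_get_list_return(RESPONSE)
print(name, members)
```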
     86 
     87=== doSearch operation === 
     88 
     89The doSearch operation performs a search against the NDG discovery database. Queries to this database are formulated from the doSearchRequest message and forwarded to the database via private methods (i.e. the consumer of the web service cannot interact with the database directly). 
     90 
     91Although outside the scope of the Discovery web service itself, it is worth explaining the structure of the NDG Discovery database which is searched by the service. This is populated from records harvested via [http://www.openarchives.org/pmh/ OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)] from collaborating data providers. Records are currently harvested in GCMD DIF format, and are tagged at ingest time with one or more "scope" keywords (listed in the scopeList list available from the getList operation). These enable the search to be restricted to particular communities, such as NERC, NERC_DDC (Designated Data Centres) and MDIP (Marine Data Information Partnership). Limited quality control is also applied to records at ingest time, and it is the responsibility of the data provider to ensure that metadata records are of sufficient quality to be visible in the system. 
     92 
     93The doSearchRequest message is shown in schema form below: 
     94 
     95[[Image(doSearchSchema.png)]] 
     96 
     97=== Choice of search type: <term> and <termType> ===  
     98The only mandatory elements are <term> and <termType>, as used in the example "Quick Start", above. By specifying the <termType>, a choice is made as to which of 3 variants of a full-text search should be invoked. This element should be populated with a valid value from the termTypeList list accessible via the getList operation. At present, these are:  
     99 
     100==== fullText ==== 
     101A full-text search is applied to the whole discovery metadata record  
     102==== author ====  
     103A full-text search is applied only to those sections of the discovery metadata record relating to authorship of the dataset  
     104==== parameter ====  
     105A full-text search is applied only to the parameter listing section of the discovery metadata record. 
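A request carrying only the two mandatory elements might be assembled as sketched below. Note the assumptions: the root element name `doSearch` and the placement of `<term>` and `<termType>` as its direct children are inferred by analogy with the getList message and are not confirmed by the text, so the schema in the WSDL should be treated as authoritative. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

API_NS = "urn:DiscoveryServiceAPI"

# Values listed in this section; a real client should fetch termTypeList
# via the getList operation instead of hard-coding them.
TERM_TYPES = ("fullText", "author", "parameter")


def build_do_search_request(term, term_type="fullText"):
    """Build a search payload containing only <term> and <termType>."""
    if term_type not in TERM_TYPES:
        raise ValueError(f"unknown termType: {term_type!r}")
    # Assumption: root element <doSearch> with <term> and <termType> as
    # direct children. Verify against the WSDL schema before use.
    request = ET.Element(f"{{{API_NS}}}doSearch")
    ET.SubElement(request, f"{{{API_NS}}}term").text = term
    ET.SubElement(request, f"{{{API_NS}}}termType").text = term_type
    return request


search = build_do_search_request("sea surface temperature")
print(ET.tostring(search, encoding="unicode"))
```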
     106  
     107<term> should be populated with the search term, which can be a string of one or more words and wildcard characters. The service is currently configured to execute searches by attempting to match XML documents (in the discovery database) where ALL of the components of the search term are matched (as opposed to ANY). In this way, increasingly specific searches can be used to refine the search results. Searches are case-insensitive. Examples of fullText search terms are:  
     108 
     109  temperature:: 
     110    Matches records with the word "temperature" in any node of a document  
     111  sea surface temperature:: 
     112    Matches documents having the words "sea", "surface" AND "temperature" (in any order)  
     113  *neodc*:: 
     114    Matches documents containing the string "neodc", even if embedded within a larger string.  
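The matching rules described above (case-insensitive, every component must match, `*` as a wildcard) can be illustrated locally. The sketch below is only an approximation for building intuition: the real matching is performed inside the discovery database, and the tokenisation used here (splitting on non-word characters) is an assumption.

```python
import re
from fnmatch import fnmatchcase


def matches_all_terms(search_term, document_text):
    """Return True if every component of search_term matches some word.

    Illustration only: words are runs of letters, digits and underscores,
    which is an assumption and not the service's documented behaviour.
    """
    words = re.findall(r"\w+", document_text.lower())
    patterns = search_term.lower().split()
    return all(
        any(fnmatchcase(word, pattern) for word in words)
        for pattern in patterns
    )


print(matches_all_terms("sea surface temperature", "Temperature of the Sea Surface"))
print(matches_all_terms("*neodc*", "Held at http://www.neodc.rl.ac.uk"))
print(matches_all_terms("sea ice", "Sea surface temperature"))
```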
     115 
     116=== Paging : <start> and <howMany> ===  
     117The optional elements <start> and <howMany> control which records from the result set should be returned (although the total number of hits is always returned as a number to aid with paging in clients). If <start> is omitted, the default value used is 1 (i.e. the first record). If <howMany> is omitted, the default number of records returned is 30. 
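Because the total number of hits is always returned, a client can work out the `<start>` value for each page up front. A minimal sketch, assuming 1-based record numbering as described above (the function name `page_starts` is hypothetical):

```python
def page_starts(total_hits, how_many=30):
    """Return the <start> value for each page needed to cover total_hits.

    Records are numbered from 1, and how_many defaults to 30 to mirror
    the service default described above.
    """
    if how_many < 1:
        raise ValueError("how_many must be at least 1")
    return list(range(1, total_hits + 1, how_many))


# 65 hits at the default page size need three requests.
print(page_starts(65))
```

The final page simply contains fewer than `how_many` records.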
     118 
     119=== Ordering: <orderBy> and <orderByDirection> === 
     120Ordering of the result set can be requested by setting <orderBy> to one of the valid values listed in the orderByFieldList accessible via the getList operation. Currently these are: 
     121  textRelevance:: 
     122    Ranking metric based on relevance of match to search term (further info?) 
     123  date::  
     124    Specifically, the start date of the date range given for the temporal coverage of the metadata record  
     125  dataCentre::  
     126    The repository identifier of the curator of the described entity (or most appropriate equivalent field) 
     127  datasetResultsetPopularity:: 
     128    ?? 
     129  proximity:: 
     130    ?? 
     131  proximityNearMiss:: 
     132    ?? 
     133  datasetUpdateOrder:: 
     134    ?? 
     135  datasetOrder:: 
     136    ?? 
     137 
     138In addition, the direction of ordering (ascending or descending) can be specified. If omitted, the default direction is descending. 
     139 
     140=== Scope of search: <scope> === 
     141The optional <scope> element can be used to restrict the search to one or more of the supported NDG Data Provider Groups, defined in NDG controlled vocabulary http://vocab.ndg.nerc.ac.uk/list/N010/0. The supported values from this vocabulary are given in the scopeList list accessible via the getList operation. Currently these are: 
     142 
     143  MDIP:: 
     144    Marine Data Information Partnership (organisation now renamed MEDIN) 
     145  NERC_DDC:: 
     146    NERC Designated Data Centres  
     147  NERC:: 
     148    NERC (General) 
     149  DPPP:: 
     150    Data Portals Project Provider 
     151 
     152If <scope> is omitted, the search is not restricted in this way. 
     153 
     154=== Spatial searching : <spatialOperator> and <boundingBox> === 
     155Full-text, author or parameter searches, as described above, may optionally be restricted further to records whose spatial coverage matches the specified <boundingBox>, according to the specified <spatialOperator>. <spatialOperator> may be populated with any of the values from the spatialOperatorList accessible via the getList operation. Currently, supported values are: 
     156 
     157  overlaps (default):: 
     158 
     159  doesNotOverlap:: 
     160 
     161  within:: 
     162 
     163If <spatialOperator> is omitted, but a valid <boundingBox> is supplied, the default operator applied is overlaps. Values for <limitNorth>, <limitSouth>, <limitEast> and <limitWest> should be given in decimal degrees latitude and longitude. <limitNorth> and <limitSouth> must be in the range -90.0 to +90.0, with <limitNorth> greater than <limitSouth>. <limitWest> and <limitEast> must be in the range -180.0 to 180.0 and <limitEast> should be greater than <limitWest>. Bounding boxes that span the -180 degree meridian, or the poles, are not currently supported. 
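A client can check these constraints before sending a request. The sketch below encodes the rules stated above; the function name and the wording of the messages are hypothetical.

```python
def validate_bounding_box(north, south, east, west):
    """Return a list of problems; an empty list means the box is acceptable."""
    problems = []
    for name, value in (("limitNorth", north), ("limitSouth", south)):
        if not -90.0 <= value <= 90.0:
            problems.append(f"{name} must be in the range -90.0 to +90.0")
    for name, value in (("limitEast", east), ("limitWest", west)):
        if not -180.0 <= value <= 180.0:
            problems.append(f"{name} must be in the range -180.0 to 180.0")
    if north <= south:
        problems.append("limitNorth must be greater than limitSouth")
    if east <= west:
        # Also rejects boxes spanning the 180 degree meridian, which the
        # service does not currently support.
        problems.append("limitEast must be greater than limitWest")
    return problems


print(validate_bounding_box(61.0, 49.0, 2.0, -11.0))
print(validate_bounding_box(10.0, -10.0, -170.0, 170.0))
```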
     164 
     165Spatial searches (as a further restriction of "term" searches) are currently implemented by obtaining a result set from the term search, obtaining a result set from the spatial search, then returning the intersection of the two result sets. 
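The intersection step can be pictured as follows. This is an illustration of the idea only, not the service's code: the document IDs are invented, and keeping the order of the term-search results is an assumption.

```python
def intersect_results(term_hits, spatial_hits):
    """Return the IDs present in both result sets, in term-search order."""
    spatial = set(spatial_hits)
    return [doc_id for doc_id in term_hits if doc_id in spatial]


# Hypothetical document IDs, for illustration only.
term_hits = ["doc-3", "doc-1", "doc-7", "doc-4"]
spatial_hits = ["doc-4", "doc-9", "doc-3"]
print(intersect_results(term_hits, spatial_hits))
```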
     166 
     167=== Temporal searching : <DateRange> === 
     168Full-text, author or parameter searches may optionally be combined with a further restriction that the temporal coverage overlaps the specified <DateRange>. Both <DateRangeStart> and <DateRangeEnd> must be specified and must be valid dates of the form YYYY-MM-DD. TODO: it is planned to implement a choice of <temporalOperator> in a similar manner to <spatialOperator>. 
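The date format can be checked on the client side, and the overlap test can be pictured with a simple interval comparison. The sketch below is hedged in two ways: treating the end points as inclusive is an assumption, and rejecting a range whose end precedes its start is a client-side choice rather than documented service behaviour. The function names are hypothetical.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_date_range(start_text, end_text):
    """Return (start, end) dates, raising ValueError on malformed input."""
    for text in (start_text, end_text):
        if not DATE_PATTERN.fullmatch(text):
            raise ValueError(f"not of the form YYYY-MM-DD: {text!r}")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("DateRangeEnd precedes DateRangeStart")
    return start, end


def ranges_overlap(a, b):
    """Return True if two (start, end) ranges share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


query = parse_date_range("2001-01-01", "2001-12-31")
coverage = parse_date_range("2001-06-15", "2003-06-14")
print(ranges_overlap(query, coverage))
```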
     169 
     170== Search results == 
     171The doSearchResponse message is defined in the WSDL as shown below: 
     172 
     173[[Image(doSearchReturnSchema.png)]] 
     174 
     175The <doSearchReturn> element contains the following top-level elements: 
     176 
     177  status:: 
     178    true if successful AND number of hits > 0, false otherwise (designed so that a client need only proceed to parse the rest of the message if results were successfully returned)  
     179  statusMessage:: 
     180    Textual information regarding success / failure / errors  
     181  resultId:: 
     182    reserved for future use  
     183  hits:: 
     184    TOTAL number of hits returned  
     185  documents:: 
     186    parent element for array of <document> elements containing returned document IDs  
     187 
     188A typical search result was shown in the "Quick Start" section. A result where no hits were returned is shown below: 
     189{{{ 
     190<doSearchReturn xmlns="urn:DiscoveryServiceAPI"> 
     191        <status>false</status> 
     192        <statusMessage>Search was successful but generated no results.</statusMessage> 
     193        <resultId>0</resultId> 
     194        <hits>0</hits> 
     195</doSearchReturn> 
     196}}} 
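A client can use `<status>` to decide whether the rest of the message is worth parsing, as intended by the design described above. The following sketch parses the no-hits example with the Python standard library; the function name is hypothetical, and the path `documents/document` follows the element descriptions given earlier in this section.

```python
import xml.etree.ElementTree as ET

NS = {"d": "urn:DiscoveryServiceAPI"}

# No-hits response copied from the example above.
NO_HITS = """
<doSearchReturn xmlns="urn:DiscoveryServiceAPI">
    <status>false</status>
    <statusMessage>Search was successful but generated no results.</statusMessage>
    <resultId>0</resultId>
    <hits>0</hits>
</doSearchReturn>
"""


def parse_search_return(xml_text):
    """Return a dict summarising a <doSearchReturn> payload."""
    root = ET.fromstring(xml_text.strip())
    status_text = root.findtext("d:status", default="false", namespaces=NS)
    status = status_text.strip().lower() == "true"
    result = {
        "status": status,
        "statusMessage": root.findtext("d:statusMessage", default="", namespaces=NS),
        "hits": int(root.findtext("d:hits", default="0", namespaces=NS)),
        "documents": [],
    }
    # Only read the document IDs when the service reports results.
    if status:
        result["documents"] = [
            el.text for el in root.findall("d:documents/d:document", NS)
        ]
    return result


summary = parse_search_return(NO_HITS)
print(summary)
```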
     197 
     198== Term Lists == 
     199 
     200== Supported Metadata Formats ==