
Version 23 (modified by sdonegan, 9 years ago)


Description of Discovery Components

Introduction

This guide aims to describe in simple terms the components of the discovery service.

Last updated 2010/09/20 by Matt Pritchard

Overview

The following diagram shows the components of the discovery service:

[Diagram: components of the discovery service, updated to show DPWS & CSW Harvest]

Data providers create metadata documents describing data resources. Each data provider publishes these documents to make them available for others to access. An automatic process gathers (harvests) the documents from each data provider and ingests them into a database, where they are stored alongside those from other data providers. Data providers control their publishing tool via the Data Providers Admin Interface. A web service searches this database in response to search requests received from a search interface, which may be hosted by a third party as part of a web portal. The web service returns the results to the search interface, which presents them to the user. Search tools in the search interface help the user construct search requests based on time periods, geographic areas and text terms from controlled vocabularies, which are provided by a vocab server.

Definitions

Data Provider

Organisation (e.g. NERC data centre) that produces metadata records and publishes them via OAI.

Data Resources

Things described by metadata records.

Publishing

The act of putting metadata records in a system that exposes them for external access over the internet. This is done using OAI, a software toolkit installed at each data provider site. A data provider would have the "OAI Provider" function of this software installed, which simply exposes a collection of metadata records in a standard way, ready for harvesting. Each data provider is in control of their own OAI Provider software and should register the details of their "node" using the Data Providers Web Service.

Data Providers Admin Interface

A web-based tool for data providers to enter the details (URL plus some other configuration options) of their OAI Provider or CSW publishing node, so that the automated harvesting process knows where to go to harvest metadata records. Uses the Data Providers Web Service as the backend.
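As a rough illustration, the details entered for a node could be modelled as a small validated record. This is only a sketch: the class name `ProviderNode`, its field names and the method labels are assumptions made for the example, not the actual schema used by the admin interface.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Harvest methods named in this guide; the string labels are illustrative.
HARVEST_METHODS = {"OAI", "CSW"}


@dataclass(frozen=True)
class ProviderNode:
    """Hypothetical record for one registered publishing node."""
    provider_name: str
    url: str
    harvest_method: str

    def __post_init__(self):
        # Reject anything the harvester could not reach over HTTP(S).
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a usable node URL: {self.url!r}")
        if self.harvest_method not in HARVEST_METHODS:
            raise ValueError(f"unknown harvest method: {self.harvest_method!r}")


# example.org is a placeholder host, not a real provider node.
node = ProviderNode("Example Data Centre", "http://example.org/oai", "OAI")
```

Validating at construction time means a badly formed URL is caught when the provider registers the node rather than when the harvester next runs.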

Data Providers Web Service

A piece of software offering a "presentation-less" set of functions enabling the administration of data provider nodes. Typically, calls to this service would be made via the Data Providers Admin Interface.

Harvesting

A process by which metadata records are collected. Can be done either by OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) or via the CSW harvesting method. This function is performed centrally to gather metadata from all participating data providers.
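To make the OAI-PMH route more concrete, the sketch below builds a `ListRecords` request URL and reads record identifiers and the resumption token from a response. The verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH specification; the base URL, the sample response and the function names are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # When resuming, the token is the only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
ids, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with each returned token until no token comes back, which is how a large collection is paged through.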

Discovery Database / Ingest

Harvested metadata records are processed centrally and ingested (inserted) into the discovery index database, which stores the documents in their entirety (to enable full-text searching), but also pulls out pre-defined fields within them to enable specific types of searches (e.g. spatial extent, time periods). The database is held as a set of relational database tables within a database server, but with original documents preserved in their native format (XML).
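The idea of keeping the whole document while also extracting searchable fields can be sketched with an in-memory SQLite table. Everything specific here is an assumption for the example: the record layout, the table and column names, and the use of SQLite stand in for whatever schema and database server the real service uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record; real records follow the
# provider's metadata schema, which this guide does not specify.
SAMPLE_RECORD = """<record>
  <title>Sea surface temperature</title>
  <bbox west="-10.0" south="49.0" east="2.0" north="61.0"/>
  <period start="2001-01-01" end="2005-12-31"/>
</record>"""


def create_index(conn):
    conn.execute(
        """CREATE TABLE discovery_index (
               id INTEGER PRIMARY KEY,
               original_xml TEXT NOT NULL,
               west REAL, south REAL, east REAL, north REAL,
               start_date TEXT, end_date TEXT)"""
    )


def ingest(conn, xml_text):
    """Store the original XML unchanged and copy out the searchable fields."""
    root = ET.fromstring(xml_text)
    bbox = root.find("bbox").attrib
    period = root.find("period").attrib
    conn.execute(
        "INSERT INTO discovery_index "
        "(original_xml, west, south, east, north, start_date, end_date) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            xml_text,
            float(bbox["west"]), float(bbox["south"]),
            float(bbox["east"]), float(bbox["north"]),
            period["start"], period["end"],
        ),
    )


conn = sqlite3.connect(":memory:")
create_index(conn)
ingest(conn, SAMPLE_RECORD)

# Spatial search on the extracted fields: records whose bounding box
# overlaps the query box (west=-5, south=50, east=0, north=55).
spatial_hits = conn.execute(
    "SELECT id FROM discovery_index "
    "WHERE west <= ? AND east >= ? AND south <= ? AND north >= ?",
    (0.0, -5.0, 55.0, 50.0),
).fetchall()

# Crude full-text search over the preserved original document.
text_hits = conn.execute(
    "SELECT id FROM discovery_index WHERE original_xml LIKE ?",
    ("%temperature%",),
).fetchall()
```

Storing the extracted fields as ordinary columns lets spatial and temporal filters use plain comparisons, while the untouched XML remains available for full-text matching and for returning the document as published.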

Discovery Web Service

A piece of software run alongside the discovery database, which offers a "presentation-less" service to handle requests for the execution of a search, or for the retrieval of a document. On receipt of an appropriately constructed XML message (as defined by the WSDL) from the search interface, it will carry out a search of the discovery database and construct a search response message, which is sent back to the search interface. Similarly, it also handles requests for returning specific documents in the database: a request is received as an XML message (via SOAP), and an appropriate response is sent back to the sender. In all cases, messages (request/response) are exchanged in XML via SOAP, with all presentational formatting handled by the search interface (which itself may be part of a third-party web portal).
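The shape of such an exchange can be illustrated by wrapping a search term in a SOAP envelope and reading it back. The envelope namespace is the standard SOAP 1.1 one; the `urn:example:discovery` namespace and the `SearchRequest` and `Term` element names are placeholders, since the real message structure is whatever the service's WSDL defines.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
# Hypothetical namespace and element names; the real ones come from the WSDL.
SEARCH_NS = "urn:example:discovery"
NS = {"soap": SOAP_NS, "d": SEARCH_NS}


def build_search_request(term):
    """Wrap a text search term in a SOAP envelope and return it as a string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SEARCH_NS}}}SearchRequest")
    ET.SubElement(request, f"{{{SEARCH_NS}}}Term").text = term
    return ET.tostring(envelope, encoding="unicode")


def read_search_request(xml_text):
    """Extract the search term, as the receiving service might."""
    root = ET.fromstring(xml_text)
    term = root.find("soap:Body/d:SearchRequest/d:Term", NS)
    if term is None:
        raise ValueError("message does not contain a search term")
    return term.text


message = build_search_request("ocean temperature")
received_term = read_search_request(message)
```

Because the message carries only data and no presentation, any search interface that can produce this XML can use the service, whatever it looks like to the user.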

WSDL

Web Services Description Language: typically there will be a document describing in machine-readable form the operations that a service is capable of performing, and the required XML structure of the request and response messages for each of these operations.

Search Interface

A web application consisting of tools to enable a user to define a search to be sent to the discovery web service. At its simplest, this could be a text box and submit button, but it may have more sophisticated tools such as click-and-drag map tools to define a region of interest, calendar tools to define dates/times, and in some cases tools to select terms from controlled vocabularies. Lists of these terms may be populated by calls (similar to the search request/response messages) to a vocab server.
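One way to picture what these tools collect is a single criteria object that is checked before a request is sent. The class `SearchCriteria`, its fields and the `problems` method are hypothetical names chosen for this sketch and are not part of the discovery service.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class SearchCriteria:
    """Hypothetical container for what the search tools collect."""
    text: str = ""
    # (west, south, east, north) in decimal degrees, e.g. from a map tool.
    bbox: Optional[Tuple[float, float, float, float]] = None
    # Time period, e.g. from calendar tools.
    start: Optional[date] = None
    end: Optional[date] = None
    # Terms picked from controlled vocabularies.
    vocab_terms: List[str] = field(default_factory=list)

    def problems(self):
        """Return a list of readable problems; an empty list means usable."""
        found = []
        if self.bbox is not None:
            west, south, east, north = self.bbox
            if not (-90 <= south <= north <= 90):
                found.append("latitudes must satisfy -90 <= south <= north <= 90")
            if not (-180 <= west <= 180 and -180 <= east <= 180):
                found.append("longitudes must lie within -180..180")
        if self.start and self.end and self.start > self.end:
            found.append("start date is after end date")
        return found


good = SearchCriteria(text="ice", bbox=(-10.0, 49.0, 2.0, 61.0))
bad = SearchCriteria(start=date(2005, 1, 1), end=date(2001, 1, 1))
```

Checking the criteria in the interface keeps obviously invalid requests (such as a reversed date range) from reaching the web service at all.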

Web Portal

A web site consisting of several applications, one of which may be a search interface.

Vocab Server

A presentation-less web service (similar in nature to the discovery web service) that can receive requests for listing the contents of particular controlled vocabularies.

Controlled Vocabulary

Community-maintained list of standard terms (and their definitions), for use within particular scientific domains, so that users are able to point at a particular term for an unambiguous definition of a concept. Exposed (made available) by a vocab server.
