Description of Discovery Components

Introduction

This guide aims to describe in simple terms the components of the discovery service.

Last updated 2010/04/29 by Matt Pritchard

Overview

The following diagram shows the components of the discovery service:

Data providers create metadata documents describing data resources. Each data provider publishes its documents to make them available for others to access. An automatic process gathers, or "harvests", these documents from each data provider and ingests them into a database, where they are stored alongside those from other data providers. Data providers control their publishing tool via an admin interface. A web service searches this database in response to search requests received from a search interface, which may be hosted by a third party as part of a web portal. The web service returns the results to the search interface, which presents them to the user. Search tools included in the search interface help the user construct search requests based on time periods, geographic areas and text terms from standard vocabularies, which are provided by a vocab server.

Definitions

Data Provider
Organisation (e.g. NERC data centre) that produces metadata records and publishes them via OAI.

Data resources
Things described by metadata records
Publishing
The act of putting metadata records in a system that exposes them for external access over the internet. This is done using OAI, a software toolkit installed at each data provider site. A data provider would have the "OAI Provider" function of this software installed, which simply exposes a collection of metadata records in a standard way, ready for harvesting. Each data provider controls its own OAI Provider software and should register the details of its "node" using the "OAI Admin Interface".
OAI Admin Interface
A web-based tool in which data providers enter the details (URL plus some other configuration options) of their OAI Provider "node", so that the automated harvesting process knows where to go to harvest metadata records.
OAI Harvesting
A process by which metadata records are collected centrally from all participating data providers via OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting).
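As a rough illustration of what a harvester does, the sketch below builds an OAI-PMH ListRecords request URL and reads the record identifiers out of a response. The verb, the metadataPrefix and resumptionToken arguments and the XML namespace come from the OAI-PMH specification; the base URL, the function names and the sample response are made up for this example and are not taken from the actual discovery service harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a provider node (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (list of record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Invented, heavily trimmed response: real responses also carry
# responseDate, request and metadata elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL over HTTP and keep requesting with the returned resumption token until the provider stops sending one.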
Discovery Database / Ingest
Harvested metadata records are processed centrally and ingested (inserted) into the discovery index database, which stores the documents in their entirety (to enable full-text searching), but also pulls out pre-defined fields within them to enable specific types of searches (e.g. by spatial extent or time period). The database is held as a set of relational database tables within a database server, but with the original documents preserved in their native format (XML).
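The idea of keeping each document whole while also pulling out searchable fields can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database, and the table layout, the element names in the sample records and the function names are all assumptions made for the example, not the real discovery database schema or metadata format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table: extracted fields plus the untouched original XML.
SCHEMA = """
CREATE TABLE records (
    identifier   TEXT PRIMARY KEY,
    title        TEXT,
    start_date   TEXT,
    end_date     TEXT,
    west REAL, south REAL, east REAL, north REAL,
    original_xml TEXT NOT NULL
)
"""


def ingest(conn, xml_text):
    """Extract pre-defined fields and store them alongside the full document."""
    root = ET.fromstring(xml_text)
    bbox = root.find("bbox").attrib
    conn.execute(
        "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            root.findtext("identifier"),
            root.findtext("title"),
            root.findtext("start"),
            root.findtext("end"),
            float(bbox["west"]), float(bbox["south"]),
            float(bbox["east"]), float(bbox["north"]),
            xml_text,
        ),
    )


def search(conn, text=None, bbox=None):
    """Return identifiers matching an optional text term and bounding box."""
    sql = "SELECT identifier FROM records WHERE 1=1"
    args = []
    if text:
        # Crude stand-in for full-text search: substring match on the raw XML,
        # so it can also match tag names.
        sql += " AND original_xml LIKE ?"
        args.append(f"%{text}%")
    if bbox:
        west, south, east, north = bbox
        # Two boxes overlap when each starts before the other ends, on both axes.
        sql += " AND west <= ? AND east >= ? AND south <= ? AND north >= ?"
        args += [east, west, north, south]
    sql += " ORDER BY identifier"
    return [row[0] for row in conn.execute(sql, args)]


# Invented sample records in a made-up format.
DOCS = [
    '<record><identifier>rec1</identifier><title>Sea surface temperature</title>'
    '<start>2001-01-01</start><end>2005-12-31</end>'
    '<bbox west="-10" south="50" east="2" north="60"/></record>',
    '<record><identifier>rec2</identifier><title>Antarctic ice cores</title>'
    '<start>1990-01-01</start><end>1995-12-31</end>'
    '<bbox west="-180" south="-90" east="180" north="-60"/></record>',
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for doc in DOCS:
    ingest(conn, doc)

if __name__ == "__main__":
    print(search(conn, text="temperature"))
    print(search(conn, bbox=(0, 55, 5, 58)))
```

Keeping the original XML in its own column means a document can always be returned exactly as it was harvested, whatever fields were extracted from it.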
Discovery Web service
A piece of software run alongside the discovery database, which offers a "presentation-less" service to handle search requests. On receipt of an appropriately constructed XML message from the search interface, it carries out a search of the discovery database and constructs a search response message, which is sent back to the search interface. It handles requests for specific documents in the database in the same way: a request is received in XML, and an appropriate response is sent back to the sender. In all cases, messages (request/response) are exchanged in XML, with all presentational formatting handled by the search interface (which itself may be part of a third-party web portal).
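The request/response exchange might look something like the sketch below, in which a handler takes an XML search request and returns an XML response with no presentational markup. The message element names (searchRequest, term, searchResponse, hit), the in-memory catalogue and the handler function are invented for illustration; the real message formats of the discovery web service are not described here.

```python
import xml.etree.ElementTree as ET

# Stand-in for the discovery database: identifier -> title (invented data).
CATALOGUE = {
    "rec1": "Sea surface temperature",
    "rec2": "Antarctic ice cores",
}


def handle_request(request_xml):
    """Parse a hypothetical <searchRequest> and return a <searchResponse> string."""
    request = ET.fromstring(request_xml)
    term = (request.findtext("term") or "").lower()

    response = ET.Element("searchResponse")
    for identifier, title in sorted(CATALOGUE.items()):
        if term in title.lower():
            hit = ET.SubElement(response, "hit", identifier=identifier)
            hit.text = title
    # Record how many hits were found; layout is left to the search interface.
    response.set("hits", str(len(response)))
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(handle_request("<searchRequest><term>ice</term></searchRequest>"))
```

Because the response carries only data, any search interface or third-party portal can present the results in its own way.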
Search Interface
A web application consisting of tools that enable a user to define a search to be sent to the discovery web service. At its most basic, this could be a single text box and submit button, but it may have more sophisticated tools such as click-and-drag map tools to define a region of interest and calendar tools to define dates/times, and in some cases may include tools to select terms from controlled vocabularies. Lists of these terms may be populated by calls (similar to the search request/response messages) to a vocab server.
Web portal
A web site consisting of several applications, one of which may be a search interface.
Vocab server
A presentation-less web service (similar in nature to the discovery web service) that can receive requests for listing the contents of particular controlled vocabularies.
Controlled vocabularies
Community-maintained lists of standard terms (and their definitions), for use within particular scientific domains, so that users are able to point at a particular term for a non-ambiguous definition of a concept.
