T03_DataExtractor: index.html

File index.html, 7.8 KB (added by astephen, 15 years ago)

Data Extractor Overview

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<META http-equiv=CONTENT-TYPE content="text/html; charset=utf-8">
<META content="MSHTML 6.00.2900.2802" name=GENERATOR>
<META content=20060317;12440700 name=CREATED>
<META content=20060317;12445300 name=CHANGED>
<META content=FrontPage.Editor.Document name=ProgId></HEAD>
<BODY lang=en-US dir=ltr>
<H1>An overview of the Data Extractor (DX)</H1>
<P>The Data Extractor (DX) is a Python-based tool that gives users access to
subsets of large geospatial datasets via a common interface. That interface is
typically the DX Browser Client, which is accessible as a set of web pages.
However, users can also interact programmatically with the DX-Server, which
presents a functional interface as a Web Service. This document provides an
overview of the key components of the DX. More detail is, or will soon be,
available in the following guides:</P>
<UL>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm"><B>DX Installation Guide</B> </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm"><B>DX Data Ingestion Guide</B> </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm"><B>DX Administrator's Guide</B> </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm"><B>DX User Guide</B> </P>
  <LI>
  <P><B>Guide to Securing the DX</B> </P></LI></UL>
<P>The following diagram provides an overview of the DX architecture,
highlighting the main components involved in managing and interacting with the
DX.</P>
<P><IMG height=383 src="" width=767 align=bottom border=0 name=Graphic1></P>
<P>Each component is described in more detail below.</P>
<H3>DX-Server</H3>
<P>The DX-Server is the part of the system that does the core processing such as
file I/O, subsetting and writing of data files. It provides:</P>
<UL>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">a functional interface that can be interrogated
  by client applications (represented by the <B>Web Service interface</B> in
  the above diagram). </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">a metadata store describing datasets located in
  a local archive. </P>
  <LI>
  <P>an I/O layer that extracts requested data (and metadata). </P></LI></UL>
<P>The DX-Server is controlled by the <B>Administrator</B>.</P>
<P>Installation requires knowledge of the local file system and access to
various locations such as the webserver CGI area. The <B>Server
Configuration</B> module (typically called <I>serverConfig.py</I>) is used to
set up the correct paths to local resources, which can then be accessed by the
DX-Server. These issues are dealt with further in the DX Installation Guide and
the DX Administrator's Guide.</P>
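<P>As a rough illustration of what such a configuration module might contain,
the sketch below defines a few path settings and a helper that reports which of
them are missing on the local machine. All variable names and paths are
hypothetical and are not taken from the actual <I>serverConfig.py</I>.</P>

```python
import os

# Hypothetical settings; the real serverConfig.py may use different names.
BASE_DIR = "/usr/local/dx"
CGI_DIR = "/var/www/cgi-bin/dx"
OUTPUT_DIR = os.path.join(BASE_DIR, "output")
INPUT_DATASETS_FILE = os.path.join(BASE_DIR, "config", "inputDatasets.xml")


def missing_paths(paths):
    """Return the configured paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    configured = [BASE_DIR, CGI_DIR, OUTPUT_DIR, INPUT_DATASETS_FILE]
    for path in missing_paths(configured):
        print("Missing path:", path)
```

<P>Running a check of this kind after installation is one way to catch a
mistyped path before the server is started.</P>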
<P>Both the DX-Server and the DX-Clients are Python packages (i.e. collections
of Python modules). The DX is written in an object-oriented style so that the
code is straightforward for developers to build upon and modify where required.
The DX-Server builds upon the Climate Data Analysis Tools (CDAT) package, which
provides the underlying I/O, selection and subsetting tools. CDAT is not
distributed with the DX.</P>
<H3>DX-Client (Browser)</H3>
<P>The DX Browser Client is the main method by which users will access the DX.
It provides a CGI front-end that a user can access via any standard web browser.
In a secure configuration users must log in to the DX client, but you can also
configure the DX to provide open access where users can see all datasets. Access
can be limited by user and/or by roles associated with datasets.</P>
<P>The Administrator will install the DX-Client, which may reside on the same
machine as the DX-Server or on a remote machine. The client and server
communicate using SOAP (Simple Object Access Protocol) messages, which require
the installation of the Python ZSI library (not supplied with the DX).</P>
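<P>To give a feel for what a SOAP message looks like on the wire, the sketch
below builds and parses a SOAP 1.1 envelope using only the Python standard
library. It does not use ZSI, and the operation name, parameter name and
<I>urn:example:dx</I> namespace are hypothetical rather than part of the real
DX interface.</P>

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
DX_NS = "urn:example:dx"  # hypothetical namespace for DX operations


def _qualified(namespace, tag):
    """Return a tag name in ElementTree's '{namespace}tag' notation."""
    return "{%s}%s" % (namespace, tag)


def build_soap_request(operation, params):
    """Build a SOAP 1.1 envelope that calls `operation` with simple parameters."""
    envelope = ET.Element(_qualified(SOAP_ENV_NS, "Envelope"))
    body = ET.SubElement(envelope, _qualified(SOAP_ENV_NS, "Body"))
    call = ET.SubElement(body, _qualified(DX_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_request(xml_text):
    """Return (operation name, parameter dict) from an envelope built above."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(_qualified(SOAP_ENV_NS, "Body"))
    call = body[0]
    operation = call.tag.split("}", 1)[-1]  # drop the '{namespace}' prefix
    return operation, {child.tag: child.text for child in call}


if __name__ == "__main__":
    # "getDatasets" and "datasetGroup" are illustrative names only.
    print(build_soap_request("getDatasets", {"datasetGroup": "VFGS Model Output"}))
```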
<P>The <B>Client Configuration</B> module (normally called
<I>clientConfig.py</I>) is controlled by the Administrator, who configures the
client for the local system.</P>
<H3>DX-Client (Command Line)</H3>
<P>The command line client for the DX allows users to interact programmatically
with the DX-Server. This is a relatively untested feature but has the potential
to allow users to embed calls to the DX-Server in their programs and scripts
so that data can be extracted seamlessly as and when the user needs it.</P>
<H3>Data Archive</H3>
<P>The data archive must currently sit on the same network as the DX-Server and
be visible via local path names. The archive must contain data held in files
formatted as NetCDF or GRIB. There is also some support available in
non-standard versions for pp-format (UK Met Office). </P>
<P>The metadata inside the files should adhere (to some degree) to the
CF-Metadata Convention for NetCDF, although some variation will normally work.
Data that follow the convention will be easy to ingest without manual
intervention.</P>
<H3>Dataset Metadata</H3>
<P>The DX understands the concept of a "Dataset" as a collection of one or more
data files containing variables with a repeated structure. Typically these are
2D or 3D model fields with one time step per file.</P>
<P>The DX also has the concept of a "Dataset Group". This is a logical
collection of "Datasets". For example:</P>
<TABLE cellSpacing=2 cellPadding=2 width="100%" border=1>
  <TBODY>
  <TR>
    <TD width="33%">
      <P><B>Dataset Group</B></P></TD>
    <TD width="33%">
      <P>VFGS Model Output</P></TD>
    <TD width="34%">
      <P>VFGS Model Output</P></TD></TR>
  <TR>
    <TD width="33%">
      <P><B>Datasets</B></P></TD>
    <TD width="33%">
      <P>VFGS Ocean Model Output</P></TD>
    <TD width="34%">
      <P>VFGS Atmospheric Model Output</P></TD></TR>
  <TR>
    <TD width="33%">
      <P><B>Variables</B></P></TD>
    <TD width="33%">
      <P>Salinity, SST...</P></TD>
    <TD width="34%">
      <P>u-wind, v-wind...</P></TD></TR></TBODY></TABLE>
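<P>The hierarchy in the table above can be modelled with a couple of small
classes. This is only an illustrative sketch: the class and method names are
assumptions and do not reflect how the DX represents datasets internally.</P>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """A named dataset and the variables it contains."""
    name: str
    variables: List[str] = field(default_factory=list)


@dataclass
class DatasetGroup:
    """A logical collection of datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find_dataset(self, name: str) -> Optional[Dataset]:
        """Return the dataset with the given name, or None if absent."""
        for dataset in self.datasets:
            if dataset.name == name:
                return dataset
        return None


# The example from the table, with only the variables it names explicitly.
vfgs = DatasetGroup(
    name="VFGS Model Output",
    datasets=[
        Dataset("VFGS Ocean Model Output", ["Salinity", "SST"]),
        Dataset("VFGS Atmospheric Model Output", ["u-wind", "v-wind"]),
    ],
)
```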
<P>By default the DX requires the Administrator to ingest new Datasets into the
DX-Server before they can be accessed by users. The Administrator can also
create new Dataset Groups to put Datasets under.</P>
<P>When interacting with the DX (via the Browser Client or Command Line Client)
the user will make selections in the following order:</P>
<OL>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">Dataset Group </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">Dataset </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">Variable </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">Spatial (Horizontal and Vertical) axes </P>
  <LI>
  <P style="MARGIN-BOTTOM: 0cm">Temporal axes </P>
  <LI>
  <P>Output file format </P></LI></OL>
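<P>A client that walks the user through these steps needs to know which
selection comes next. The sketch below shows one simple way to do that. The
key names are hypothetical and are not the option names used by the DX.</P>

```python
# Hypothetical keys, listed in the order the selections are made.
SELECTION_ORDER = [
    "datasetGroup",
    "dataset",
    "variable",
    "spatialAxes",
    "temporalAxes",
    "outputFormat",
]


def next_selection(request):
    """Return the first step missing from `request`, or None when complete."""
    for step in SELECTION_ORDER:
        if step not in request:
            return step
    return None


if __name__ == "__main__":
    partial = {"datasetGroup": "VFGS Model Output",
               "dataset": "VFGS Ocean Model Output"}
    print("Next step:", next_selection(partial))
```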
<P>If the user selects two variables, the DX will try to subtract variable 2 from
variable 1 by interpolating variable 2 to the grid of variable 1.</P>
<P>The Dataset Metadata is stored in an XML file (normally called
<I>inputDatasets.xml</I>). Ingestion of datasets is described in detail in the DX
Data Ingestion Guide.</P>
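<P>For illustration, the sketch below reads dataset groups and datasets from a
small XML document with the Python standard library. The element and attribute
names are invented for this example, and the real <I>inputDatasets.xml</I>
schema may look quite different.</P>

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real inputDatasets.xml schema may differ.
EXAMPLE_XML = """
<inputDatasets>
  <datasetGroup name="VFGS Model Output">
    <dataset name="VFGS Ocean Model Output" path="/data/vfgs/ocean"/>
    <dataset name="VFGS Atmospheric Model Output" path="/data/vfgs/atmos"/>
  </datasetGroup>
</inputDatasets>
"""


def list_datasets(xml_text):
    """Return a dict mapping each dataset group name to its dataset names."""
    root = ET.fromstring(xml_text.strip())
    return {
        group.get("name"): [ds.get("name") for ds in group.findall("dataset")]
        for group in root.findall("datasetGroup")
    }


if __name__ == "__main__":
    for group_name, dataset_names in list_datasets(EXAMPLE_XML).items():
        print(group_name, "->", ", ".join(dataset_names))
```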
<H3>Web Service Interface</H3>
<P>The Web Service Interface to the DX-Server is a Python script with a number
of functions that are presented as a Web Service when the script is run. This
server script then waits for calls from client applications. Clients can only
access the DX-Server while this script is running on the DX-Server machine.</P>
<H3>Security</H3>
<P>The DX can be run in secure or non-secure mode. This is controlled in
the <B>Server</B> and <B>Client Configuration</B> modules. The DX provides a set
of programmatic hooks that an Administrator can plug into the local security
system. The DX allows secure tokens to be exchanged between client and server,
so the hooks can be adapted to interface with the local security implementation
in your system.</P>
<P>More details are provided in the <B>Guide to Securing the DX</B>.</P>
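<P>As a generic illustration of token exchange, the sketch below signs a user
name with a shared secret and verifies the result using HMAC-SHA256 from the
Python standard library. This is an assumption about how a local security hook
could work, not a description of the DX's own token format. The function names
and the secret are hypothetical.</P>

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be set in the configuration modules.
SHARED_SECRET = b"change-me"


def _sign(username, secret):
    """Return the HMAC-SHA256 hex digest of the user name."""
    return hmac.new(secret, username.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(username, secret=SHARED_SECRET):
    """Return a token of the form 'username:signature'."""
    return "%s:%s" % (username, _sign(username, secret))


def check_token(token, secret=SHARED_SECRET):
    """Return the user name if the signature is valid, otherwise None."""
    username, _, signature = token.rpartition(":")
    expected = _sign(username, secret)
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None


if __name__ == "__main__":
    token = make_token("alice")
    print("Valid for:", check_token(token))
    print("Tampered:", check_token(token + "0"))
```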