Changes between Initial Version and Version 1 of DataExtractor/Manuals/DXOverview

17/07/06 23:39:38 (15 years ago)



  • DataExtractor/Manuals/DXOverview

    v1 v1  
     3<HTML><HEAD><TITLE>Home Page</TITLE> 
     4<META http-equiv=CONTENT-TYPE content="text/html; charset=utf-8"> 
     5<META content="MSHTML 6.00.2900.2802" name=GENERATOR> 
     6<META content=20060317;12440700 name=CREATED> 
     7<META content=20060317;12445300 name=CHANGED> 
     8<META content=FrontPage.Editor.Document name=ProgId></HEAD> 
     9<BODY lang=en-US dir=ltr> 
     10<H1>An overview of the Data Extractor (DX)</H1> 
     12<P>The Data Extractor (DX) is a Python-based tool that allows users to access  
     13subsets of large geospatial datasets via a common interface. This is typically  
     14the DX Browser Client which is accessible as a set of web pages. However, users  
     15can also interact programmatically with the DX-Server which presents a  
     16functional interface as a Web Service. This document provides an overview of the  
     17key components of the DX. More detail is, or will soon be, available in the  
     18following guides:</P> 
     19<UL> 
     20  <LI> 
     22  <P style="MARGIN-BOTTOM: 0cm"><B>DX Installation Guide</B> </P> 
     23  <LI> 
     24  <P style="MARGIN-BOTTOM: 0cm"><B>DX Data Ingestion Guide</B> </P> 
     25  <LI> 
     26  <P style="MARGIN-BOTTOM: 0cm"><B>DX Administrator's Guide</B> </P> 
     27  <LI> 
     29  <P style="MARGIN-BOTTOM: 0cm"><B>DX User Guide</B> </P> 
     30  <LI> 
     31  <P><B>Guide to Securing the DX</B> </P></LI></UL> 
     33<P>The following diagram provides an overview of the DX architecture  
     34highlighting the main components in terms of managing and interacting with the  
     36<P><IMG height=383 src="" width=767 align=bottom border=0 name=Graphic1></P> 
     37<P>Each component is described in more detail below.</P> 
     40<P>The DX-Server is the part of the system that does the core processing such as file  
     41I/O, subsetting and writing of data files. It provides:</P> 
     42<UL> 
     43  <LI> 
     44  <P style="MARGIN-BOTTOM: 0cm">a functional interface that can be interrogated  
     45  by client applications (represented by the <B>Web Service interface</B> in the  
     46  above diagram). </P> 
     47  <LI> 
     48  <P style="MARGIN-BOTTOM: 0cm">a metadata store describing datasets located in  
     49  a local archive. </P> 
     51  <LI> 
     52  <P>an I/O layer that extracts requested data (and metadata). </P></LI></UL> 
     53<P>The DX-Server is controlled by the <B>Administrator</B>.</P> 
     54<P>Installation requires knowledge of the local file system and access to  
     55various locations such as the webserver CGI area. The <B>Server  
     56Configuration</B> module (typically called <I></I>) is used to  
     57set up the correct paths to local resources which can then be accessed by the  
     58DX-Server. These issues are dealt with further in the DX Installation and  
     59Administrator Guides.</P> 
     60<P>Both the DX-Server and the DX-Clients are Python packages (i.e. collections  
     61of Python modules). The DX is written using object-oriented programming in order  
     62to make the code straightforward and simple for the developer to build upon and  
     63modify where required. The DX-Server builds upon the Climate Data Analysis Tools  
     64(CDAT) package which provides the underlying I/O, selection and subsetting  
     65tools. CDAT is not distributed with the DX.</P> 
     67<H3>DX-Client (Browser)</H3> 
     68<P>The DX Browser Client is the main route by which users will access the DX.  
     69It provides a CGI front-end that a user can access via any standard web browser.  
     70In a secure configuration users must log in to the DX client but you can also  
     71configure the DX to provide open access where users can see all datasets. Access  
     72can be limited by user and/or by roles associated with datasets.</P> 
     73<P>The Administrator will install the DX-client which may exist on the same  
     74machine as the DX-Server or remotely. The client and server communicate using  
     75SOAP (Simple Object Access Protocol) messages which require the installation of  
     76the Python ZSI library (not supplied with the DX).</P> 
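<P>To give a feel for what a SOAP message looks like on the wire, the following minimal sketch builds a SOAP 1.1 envelope using only the Python standard library. It does not use ZSI and is not the DX client code; the operation name <I>listDatasets</I> and the parameter <I>datasetGroup</I> are hypothetical names chosen for illustration.</P>

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as a string) that calls `operation`."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, operation)
    # Each parameter becomes a child element of the operation element.
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# "listDatasets" and "datasetGroup" are hypothetical, not part of the DX API.
message = build_soap_request("listDatasets", {"datasetGroup": "VFGS Model Output"})
```

<P>In a real deployment a SOAP library would generate and parse such envelopes, so client code would not normally build them by hand.</P>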
     77<P>The <B>Client Configuration</B> module (normally called  
     78<I></I>) is controlled by the Administrator who configures the  
     79client for the local system.</P> 
     80<H3>DX-Client (Command Line)</H3> 
     81<P>The command line client for the DX allows users to interact programmatically  
     82with the DX-Server. This is a relatively untested feature but has the potential  
     83to allow users to embed calls to the DX-Server in their programmes and scripts  
     84so that data can be extracted seamlessly as and when the user needs it.</P> 
     87<P>The data archive must currently sit on the same network as the DX-Server and  
     88be visible via local path names. The archive must contain data held in files  
     89formatted as NetCDF or GRIB. There is also some support available in  
     90non-standard versions for pp-format (UK Met Office). </P> 
     91<P>The metadata inside the files should adhere (to some degree) to the  
     92CF-Metadata Convention for NetCDF although some variation will normally work.  
     93Such data will be easy to ingest without manual intervention.</P> 
     94<H3>Dataset Metadata</H3> 
     95<P>The DX understands the concept of a "Dataset" as a collection of one or more  
     96data files containing variables with a repeated structure. Typically these are  
     972D or 3D model fields with one time step per file.</P> 
     98<P>The DX also has the concept of a "Dataset Group". This is a logical  
     99collection of "Datasets". For example:</P> 
     100<TABLE cellSpacing=2 cellPadding=2 width="100%" border=1> 
     101  <TBODY> 
     102  <TR> 
     104    <TD width="33%"> 
     105      <P><B>Dataset Group</B></P></TD> 
     106    <TD width="33%"> 
     107      <P>VFGS Model Output</P></TD> 
     108    <TD width="34%"> 
     109      <P>VFGS Model Output</P></TD></TR> 
     110  <TR> 
     112    <TD width="33%"> 
     113      <P><B>Datasets</B></P></TD> 
     114    <TD width="33%"> 
     115      <P>VFGS Ocean Model Output</P></TD> 
     116    <TD width="34%"> 
     117      <P>VFGS Atmospheric Model Output</P></TD></TR> 
     118  <TR> 
     120    <TD width="33%"> 
     121      <P><B>Variables</B></P></TD> 
     122    <TD width="33%"> 
     123      <P>Salinity, SST...</P></TD> 
     124    <TD width="34%"> 
     125      <P>u-wind, v-wind...</P></TD></TR></TBODY></TABLE> 
     126<P>By default the DX requires the Administrator to ingest new Datasets into the  
     127DX-Server before they can be accessed by users. The Administrator can also  
     128create new Dataset Groups to put Datasets under.</P> 
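<P>The Dataset Group, Dataset and Variable hierarchy can be pictured as a small data structure. The sketch below is an illustration only: the class and method names are assumptions and do not reflect how the DX-Server stores its metadata internally.</P>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """A named collection of files exposing a set of variables."""
    name: str
    variables: List[str] = field(default_factory=list)


@dataclass
class DatasetGroup:
    """A logical collection of Datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find_variable(self, variable):
        """Return the names of the Datasets in this group holding `variable`."""
        return [d.name for d in self.datasets if variable in d.variables]


# Mirrors the example table: one group, two datasets, a few variables each.
group = DatasetGroup(
    "VFGS Model Output",
    [
        Dataset("VFGS Ocean Model Output", ["Salinity", "SST"]),
        Dataset("VFGS Atmospheric Model Output", ["u-wind", "v-wind"]),
    ],
)
```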
     130<P>When interacting with the DX (via the Browser Client or Command Line Client)  
     131the user will make selections in the following order:</P> 
     132<OL> 
     133  <LI> 
     134  <P style="MARGIN-BOTTOM: 0cm">Dataset Group </P> 
     135  <LI> 
     136  <P style="MARGIN-BOTTOM: 0cm">Dataset </P> 
     137  <LI> 
     138  <P style="MARGIN-BOTTOM: 0cm">Variable </P> 
     140  <LI> 
     141  <P style="MARGIN-BOTTOM: 0cm">Spatial (Horizontal and Vertical) axes </P> 
     142  <LI> 
     143  <P style="MARGIN-BOTTOM: 0cm">Temporal axes </P> 
     144  <LI> 
     145  <P>Output file format </P></LI></OL> 
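<P>The selection order listed above can be expressed as a simple check that reports which step is still outstanding. This is a sketch under assumed key names (such as <I>dataset_group</I>); the DX clients may represent a request quite differently.</P>

```python
# Hypothetical keys, listed in the order the user makes selections.
SELECTION_ORDER = [
    "dataset_group",
    "dataset",
    "variable",
    "spatial_axes",
    "temporal_axes",
    "output_format",
]


def next_selection(selections):
    """Return the first step not yet chosen, or None if the request is complete."""
    for step in SELECTION_ORDER:
        if step not in selections:
            return step
    return None
```

<P>For example, a request that so far holds only a Dataset Group and a Dataset would be prompted for a Variable next.</P>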
     146<P>If the user selects two variables the DX will try to subtract variable 2 from  
     147variable 1 by interpolating variable 2 to the grid of variable 1.</P> 
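<P>The idea of differencing two variables on a common grid can be shown in one dimension. The sketch below uses plain linear interpolation in pure Python and holds the end values constant outside the source grid. It is an assumption-laden illustration, not the regridding that the DX-Server performs through CDAT, and the function names are invented here.</P>

```python
import bisect


def interpolate(x_src, y_src, x_dst):
    """Linearly interpolate (x_src, y_src) onto x_dst; x_src must be ascending."""
    result = []
    for x in x_dst:
        if x <= x_src[0]:
            result.append(y_src[0])       # clamp below the source grid
        elif x >= x_src[-1]:
            result.append(y_src[-1])      # clamp above the source grid
        else:
            i = bisect.bisect_right(x_src, x) - 1   # left end of bracketing interval
            frac = (x - x_src[i]) / (x_src[i + 1] - x_src[i])
            result.append(y_src[i] + frac * (y_src[i + 1] - y_src[i]))
    return result


def difference(grid1, values1, grid2, values2):
    """Return variable 1 minus variable 2, with variable 2 moved onto grid 1."""
    values2_on_grid1 = interpolate(grid2, values2, grid1)
    return [a - b for a, b in zip(values1, values2_on_grid1)]


diff = difference([0.0, 1.0, 2.0], [10.0, 12.0, 14.0], [0.0, 2.0], [1.0, 5.0])
```

<P>Here variable 2 is defined on a coarser grid, is interpolated to the three points of variable 1, and is then subtracted point by point.</P>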
     149<P>The Dataset Metadata is stored in an XML file (normally called  
     150<I>inputDatasets.xml</I>). Ingestion of datasets is described in detail in the DX  
     153Data Ingestion Guide.</P> 
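<P>As an illustration of reading dataset metadata from XML, the sketch below parses a small document with the Python standard library. The element and attribute names (<I>datasetGroup</I>, <I>dataset</I>, <I>name</I>) are hypothetical; the real schema of <I>inputDatasets.xml</I> is not described here and may differ.</P>

```python
import xml.etree.ElementTree as ET

# Hypothetical layout, invented for this example only.
SAMPLE = """
<inputDatasets>
  <datasetGroup name="VFGS Model Output">
    <dataset name="VFGS Ocean Model Output"/>
    <dataset name="VFGS Atmospheric Model Output"/>
  </datasetGroup>
</inputDatasets>
"""


def list_datasets(xml_text):
    """Map each dataset group name to the list of dataset names it contains."""
    root = ET.fromstring(xml_text.strip())
    return {
        group.get("name"): [d.get("name") for d in group.findall("dataset")]
        for group in root.findall("datasetGroup")
    }


datasets_by_group = list_datasets(SAMPLE)
```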
     154<H3>Web Service Interface</H3> 
     155<P>The Web Service Interface to the DX-Server is a Python script with a number  
     156of functions that are presented as a Web Service when the script is run. This  
     157server script then waits for calls from client applications. Clients can only  
     158access the DX-Server when this script is running on the DX-Server machine.</P> 
     160<P>The DX can be secured or run in non-secure mode. This is all controlled in  
     161the <B>Server </B>and <B>Client Configuration </B>modules. The DX provides a set  
     162of programmatic hooks that an Administrator can plug into the local security  
     163system. The DX allows secure tokens to be exchanged between client and server so  
     164these can be modified to provide an interface to the local security  
     165implementation.</P> 
     166<P>More details are provided in the <B>Guide to Securing the DX</B>.</P>