
Scientific Data Services Workshop: Day 2
========================================

.. highlight:: ini

This tutorial will show you how to use CSML (Climate Science Modelling Language) tools and then walk you through creating a combined WMS/WCS server for NetCDF data conforming to the CF conventions using the CEDA OWS Framework (COWS).

Setting up your environment
---------------------------

COWS is written in Python and requires a UNIX-like environment. So that you can run the examples in this workshop, we have configured a server at RAL that everyone can log onto and use to run a COWS web server. The server is cirrus.badc.rl.ac.uk. Your username and password have been given to you on a separate sheet.

To log into cirrus, use the PuTTY application. !TODO: more details

NOTE:: Although not required, you might find it useful to enable the X Window System for your PuTTY session. To do this, go to the tunneling section of the PuTTY configuration and enable X11 forwarding.

The workshop will guide you through how COWS is installed. The first time you log in you should create an isolated Python environment into which you will install the required components. We will use the virtualenv tool for this:

cirrus$ virtualenv .
New python executable in ./bin/python2.5
Also creating executable in ./bin/python
Installing setuptools............done.

Each time you log in you need to import the virtualenv settings with the source command; do this now. You can tell when the virtualenv is active because your command prompt will be prefixed by the virtualenv directory in parentheses:

cirrus$ source ./bin/activate
(user)cirrus$

NOTE:: Experienced UNIX users might want to put this in their ~/.profile file.

Installing CSML
---------------

Most COWS components can be downloaded and installed automatically from the internet with the easy_install command. Components installed with easy_install are called eggs. A few eggs need manual compilation, so for the purposes of this workshop all eggs have been gathered together in the directory /home/spascoe/ds_workshop/eggs. To tell easy_install to look in this directory, create the file .pydistutils.cfg in your home directory:

.pydistutils.cfg:

[easy_install]
find-links = /home/spascoe/ds_workshop/eggs

You can now install CSML along with the improved Python shell ipython. All CSML dependencies will be installed automatically:

$ easy_install csml ipython
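If you want to confirm that the packages went into your virtualenv rather than the system Python, a quick optional check is to start ipython and see where csml was loaded from (this is just a convenience, not a required step):

>>> import csml
>>> print csml.__file__   # the path should be inside your virtualenv directory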

Using the CSML Scanner to create a CSML document
------------------------------------------------

In the directory /home/dlowe/data/WindFeb09 there is some test data. This data contains a month-long timeseries of wind measurements from a stationary observing platform. The data is stored in an ASCII file format called NASA Ames, split across multiple files (one file per day).

Open one of the files and have a look at it:

$ less /home/dlowe/data/WindFeb09/wind-sensors_frongoch_20090208.na

You can see that the data contains 4 variables (eastward_wind, northward_wind, etc.) measured at a single location.

Each of these variables can be represented as a CSML PointSeries feature. So, for example, one CSML feature can contain 'eastward_wind' across all the NASA Ames files, creating a timeseries of the entire month's data.

We will use the 'csmlscan' tool to create a CSML document to describe these files, but first you need to install a library that can read NASA Ames files. This library is called NAppy (NASA Ames Processing in Python):

$ easy_install nappy

Now the 'csmlscan' tool can be used to create a CSML document to describe these files:

$ csmlscan -f PointSeries -L '52.42 -4.05' -d /home/dlowe/data/WindFeb09 -o mycsml.xml

What do the command line arguments to csmlscan mean?

-f PointSeries
    This flag tells the scanner to create PointSeries features.

-L '52.42 -4.05'
    This flag specifies a latitude/longitude location for the PointSeries. Unfortunately we have to supply it manually in this case, because NASA Ames metadata is not consistent between files, so a computer can't read it reliably. Look in the NASA Ames file again and you will see:

        Location name: Frongoch farm (near Aberystwyth, UK)
        Location: 52.42 degrees N, -4.05 degrees E
                  base of tower 140 m above mean sea level

-d /home/dlowe/data/WindFeb09
    -d is the directory where the dataset resides.

-o mycsml.xml
    -o is the output file where your CSML document will be created.

So you should now have a CSML file, 'mycsml.xml'. Open it in a text editor and have a look over it. The XML should contain 4 PointSeries features for the entire month's data.

Experimenting with the CSML API
-------------------------------

CSML has a Python-based Application Programming Interface (API) for interacting with CSML documents. We will explore this interactively in ipython.

Start ipython:

$ ipython

and import csml:

>>> import csml

Now you can parse your csml file into a Dataset object:

>>> ds=csml.parser.Dataset('mycsml.xml')

and you can list the features in that file:

>>> ds.listFeatureIDs()

Select a feature by id and assign it to a variable 'f':

>>> f=ds['Mean_eastward_wind_in_1_minute_sample_period']

you can now look at the data in that feature:

>>> f.getLatLon() #get the lat lon of the station
>>> len(f.getTimes()) #see how many times there are  (len="length")
>>> len(f.getDataValues()) # see how many data values
>>> t=f.getTimes()[0:10]   #Assign the first 10 time values to a variable, t
>>> v=f.getDataValues()[0:5] # Assign the first 5 data values to a variable, v
>>> zip(t,v) #display them together

There is also a built-in method, getSubsetOfData(), which allows you to request data within a time range.

e.g. Get all the data for February 5th 2009:

>>> f.getSubsetOfData(('2009-02-05T00:00:00.0', '2009-02-05T23:59:59.59'))
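The calls above can also be combined into a small standalone script. The sketch below uses only the methods already introduced (Dataset, listFeatureIDs, getLatLon, getTimes and getDataValues); the summary line it prints is our own choice:

# summarise_wind.py -- a minimal sketch using only the CSML calls introduced above
import csml

ds = csml.parser.Dataset('mycsml.xml')
for fid in ds.listFeatureIDs():
    f = ds[fid]
    times = f.getTimes()
    values = f.getDataValues()
    # Print one summary line per PointSeries feature
    print '%s: %d values at %s, starting %s' % (fid, len(values), f.getLatLon(), times[0])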

Installing COWS ready to set up OGC services
--------------------------------------------

You can install COWS using easy_install as before:

$ easy_install cows

Again this command will install the egg (cows) and all its dependencies. It will also check that you have CSML installed; you should already have it from earlier, but if not it will be installed too.

So we could equally have reached this stage and installed cows, csml and ipython with a single command (csml is pulled in as a dependency of cows):

$ easy_install cows ipython

Generating CSML for the test gridded dataset
--------------------------------------------

Some CF-NetCDF data has been placed in /home/spascoe/ds_workshop/data. !TODO: describe datasets. To generate CSML for this data, use the csmlscan tool again. This time we will use a configuration file rather than command-line arguments. First create a configuration file:

hadcm3.cfg:

[dataset]
dsID: hadcm3
[features]
type: GridSeries
number: many
[files]
root: /home/spascoe/ds_workshop/data/hadcm3
mapping: onetoone
output: ./csml/hadcm3.xml
printscreen: 0
[spatialaxes]
spatialstorage: fileextract
[values]
valuestorage: fileextract
[time]
timedimension: time
timestorage: inline

Note that the [features] and [files] sections play the same role as the -f, -d and -o options used earlier. Create the destination directory csml and run csmlscan:

$ mkdir csml
$ csmlscan -c hadcm3.cfg
...
********************************************************************
CSML file is at: ./csml/hadcm3.xml
********************************************************************

In this case, if you look at the CSML file you will notice that the data is not stored inline in the CSML document; instead it is kept in the original NetCDF files and referenced via 'NetCDFExtract' elements.
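If you want to confirm this without reading through the whole file, the short sketch below counts the NetCDFExtract references using plain ElementTree rather than the CSML API, so treat it as an illustrative check only:

# Count NetCDFExtract references in the scanner output (illustrative check only)
import xml.etree.cElementTree as ET

tree = ET.parse('csml/hadcm3.xml')
# Tags carry an XML namespace prefix, so match on the local name only
refs = [e for e in tree.getiterator() if e.tag.endswith('NetCDFExtract')]
print '%d NetCDFExtract references found' % len(refs)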

Interacting with GridSeries data via the CSML API
-------------------------------------------------


As before you can parse the csml file, list the features and select one:

>>> import csml
>>> ds=csml.parser.Dataset('csml/hadcm3.xml')
>>> ds.listFeatureIDs()
>>> f=ds['air_temperature']

This time you can look at the 4D spatio-temporal domain of the grid:

>>> f.getDomain()

And you can subset the GridSeries to a smaller GridSeries feature:

>>> outputdir='./'
>>> outputfile='mygrid.nc'
>>> subsetDictionary = {'latitude': (-10, 10), 'longitude': (55, 65), 'time':('2400-01-00T00:00:00.0', '2410-01-00T00:00:00.0')}
>>> f.subsetToGridSeries(outputdir, outputfile, **subsetDictionary)

As you can see this method writes the new grid out as both a NetCDF file and as a new CSML feature. Come out of ipython and run 'ncdump' to view the headers of your new grid:

$ ncdump -c mygrid.nc

NOTE:: There are other methods for subsetting a GridSeries feature to other feature types, such as PointSeries or Profile features.
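As with the PointSeries example, the interactive steps can be gathered into a short script. This is only a consolidation of the calls already shown above; the region, time range and output file are the same ones used there:

# subset_hadcm3.py -- consolidates the GridSeries subsetting steps shown above
import csml

ds = csml.parser.Dataset('csml/hadcm3.xml')
f = ds['air_temperature']
print f.getDomain()   # inspect the 4D domain before subsetting

subset = {'latitude': (-10, 10),
          'longitude': (55, 65),
          'time': ('2400-01-00T00:00:00.0', '2410-01-00T00:00:00.0')}
f.subsetToGridSeries('./', 'mygrid.nc', **subset)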

Create CowsServer
-----------------

COWS is built on the Pylons web framework. Most command-line tasks in Pylons are done through the paster command. When COWS was installed it automatically made the cows_server template available to paster:

$ paster create --list-templates
Available templates:
  basic_package:   A basic setuptools-enabled package
  cows_server:     A Pylons template to create CSML-enabled COWS server
  paste_deploy:    A web application deployed through paste.deploy
  pylons:          Pylons application template
  pylons_minimal:  Pylons minimal application template

To create your server use the paster create command:

$ paster create -t cows_server CowsServer csmlstore=$HOME/csml
$ cd CowsServer

Your new COWS project will be created in the CowsServer directory. You can now explore the directory structure. !TODO: basic Pylons outline?

Edit CowsServer/development.ini to set the host to 0.0.0.0 and to choose a port. Follow this general template:

development.ini:

# Change the host and port below
[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = <port>
# This section must be added to make the server visible outside RAL
[filter:proxy]
use = egg:PasteDeploy#prefix
prefix = /<username>
# Note the filter-with section must be added
[app:main]
use = egg:CowsServer
full_stack = true
cache_dir = %(here)s/data
filter-with = proxy

Replace <username> and <port> with values from the table below.

======== ====
Username Port
======== ====
test     5010
nsb      5011
nico     5012
mlcu     5013
njcu     5014
mase     5015
wael     5016
lq       5017
pjk      5018
mggr     5019
petwa    5020
aprc     5021
monz     5022
dlowe    5023
======== ====

You are now ready to start your server. Run the command:

$ paster serve development.ini

Try visiting http://cirrus.badc.rl.ac.uk/<username>/ (note the trailing slash, which is important). You should see the COWS server catalogue and can try out the WMS demo.
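If you would rather check the server from the cirrus command line than from a browser, the sketch below simply fetches the catalogue page with urllib2; replace <username> with your own username, exactly as in the URL above (this is a convenience check, not part of COWS itself):

# check_server.py -- fetch the COWS catalogue page to confirm the server is running
import urllib2

url = 'http://cirrus.badc.rl.ac.uk/<username>/'   # replace <username> with yours
html = urllib2.urlopen(url).read()
print 'Fetched %d bytes of catalogue HTML from %s' % (len(html), url)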

Accessing the Web Coverage Service (WCS) with OWSLib
----------------------------------------------------

The WCS can be accessed using any standard WCS client. Unfortunately WCS is not as mature a standard as WMS, so not many clients have been built yet!

There is, however, an open source Python OGC client library called OWSLib, which we will use to communicate with the WCS. Again we need to install this with easy_install:

$ easy_install owslib

Start ipython again and make a connection to the WCS:

>>> from owslib.wcs import WebCoverageService
>>> wcs=WebCoverageService('http://cirrus.badc.rl.ac.uk/dlowe/hadcm3/wcs',version='1.0.0')
>>> print 'Accessing WCS version %s at %s'%(wcs.version, wcs.url)

The OWSLib wcs 'identification' object contains general information about the service (some of which is missing in this service!):

>>> wcs.identification.service
>>> wcs.identification.title
>>> wcs.identification.abstract
>>> wcs.identification.keywords
>>> wcs.identification.fees
>>> wcs.identification.accessConstraints
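If you prefer to see them all at once, the same attributes can be printed in a loop (a convenience only; the attribute names are exactly those listed above):

>>> for attr in ('service', 'title', 'abstract', 'keywords', 'fees', 'accessConstraints'):
...     print attr, '=', getattr(wcs.identification, attr)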

The OWSLib wcs 'provider' object contains general information about the service provider:

>>> wcs.provider.name
>>> wcs.provider.url
>>> wcs.provider.contact.name
>>> wcs.provider.contact.email
>>> wcs.provider.contact.organization
>>> wcs.provider.contact.address
>>> wcs.provider.contact.city
>>> wcs.provider.contact.region
>>> wcs.provider.contact.postcode
>>> wcs.provider.contact.country

Now take a look at the available coverages:

>>> wcs.wcs.contents

And select a coverage from the list:

>>> cvg=wcs['air_temperature']

Investigate it:

>>> cvg.boundingBoxWGS84
>>> cvg.timepositions
>>> cvg.timelimits
>>> cvg.supportedFormats
>>> cvg.supportedCRS
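To compare several coverages at once you can loop over the contents listing. The sketch below assumes that wcs.wcs.contents behaves like a dictionary keyed by coverage name, which is what the listing above suggests:

>>> for name in wcs.wcs.contents:
...     c = wcs[name]
...     print name, c.boundingBoxWGS84, c.timelimits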

Now that we know a bit about the coverage, we can make a request to get a subset of it from the WCS:

>>> response=wcs.getCoverage(identifier='air_temperature',time=['2992-11-16T00:00:00.0'],bbox=(-80,30,50,60), crs='WGS84', format='cf-netcdf')

You can write this response to a file:

>>> f=open('mywcsoutput.nc', 'wb')
>>> f.write(response.read())
>>> f.close()
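For repeated downloads the steps above can be wrapped in a small helper function. This is only a convenience sketch built from the OWSLib calls already shown; the function name and its default arguments are our own:

# fetch_coverage.py -- a convenience wrapper around the getCoverage call shown above
from owslib.wcs import WebCoverageService

def fetch_coverage(url, identifier, time, bbox, outfile,
                   crs='WGS84', fmt='cf-netcdf'):
    """Request a coverage subset from a WCS and save it to outfile."""
    wcs = WebCoverageService(url, version='1.0.0')
    response = wcs.getCoverage(identifier=identifier, time=[time],
                               bbox=bbox, crs=crs, format=fmt)
    f = open(outfile, 'wb')
    f.write(response.read())
    f.close()

fetch_coverage('http://cirrus.badc.rl.ac.uk/dlowe/hadcm3/wcs',
               'air_temperature', '2992-11-16T00:00:00.0',
               (-80, 30, 50, 60), 'mywcsoutput.nc')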

Come out of ipython and use ncdump again to investigate your WCS response:

$ ncdump -c mywcsoutput.nc