Subversion URL: http://proj.badc.rl.ac.uk/svn/ndg/TI09-UKCollaboration/trunk/DS_Workshop/doc/csml.rst
Revision 5119, 7.2 KB checked in by spascoe, 12 years ago

Added links.

The CSML Library

Installing CSML

You can now install the CSML library with easy_install. The dependencies cdat_lite and numpy will be installed automatically:

$ easy_install csml

Using the CSML Scanner to create a CSML document

The directory /home/spascoe/ds_walkthrough/data/WindFeb09 contains some test data: a month-long timeseries of wind measurements from a stationary observing platform. The data is stored in an ASCII file format called NASAAmes (at BADC) and is split across multiple files (one file per day).

Open one of the files and have a look at it:

$ less /home/spascoe/ds_walkthrough/data/WindFeb09/wind-sensors_frongoch_20090208.na

You can see the data contains 4 variables (eastward_wind, northward_wind etc.) measured at a single location.

Each of these variables can be represented as a CSML PointSeries feature. So for example one CSML feature can contain 'eastward_wind' across all the NASAAmes files, creating a timeseries of the entire month's data.
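Because the data is split into one file per day, it can help to see which files a complete month would involve. The sketch below only builds the expected file names. It assumes that every day follows the naming pattern of the example file above (wind-sensors_frongoch_YYYYMMDD.na), which this walkthrough does not confirm, and the helper name is made up for illustration; it is not part of CSML.

```python
import calendar
from datetime import date


def expected_daily_files(year, month,
                         prefix="wind-sensors_frongoch_", suffix=".na"):
    """Return one expected file name per day of the given month.

    Assumption: every daily file is named <prefix>YYYYMMDD<suffix>,
    like the single example file shown in the walkthrough.
    This helper is hypothetical and not part of the CSML library.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        prefix + date(year, month, day).strftime("%Y%m%d") + suffix
        for day in range(1, days_in_month + 1)
    ]


files = expected_daily_files(2009, 2)
print(len(files))   # number of daily files a full month would need
print(files[7])     # the file for 8 February 2009
```

A single PointSeries feature for 'eastward_wind' would then draw its values from all of these files in date order.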

We will use the 'csmlscan' tool to create a CSML document to describe these files, but first you need to install a library that can read NASAAmes files. This library is called NAppy (NASA Ames Processing in Python):

$ easy_install nappy

Now the 'csmlscan' tool can be used to create a CSML document to describe these files:

$ csmlscan -f PointSeries -L '52.42 -4.05' -d /home/spascoe/ds_walkthrough/data/WindFeb09 -o mycsml.xml

What do the command line arguments to csmlscan mean?

-f PointSeries
    This flag tells the scanner to create PointSeries features.

-L '52.42 -4.05'
    This flag specifies the latitude/longitude location of the PointSeries. Unfortunately, in this case we have to supply it manually because NASAAmes metadata is not consistent between files, so a computer can't read it easily. Look in the NASAAmes file again and you can see:

        Location name: Frongoch farm (near Aberystwyth, UK)
        Location: 52.42 degrees N, -4.05 degrees E
                  base of tower 140 m above mean sea level

-d /home/spascoe/ds_walkthrough/data/WindFeb09
    d is for 'directory', where the dataset resides.

-o mycsml.xml
    o is for 'outputfile', where your CSML document will be created.
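The quotes around the -L value matter: they make the shell pass the latitude and longitude to csmlscan as one argument. The following sketch uses Python's standard shlex module to show how a POSIX shell would split the command line above. It illustrates shell quoting only and makes no claim about how csmlscan parses its arguments internally.

```python
import shlex

command = ("csmlscan -f PointSeries -L '52.42 -4.05' "
           "-d /home/spascoe/ds_walkthrough/data/WindFeb09 -o mycsml.xml")

# shlex.split follows POSIX shell quoting rules, so the quoted
# latitude/longitude pair stays together as a single argument.
argv = shlex.split(command)

# Pick out the value that follows -L and split it into two numbers.
location = argv[argv.index("-L") + 1]
latitude, longitude = (float(part) for part in location.split())

print(argv)
print(latitude, longitude)
```

Without the quotes, '-4.05' would arrive as a separate argument, which could be mistaken for another option.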

So you should now have a CSML file, 'mycsml.xml'. Open it in a text editor and have a look over it. The XML should contain 4 PointSeries features for the entire month's data.

Experimenting with the CSML API

CSML has a Python-based Application Programming Interface (API) for interacting with CSML documents. We will explore this interactively in ipython.

Start ipython and import csml:

$ ipython
Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
Type "copyright", "credits" or "license" for more information.
IPython 0.9.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.
>>> import csml

Now you can parse your CSML file into a Dataset object and list the features in that file:

>>> ds = csml.parser.Dataset('mycsml.xml')
>>> ds.listFeatureIDs()

Note

One great feature of IPython is tab-completion. Press <tab> whenever you aren't sure of the name of an object or method. This even works with filenames such as mycsml.xml above.

Select a feature by id and assign it to a variable 'f':

>>> f=ds['Mean_eastward_wind_in_1_minute_sample_period']

You can now look at the data in that feature:

>>> f.getLatLon() #get the lat lon of the station
>>> len(f.getTimes()) #see how many times there are  (len="length")
>>> len(f.getDataValues()) # see how many data values
>>> t=f.getTimes()[0:10]   #Assign the first 10 time values to a variable, t
>>> v=f.getDataValues()[0:10] #Assign the first 10 data values to a variable, v
>>> zip(t,v) #display them together
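When pairing times with values, note that zip stops at the shorter of its inputs, so the two slices should have the same length. The sketch below demonstrates this with stand-in lists; the timestamps and numbers are invented for illustration and are not taken from the dataset.

```python
# Stand-in values for illustration only; they are not from the wind dataset.
t = ["2009-02-01T00:00:00", "2009-02-01T00:01:00", "2009-02-01T00:02:00"]
v = [1.5, 2.0]  # deliberately one element shorter than t

# zip pairs items position by position and stops at the shorter input.
# list() is needed on Python 3, where zip returns a lazy iterator;
# on the Python 2 interpreter used in this walkthrough zip already
# returns a list.
pairs = list(zip(t, v))
print(pairs)
```

Here the third timestamp is silently dropped because it has no matching value.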

There is also a built-in method, getSubsetOfData(), which allows you to request data within a time range.

e.g. Get all the data for February 5th 2009:

>>> f.getSubsetOfData(('2009-02-05T00:00:00.0', '2009-02-05T23:59:59.59'))
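To make the idea of a time-range subset concrete, here is a minimal standalone sketch of selecting values whose timestamps fall between two bounds. It is not CSML's implementation: the function name is hypothetical, the sample values are invented, and it assumes both bounds are inclusive, which this walkthrough does not state for getSubsetOfData().

```python
from datetime import datetime

# Timestamp layout used in the getSubsetOfData() example above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def subset_by_time(times, values, time_range):
    """Return (time, value) pairs whose time lies inside time_range.

    Hypothetical helper, not part of CSML. Both bounds are treated
    as inclusive, which is an assumption of this sketch.
    """
    start, end = (datetime.strptime(s, TIME_FORMAT) for s in time_range)
    return [
        (t, v)
        for t, v in zip(times, values)
        if start <= datetime.strptime(t, TIME_FORMAT) <= end
    ]


# Stand-in data for illustration only.
times = ["2009-02-04T23:59:00.0", "2009-02-05T00:00:00.0",
         "2009-02-05T12:00:00.0", "2009-02-06T00:00:00.0"]
values = [0.1, 0.2, 0.3, 0.4]

selected = subset_by_time(
    times, values, ("2009-02-05T00:00:00.0", "2009-02-05T23:59:59.59"))
print(selected)
```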

Generating CSML for the test gridded dataset

Some CF-NetCDF files have been placed in /home/spascoe/ds_walkthrough/data. These are extracts from two BADC datasets, famous and HadCM3, representing various atmospheric and oceanographic domains.

===================  ==============================
Variable             Domain
===================  ==============================
air_temperature      lat/lon/height (pressure)/time
cloud_area_fraction  lat/lon/time
sea_ice_thickness    lat/lon/time
sea_water_salinity   lat/lon/depth/time
===================  ==============================

To generate CSML for this data use the csmlscan tool again. This time we will use a configuration file rather than command line arguments. First create a configuration file hadcm3.cfg:

[dataset]
dsID:hadcm3
[features]
type: GridSeries
number: many
[files]
root: /home/spascoe/ds_workshop/data/hadcm3
mapping: onetoone
output: ./csml/hadcm3.xml
printscreen:0
[spatialaxes]
spatialstorage:fileextract
[values]
valuestorage:fileextract
[time]
timedimension: time
timestorage:inline
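Before running the scanner it can be worth checking that the configuration file is well formed. The file uses INI-style sections and key:value pairs, which Python's standard configparser module can read. The sketch below is only a sanity check; whether csmlscan itself reads the file this way is an assumption.

```python
import configparser
import os

# The contents of hadcm3.cfg exactly as given above.
CONFIG_TEXT = """\
[dataset]
dsID:hadcm3
[features]
type: GridSeries
number: many
[files]
root: /home/spascoe/ds_workshop/data/hadcm3
mapping: onetoone
output: ./csml/hadcm3.xml
printscreen:0
[spatialaxes]
spatialstorage:fileextract
[values]
valuestorage:fileextract
[time]
timedimension: time
timestorage:inline
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# List the sections and show where the CSML document will be written.
sections = parser.sections()
output_path = parser.get("files", "output")
output_dir = os.path.dirname(output_path)

print(sections)
print(output_path)
print(output_dir)  # this directory must exist before csmlscan is run
```

To check a file on disk rather than a string, `parser.read("hadcm3.cfg")` can be used in place of `read_string`.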

Create the destination directory csml and run csmlscan:

$ mkdir csml
$ csmlscan -c hadcm3.cfg
...
********************************************************************
CSML file is at: ./csml/hadcm3.xml
********************************************************************

In this case, if you look at the CSML file you will notice that the data is not stored inline in the CSML. Instead it is kept in the original NetCDF files and referenced via 'NetCDFExtract' elements in the CSML document.

Interacting with GridSeries data via the CSML API

As before you can parse the csml file, list the features and select one:

>>> import csml
>>> ds=csml.parser.Dataset('csml/hadcm3.xml')
>>> ds.listFeatureIDs()
>>> f=ds['air_temperature']

This time you can look at the 4D spatio-temporal domain of the grid:

>>> f.getDomain()

And you can subset the GridSeries to a smaller GridSeries feature:

>>> outputdir='./'
>>> outputfile='mygrid.nc'
>>> subsetDictionary = {'latitude': (-10, 10), 'longitude': (55, 65), 'time':('2400-01-00T00:00:00.0', '2410-01-00T00:00:00.0')}
>>> f.subsetToGridSeries(outputdir, outputfile, **subsetDictionary)

As you can see, this method writes the new grid out as both a NetCDF file and a new CSML feature. Exit ipython and run 'ncdump' to view the headers of your new grid:

$ ncdump -c mygrid.nc

Note: there are other methods for subsetting a GridSeries feature to other feature types, such as PointSeries or Profiles.
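The `**subsetDictionary` syntax in the subsetting call above is ordinary Python keyword-argument unpacking: each dictionary key becomes a named argument. The sketch below shows the equivalence using a stand-in function. `show_call` is hypothetical and only echoes its arguments; it says nothing about what subsetToGridSeries does internally.

```python
def show_call(outputdir, outputfile, **axis_ranges):
    """Hypothetical stand-in that just returns the arguments it received."""
    return outputdir, outputfile, axis_ranges


subsetDictionary = {
    'latitude': (-10, 10),
    'longitude': (55, 65),
    'time': ('2400-01-00T00:00:00.0', '2410-01-00T00:00:00.0'),
}

# Unpacking the dictionary with ** ...
unpacked = show_call('./', 'mygrid.nc', **subsetDictionary)

# ... is equivalent to writing each axis range as a keyword argument.
explicit = show_call(
    './', 'mygrid.nc',
    latitude=(-10, 10),
    longitude=(55, 65),
    time=('2400-01-00T00:00:00.0', '2410-01-00T00:00:00.0'),
)

print(unpacked == explicit)
```

Keeping the ranges in a dictionary makes it easy to add or remove an axis constraint without changing the call itself.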

Exercise

Try adding the famous dataset to your CSML store. You will need to create a new configuration file for csmlscan, similar to hadcm3.cfg. The NetCDF data is located at /home/spascoe/ds_workshop/data/famous.
