Version 4 (modified by domlowe, 13 years ago)

First draft of how to use parser

Introduction to the CSML Parser & API

This document covers:

  • Using the online parser
  • How to install the parser
  • How to parse a CSML file
  • How to query CSML attributes
    • directly via the parser
    • with the high-level API to the parser
  • How to create your own CSML documents using the parser 'in reverse'

The CSML parser is a conventional parser in that it can read a CSML file (which is encoded as XML) and determine the structure and properties of the data within. The parser creates Python objects representing the contents of the CSML file. These Python objects can then be interrogated either directly or via a higher-level CSML API that provides a more intuitive interface. As well as parsing CSML, you can use the parser 'in reverse' to create your own CSML documents.

For each class (type of element) in CSML there is a Python class. Each class has three methods: __init__(), fromXML() and toXML(). The hierarchical relationship between CSML schema elements is also represented within the Python class hierarchy. As a result, you can convert between the XML and Python representations of your CSML document without losing any structural information or content.

The root-level element of a CSML document is the Dataset element, and there is a Python class called Dataset, which has __init__, fromXML and toXML methods. The parser is hierarchical: calling the fromXML or toXML method of a class automatically calls the same method on all classes below it in the CSML XML hierarchy, recursing through the whole document. So calling the fromXML method of the Dataset class calls the fromXML method of the FeatureCollection, of every Feature, and so on.
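The recursive pattern described above can be sketched with two stand-in classes. This is only an illustration of the idea, not the parser's own code: the classes below are simplified, the element and attribute names are made up, and it is an assumption here that toXML returns an element rather than a string.

```python
import xml.etree.ElementTree as ET

class Feature(object):
    """Stand-in for a leaf element; not the real CSML Feature class."""
    def __init__(self, id=None):
        self.id = id

    def fromXML(self, elem):
        # Read this element's own properties from the XML.
        self.id = elem.get('id')

    def toXML(self):
        # Write this object's properties back out as an element.
        elem = ET.Element('Feature')
        elem.set('id', self.id)
        return elem

class Dataset(object):
    """Stand-in for a root element that owns a list of child objects."""
    def __init__(self):
        self.members = []

    def fromXML(self, elem):
        # Delegate each child element to the child class's own fromXML.
        for child in elem.findall('Feature'):
            feature = Feature()
            feature.fromXML(child)
            self.members.append(feature)

    def toXML(self):
        # Rebuild the element, delegating each child to its own toXML.
        elem = ET.Element('Dataset')
        for feature in self.members:
            elem.append(feature.toXML())
        return elem

source = '<Dataset><Feature id="f1" /><Feature id="f2" /></Dataset>'
dataset = Dataset()
dataset.fromXML(ET.fromstring(source))
print([f.id for f in dataset.members])

# Going 'in reverse': the objects in memory are turned back into XML.
roundtrip = ET.tostring(dataset.toXML()).decode('ascii')
print(roundtrip)
```

One call on the root object is enough in each direction, because every class is responsible only for its own properties and hands its children on to their own classes.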

Rather than go into great detail about how this works (the subject of another document, TBA), here we will concentrate on how to use the parser.

So... how to actually use the parser.

The online parser

Well, first the easy way: use the online parser. This is handy for testing that your CSML documents parse as expected. Note that this is not a true CSML validator, but it will show you how the parser 'sees' your CSML. If the input and output differ, then something has not parsed well. This could be a problem with your CSML document, something that isn't fully implemented in the parser, or simply a bug. Please let me know.

The online parser is simply a web interface to the parser. You can't do anything further with the parsed document, but it is a useful way of verifying what the parser 'sees' when it parses your CSML document. If you don't have a CSML document, you can download one {HERE}. To see whether your file parses, browse to your CSML file and submit it at: http://proj.badc.rl.ac.uk/cgi-bin/csml/parseTest.py

Parsing in python

Now for real parsing. If you want to use CSML in your applications, or create CSML of your own using the parser, you will need to be able to run the Python parser code. The parser itself doesn't have many dependencies, mainly cElementTree. However, if you are building applications that use some features of the parser (e.g. subsetting and creating a NetCDF file) then there will be more dependencies. For now, though, let's just install the basic parser.

The parser is written in Python, so you will need Python [LINK] installed to use it. Once Python is installed, download the parser code from {HERE}; you will need the entire 'parser' directory. Then try to run the file test.py by typing:

python test.py

If this doesn't work because extra Python components (e.g. cElementTree) are missing, you will have to install them on your system.
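A quick way to find out whether an ElementTree implementation is available is to try the likely import locations in turn. This is a sketch rather than what test.py itself does: the standalone cElementTree package is tried first, then the copies that ship with Python (xml.etree.cElementTree exists from Python 2.5 up to 3.8, xml.etree.ElementTree from 2.5 onwards).

```python
try:
    import cElementTree as ET                  # standalone package
except ImportError:
    try:
        import xml.etree.cElementTree as ET    # bundled with Python 2.5 to 3.8
    except ImportError:
        import xml.etree.ElementTree as ET     # pure standard-library fallback

# If we get this far, some ElementTree implementation is importable.
root = ET.fromstring('<Dataset id="example" />')
print(root.tag)
```

If all three imports fail, the final ImportError tells you that cElementTree needs to be installed before the parser will run.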

Once you have everything installed, if you can run test.py you have already parsed a CSML file. Look at the code in test.py. You can see it is calling several parser methods.

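#ElementTree, Dataset and parser_extra are made available by the imports at the top of test.py (not shown here).
#Read the CSML file into an element tree and create an empty Dataset object.
tree = ElementTree(file='example.xml')
dataset = Dataset()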

#Calling the fromXML method reads the CSML into memory.
dataset.fromXML(tree.getroot())

#This creates a new CSML document string from the CSML objects in memory.
#Hopefully the CSML output should be the same as the CSML it read in.
csml = dataset.toXML()

#print it and see:
strCSML=parser_extra.PrettyPrint(csml) # this just tidies up the formatting
strCSML=parser_extra.removeInlineNS(strCSML)  #this just tidies up the namespaces
print strCSML

So all we are doing is creating a Dataset parser object and telling it to parse all the XML from tree.getroot(), i.e. the entire CSML document.
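The comment in test.py says the output should be the same as the input. Rather than comparing by eye, the two documents can be compared element by element. The helper below is a hypothetical sketch (same_structure is not part of the parser) and uses small made-up documents; with the real parser you would compare tree.getroot() against the output of toXML(), parsing that output first if it is a string.

```python
import xml.etree.ElementTree as ET

def same_structure(a, b):
    """Return True if two elements match in tag, attributes, text and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Ignore leading/trailing whitespace, which pretty-printing changes.
    if (a.text or '').strip() != (b.text or '').strip():
        return False
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        return False
    return all(same_structure(x, y) for x, y in zip(children_a, children_b))

original = ET.fromstring('<Dataset id="d1"><Feature id="f1">10 20</Feature></Dataset>')
reparsed = ET.fromstring('<Dataset id="d1">\n  <Feature id="f1">10 20</Feature>\n</Dataset>')
changed = ET.fromstring('<Dataset id="d1"><Feature id="f2">10 20</Feature></Dataset>')

print(same_structure(original, reparsed))  # True: only whitespace differs
print(same_structure(original, changed))   # False: the Feature id differs
```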

So now the object called 'dataset' is a representation of the CSML document in memory (the variable 'csml' holds the XML regenerated from it). You can navigate this document directly using Python attributes, e.g.:

#Read the href attribute of the domainReference for a feature and print it:
print dataset.featureCollection.members[3].profileSeriesDomain.domainReference.times.href

Notice that to get to a feature you have to navigate the featureCollection: individual CSML features are members of dataset.featureCollection.members[]. This is all very long-winded, so there is a higher-level API that wraps up a lot of this detail and makes interacting with features much simpler.
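A long attribute chain like the one above fails with an AttributeError as soon as one step is missing, for example if a member is a different feature type and has no profileSeriesDomain (an assumption about how the objects behave). The sketch below shows one way to make direct navigation more forgiving. Node and follow are hypothetical helpers written for this example, and the objects are stand-ins, not real parser objects.

```python
class Node(object):
    """Stand-in object; the real parser classes are not used here."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

def follow(obj, path, default=None):
    """Follow a dotted attribute path, returning default if a step is missing."""
    for name in path.split('.'):
        obj = getattr(obj, name, None)
        if obj is None:
            return default
    return obj

feature = Node(id='f1', profileSeriesDomain=Node(
    domainReference=Node(times=Node(href='#times1'))))
other = Node(id='f2')  # a member without a profileSeriesDomain
dataset = Node(featureCollection=Node(members=[feature, other]))

# Loop over every member instead of hardcoding an index such as members[3].
for member in dataset.featureCollection.members:
    href = follow(member, 'profileSeriesDomain.domainReference.times.href')
    print('%s %s' % (member.id, href))
```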

The CSML API

As we have just seen, the parser itself provides an API of sorts via the object hierarchy, but it is clumsy to navigate. The most common things you will want to do with features have been wrapped up in a set of simple methods. Rather than document the methods in detail here (PyDoc does that nicely), this example shows how to use them to perform a subsetting operation on a GridSeriesFeature:

import API   #This is all you need to import, the API module will import the parser as API.Parser

f='coapec.xml' # your CSML file

#Initialise and parse the dataset
csml = API.Parser.Dataset()  # Create a new empty csml Dataset object
csml.parse(f) # parse the CSML file - this is like calling the fromXML() method of the Dataset

#You can now interrogate the CSML document:

#get list of features in the dataset
flist= csml.getFeatureList() 
print '\n Here are all the features in %s:' %f
print flist

#select a feature by name (gml:id)
print '\n Selecting feature with gml:id = %s' %flist[4]
feature=csml.getFeature(flist[4])

#These are some attributes, the gml:id and gml:description
print feature.id
print feature.description

#get the domain of the feature
print '\n The feature has domain reference:' 
print feature.getDomainReference()

#get the domain complement of the feature
print '\n The feature has domain complement :' 
#print feature.getDomainComplement()

#get combined domain, this returns the domainReference and the domainComplement
print '\n The feature has domain:' 
#print feature.getDomain()

#get list of allowed subsettings
print '\n the following feature subsetting operations are allowed:'
print feature.getAllowedSubsettings()


#Now we can subset the file based on a selection

#define a selection (you would base this on the values of the domain ref/complement but I have hardcoded it here)
timeSelection=['2794-12-1T0:0:0.0', '2844-12-1T0:0:0.0']  #min and max values (you can also provide a list of specific values)
spatialSubsetDictionary= {}
spatialSubsetDictionary['latitude']=(-30.0,30.0)
spatialSubsetDictionary['longitude']=(90, 120.0)
#If the feature is defined in any other dimension you can add that here too.

#request subsetted data from feature (can set output file paths here)
subsetCSML, subsetNetCDF, arraySize=feature.subsetToGridSeries(timeSelection, csmlpath='my.xml', ncpath='my.nc',**spatialSubsetDictionary)


#Now we have a subsetted CSML document and a NetCDF file that describe/contain your subsetted data.
print subsetCSML #csml document (string)
print subsetNetCDF # netcdf file (file)
print 'arraySize: %s' %arraySize  #this is just useful - how big is the data.
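The **spatialSubsetDictionary argument in the call above is ordinary Python keyword expansion: each key in the dictionary is passed as a named argument, which is how extra dimensions can be added without changing the call. The sketch below shows this with a stand-in function. subset_stub is hypothetical and only echoes its inputs; its argument shape is copied from the call above and says nothing about how subsetToGridSeries is really defined.

```python
def subset_stub(timeSelection, csmlpath=None, ncpath=None, **axes):
    """Stand-in with the same argument shape as the call above; it only echoes its inputs."""
    return timeSelection, csmlpath, ncpath, axes

spatialSubsetDictionary = {'latitude': (-30.0, 30.0), 'longitude': (90.0, 120.0)}
timeSelection = ['2794-12-1T0:0:0.0', '2844-12-1T0:0:0.0']

# ** turns each dictionary key into a keyword argument of the call.
times, csmlpath, ncpath, axes = subset_stub(
    timeSelection, csmlpath='my.xml', ncpath='my.nc', **spatialSubsetDictionary)

print(sorted(axes))       # ['latitude', 'longitude']
print(axes['latitude'])   # (-30.0, 30.0)
```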

But wait, perhaps this didn't work. If you couldn't perform the subset operation, it is probably because you need more things installed. All the parser operations shown so far have operated only on a CSML document and the CSML Python objects in memory. However, when you perform a subsetting operation on data stored in real data files, some I/O operations take place. Typically this means installing the cdms module to read NetCDF, Nappy to read NASA Ames, and potentially other modules to read other file formats. I should probably write more on this, but installation is an area that's going to change radically, so I won't for now.
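Before attempting a subset, it can help to check which of these optional modules are importable. The sketch below is a minimal, hypothetical check: find_missing is not part of the CSML API, and the import names 'cdms' and 'nappy' are assumptions about how those packages are installed on your system.

```python
def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

# Assumed import names for the I/O libraries mentioned above.
optional_io_modules = ['cdms', 'nappy']
missing = find_missing(optional_io_modules)
if missing:
    print('Subsetting may fail; not importable: %s' % ', '.join(missing))
else:
    print('All optional I/O modules found.')
```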

So, to summarise, you can:

  • Parse CSML files using the online parser and visualise the content
  • Parse a file using Python and interrogate it directly in a fairly long-winded manner.
  • Use the CSML API to parse the file and interrogate it using simple methods.
  • Perform operations on the CSML and underlying data using the CSML API.

There is one more thing you can do: use the parser's toXML() methods to create your own CSML. However, that is the subject of a separate how-to.