
Version 8 (modified by domlowe, 12 years ago)

editing parser how to (not finished)

Introduction to the CSML Parser & API, Updated for CSML V2

This document covers:

  • Using the online parser
  • How to install the parser
  • How to parse a CSML file
  • How to query CSML attributes
    • directly via the parser
    • with the high-level API to the parser
  • How to create your own CSML documents using the parser 'in reverse'

The CSML parser is a conventional parser in that it can read a CSML file (which is encoded as XML) and determine the structure and properties of the data within. The parser creates Python objects representing the contents of the CSML file. These Python objects can then be interrogated either directly, or via a higher-level CSML API that provides a more intuitive interface. In addition to parsing CSML, you can also use the parser 'in reverse' to create your own CSML documents.

For each class (type of element) in CSML there is a Python class. Each class has three methods: __init__(), fromXML() and toXML(). The hierarchical relationship between CSML schema elements is also represented within the Python class hierarchy. The upshot is that you can convert between XML and Python representations of your CSML document without losing any structural information or content.

The root-level element of a CSML document is the Dataset element, and there is a Python class called Dataset, which has __init__, fromXML and toXML methods. If you call the fromXML or toXML method of a class, it automatically calls the same method on every class below it in the CSML XML hierarchy, recursing through the whole hierarchy. So calling the fromXML method of the Dataset class calls the fromXML method of the FeatureCollection, every Feature, and so on.
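
The recursion described above can be sketched with two toy classes. This is only an illustration of the pattern, not the real csml code: the Dataset and Feature classes below are hypothetical stand-ins written for this example, they use a plain 'id' attribute rather than gml:id, and they ignore namespaces entirely.

```python
import xml.etree.ElementTree as ET


class Feature(object):
    """Toy stand-in for a leaf element; not the real csml class."""

    def __init__(self, id=None):
        self.id = id

    def fromXML(self, elem):
        self.id = elem.get('id')

    def toXML(self):
        elem = ET.Element('Feature')
        elem.set('id', self.id)
        return elem


class Dataset(object):
    """Toy stand-in for a root element that owns child Feature objects."""

    def __init__(self, id=None):
        self.id = id
        self.members = []

    def fromXML(self, elem):
        self.id = elem.get('id')
        # Delegate each child element to the child class's own fromXML().
        for child in elem.findall('Feature'):
            feature = Feature()
            feature.fromXML(child)
            self.members.append(feature)

    def toXML(self):
        elem = ET.Element('Dataset')
        elem.set('id', self.id)
        # Delegate serialisation of each child to its own toXML().
        for feature in self.members:
            elem.append(feature.toXML())
        return elem


source = '<Dataset id="ds1"><Feature id="f1" /><Feature id="f2" /></Dataset>'
dataset = Dataset()
dataset.fromXML(ET.fromstring(source))
roundtrip = ET.tostring(dataset.toXML()).decode('utf-8')
print(roundtrip)
```

Calling fromXML() once on the root fills in every child, and calling toXML() once on the root rebuilds the whole tree, which is the behaviour the parser relies on.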

Rather than go into great detail about how this works (the subject of another document, TBA), here we will concentrate on how to use the parser.

So... how to actually use the parser.

The online parser

First, the easy way: use the online parser. This is handy for testing that your CSML documents parse as expected. Note that this is not a true CSML validator, but it will show you how the parser 'sees' your CSML. If the input and output differ, then something has not parsed well. This could be a problem with your CSML document, something that isn't fully implemented in the parser, or just a bug. Please let me know.

The online parser is simply a web interface to the parser, and allows you to parse a CSML document. You can't do anything with the parsed document, but it is a useful way of verifying what the parser 'sees' when it parses your CSML document. If you don't have a CSML document, you can download one {HERE}. The parser is located at http://proj.badc.rl.ac.uk/cgi-bin/csml2/parseTest.py and you simply browse to your CSML file and submit it to see whether your file parses.
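
If you save the parser's output and want to check whether it differs from your input, comparing the raw text is unreliable because whitespace and attribute order may change. The following is a minimal sketch of a structural comparison using only the standard library. The function names normalise and same_structure are made up for this example, and it deliberately ignores text that follows a closing tag (the element 'tail'), so treat it as a rough check rather than a validator.

```python
import xml.etree.ElementTree as ET


def normalise(elem):
    """Return a nested tuple describing an element, ignoring whitespace-only text."""
    text = (elem.text or '').strip()
    attributes = tuple(sorted(elem.attrib.items()))
    children = tuple(normalise(child) for child in elem)
    return (elem.tag, attributes, text, children)


def same_structure(xml_a, xml_b):
    """True if two XML strings have the same tags, attributes, text and nesting."""
    return normalise(ET.fromstring(xml_a)) == normalise(ET.fromstring(xml_b))


# Small hand-written documents standing in for your input and the parser output.
original = '<Dataset id="ds1">\n  <Feature id="f1"/>\n</Dataset>'
reparsed = '<Dataset id="ds1"><Feature id="f1" /></Dataset>'
changed = '<Dataset id="ds1"><Feature id="f2" /></Dataset>'

print(same_structure(original, reparsed))
print(same_structure(original, changed))
```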

Parsing in python

Now, for real parsing. If you want to use CSML in your applications or create CSML of your own using the parser, you will need to be able to run the Python parser code. The parser itself doesn't have many dependencies, mainly cElementTree. However, if you are building applications that use some features of the parser (e.g. subsetting and creating a NetCDF file) then there will be more dependencies. For now, though, let's just install the basic parser.

The parser is written in Python, so you will need Python installed to use it. Once Python is installed, check out the parser code from Subversion. The code you need is here: http://proj.badc.rl.ac.uk/ndg/browser/TI02-CSML/trunk. You will need the entire csml directory, plus basictest.py and example.xml from here: http://proj.badc.rl.ac.uk/ndg/browser/TI02-CSML/trunk/csml/Examples/Parsing

Try and run the file basictest.py by typing:

python basictest.py

If this doesn't work because you are missing Python components such as cElementTree, you will have to install them on your system. You also need to make sure that the csml package you have just downloaded is in the same directory as basictest.py or available through your PYTHONPATH.
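
A quick way to see which modules Python cannot find is to try importing them. This is a small sketch using only the standard library; find_missing is a name invented for this example, and the exact module name under which the parser imports cElementTree is an assumption here, so adjust the list to match whatever the error message from basictest.py reports.

```python
import importlib


def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# 'csml' is the package from the checkout; 'cElementTree' is the dependency
# named above (assumed import name, it may differ on your installation).
missing = find_missing(['csml', 'cElementTree'])
if missing:
    print('Could not import: %s' % ', '.join(missing))
else:
    print('All modules imported')
```

If csml is reported as missing even though you have checked it out, the directory containing the csml package is probably not on your PYTHONPATH.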

Once you have everything installed, if you can run basictest.py you have already parsed a CSML file. Look at the code in basictest.py. You can see it is calling several parser methods.

import csml

#A path to a file
f='example.xml'

#Create a Dataset object and parse the file f into it.
dataset=csml.parser.Dataset(file=f)

#the toXML() method returns an elementtree element instance:
csmldoc = dataset.toXML()
print csmldoc

#And the toPrettyXML() method returns a string, with correct formatting and namespaces.
#Tidy up and print the CSML document:
strCSML=dataset.toPrettyXML()
print strCSML
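
The difference between the two print statements above is easier to see with a small standalone example. This sketch uses the standard library's ElementTree on a hand-built element, on the assumption that the object returned by toXML() behaves like an ordinary ElementTree element; it does not use csml at all.

```python
import xml.etree.ElementTree as ET

# A small stand-in element; dataset.toXML() would return a much larger tree.
elem = ET.Element('Dataset')
elem.set('id', 'ds1')
ET.SubElement(elem, 'name').text = 'example'

# Printing the element shows an object representation, not the XML text.
print(elem)

# Serialising explicitly gives the markup as a string.
xml_text = ET.tostring(elem).decode('utf-8')
print(xml_text)
```

This is why the example prints the result of toPrettyXML() when it wants readable CSML: an element object has to be serialised before it looks like XML.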

So now the object called 'dataset' is a representation of the CSML document in memory. You can navigate this document directly by using python attributes e.g.:

#Get the dataset id:
print dataset.id

#Read the href attribute of the domainReference for a feature and print it:
print dataset.featureCollection.members[3].profileSeriesDomain.domainReference.times.href

Notice that to get to a feature you have to navigate the featureCollection. Individual CSML features are members of dataset.featureCollection.members[]. This is all very long-winded, so there is a higher-level API that wraps up a lot of this detail and makes interacting with features much simpler.
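
To give a feel for what navigating the object hierarchy by hand involves, here is a sketch that looks up a feature by id by looping over the members list. The Stub class and the find_feature function are invented for this example and the dataset is built by hand, so nothing here touches the real csml classes; only the attribute names featureCollection, members and id are taken from the text above.

```python
class Stub(object):
    """Bare attribute container standing in for a parsed CSML object."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


def find_feature(dataset, feature_id):
    """Return the first member whose id matches, or None if there is none."""
    for feature in dataset.featureCollection.members:
        if getattr(feature, 'id', None) == feature_id:
            return feature
    return None


# Hand-built stand-in for a parsed dataset with two features.
dataset = Stub(
    id='ds1',
    featureCollection=Stub(members=[Stub(id='f1'), Stub(id='f2')]),
)

feature = find_feature(dataset, 'f2')
print(feature.id)
```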

The CSML API

As we have just seen, the parser itself provides an API of sorts via the object hierarchy, but it is clumsy to navigate. The most common things you will want to do with features have been wrapped up in a set of simple methods. Rather than document the methods in detail here (PyDoc does that nicely), this is how to use the methods to perform a subsetting operation on a GridSeriesFeature:

import csml   #This will import csml.API amongst other things which are needed (e.g. data access libraries)

f='coapec.xml' # your CSML file

#Initialise and parse the dataset
csmlds = csml.parser.Dataset()  # Create a new empty csml Dataset object
csmlds.parse(f) # parse the CSML file - this is like calling the fromXML() method of the Dataset

#You can now interrogate the CSML document:

#get list of features in the dataset
flist= csmlds.getFeatureList() 
print '\n Here are all the features in %s:' %f
print flist

#select a feature by name (gml:id)
print '\n Selecting feature with gml:id = %s' %flist[4]
feature=csmlds.getFeature(flist[4])

#These are some attributes, the gml:id and gml:description
print feature.id
print feature.description

#get the domain of the feature
print '\n The feature has domain reference:' 
print feature.getDomainReference()

#get the domain complement of the feature
print '\n The feature has domain complement :' 
#print feature.getDomainComplement()

#get combined domain, this returns the domainReference and the domainComplement
print '\n The feature has domain:' 
#print feature.getDomain()

#get list of allowed subsettings
print '\n the following feature subsetting operations are allowed:'
print feature.getAllowedSubsettings()


#Now we can subset the file based on a selection

#define a selection (you would base this on the values of the domain ref/complement but I have hardcoded it here)
timeSelection=['2794-12-1T0:0:0.0', '2844-12-1T0:0:0.0']  #min and max values (you can also provide a list of specific values)
spatialSubsetDictionary= {}
spatialSubsetDictionary['latitude']=(-30.0,30.0)
spatialSubsetDictionary['longitude']=(90, 120.0)
#If the feature is defined in any other dimension you can add that here too.

#request subsetted data from feature (can set output file paths here)
subsetCSML, subsetNetCDF, arraySize=feature.subsetToGridSeries(timeSelection, csmlpath='my.xml', ncpath='my.nc',**spatialSubsetDictionary)


#Now we have a subsetted CSML document and a NetCDF file that describe/contain your subsetted data.
print subsetCSML #csml document (string)
print subsetNetCDF # netcdf file (file)
print 'arraySize: %s' %arraySize  #this is just useful - how big is the data.
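
The ** in front of spatialSubsetDictionary in the call above is ordinary Python keyword unpacking: each key in the dictionary is passed as a keyword argument. The sketch below shows the mechanism with a made-up function, subset_stub, which only reports what it received. Its signature is an assumption chosen to mirror the call above and is not the real signature of subsetToGridSeries.

```python
def subset_stub(time_selection, csmlpath=None, ncpath=None, **axis_ranges):
    """Hypothetical function that only reports the arguments it received."""
    return {
        'time': time_selection,
        'csmlpath': csmlpath,
        'ncpath': ncpath,
        'axes': dict(axis_ranges),
    }


spatial = {'latitude': (-30.0, 30.0), 'longitude': (90.0, 120.0)}
times = ['2794-12-1T0:0:0.0', '2844-12-1T0:0:0.0']

# Equivalent to passing latitude=(-30.0, 30.0), longitude=(90.0, 120.0) by hand.
request = subset_stub(times, csmlpath='my.xml', ncpath='my.nc', **spatial)
print(sorted(request['axes']))
```

Building the ranges in a dictionary first means you can add or drop axes without changing the call itself.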

But wait, perhaps this didn't work. If you couldn't perform the subset operation, it is because you need more things installed. All the parser operations shown up to now have operated only on a CSML document and the CSML Python objects in memory. However, when you perform a subsetting operation and the data is stored in real data files, some I/O operations take place. Typically this means installing the cdms module to read NetCDF, Nappy to read NASA Ames, and potentially other modules to read other file formats. I should probably write more on this, but installation is an area that's going to change radically, so I won't for now.

So, to summarise, you can:

  • Parse CSML files using the online parser and visualise the content
  • Parse a file using Python and interrogate it directly in a fairly long-winded manner.
  • Use the CSML API to parse the file and interrogate it using simple methods.
  • Perform operations on the CSML and underlying data using the CSML API.

There is another thing you can do: use the parser's toXML() methods to create your own CSML. However, that is the subject of a separate how-to.