Version 2 (modified by domlowe, 13 years ago)

More edits to csml parser how to

Introduction to the CSML Parser & API

This document covers:

  • Using the online parser.
  • How to install the parser code.
  • How to parse a CSML file.
  • How to query CSML attributes.
    • directly via the parser
    • with the high-level API to the parser
  • How to create your own CSML documents using the parser 'in reverse'.

The CSML parser is a conventional parser in that it can read a CSML file (which is encoded as XML) and determine the structure and properties of the data within. The parser creates Python objects representing the contents of the CSML file. These Python objects can then be interrogated either directly or via a higher-level CSML API that provides a more intuitive interface. In addition to parsing CSML, you can also use the parser 'in reverse' to create your own CSML documents.

For each class (type of element) in CSML there is a Python class. Each class has three methods: __init__(), fromXML() and toXML(). The hierarchical relationship between CSML schema elements is also represented within the class hierarchy of the parser. The upshot is that you can convert between XML and Python representations of your CSML document without losing any structural information or content. Rather than go into great detail about how this works (the subject of another document, TBA), here we will concentrate on how to use the parser.

The root-level element of a CSML document is the Dataset element, and there is a corresponding Python class called Dataset, which has __init__, fromXML and toXML methods. The hierarchical nature of the parser means that if you call the fromXML or toXML method of a class, it automatically calls the fromXML or toXML methods of its child classes, and this recurses through the XML hierarchy.
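The recursive fromXML/toXML pattern described above can be illustrated with a small, self-contained sketch. ToyDataset and ToyFeature are hypothetical stand-ins invented for this example, as are the 'Dataset' and 'Feature' element names and the 'name' attribute; they are not the real CSML classes or schema, and the sketch uses the standard library's xml.etree.ElementTree rather than the parser code.

```python
import xml.etree.ElementTree as ET


class ToyFeature(object):
    """Hypothetical stand-in for a CSML feature class (not the real one)."""

    def __init__(self, name=None):
        self.name = name

    def fromXML(self, elem):
        # Read this element's own content from the XML.
        self.name = elem.get('name')

    def toXML(self):
        # Write this object's content back out as an element.
        elem = ET.Element('Feature')
        elem.set('name', self.name)
        return elem


class ToyDataset(object):
    """Hypothetical stand-in for the root-level Dataset class."""

    def __init__(self):
        self.members = []

    def fromXML(self, elem):
        # Hand each child element to the child class's fromXML method.
        self.members = []
        for child in elem.findall('Feature'):
            feature = ToyFeature()
            feature.fromXML(child)
            self.members.append(feature)

    def toXML(self):
        # Build this element, then let each child serialise itself.
        elem = ET.Element('Dataset')
        for feature in self.members:
            elem.append(feature.toXML())
        return elem


source = '<Dataset><Feature name="a" /><Feature name="b" /></Dataset>'
dataset = ToyDataset()
dataset.fromXML(ET.fromstring(source))           # XML -> Python objects
round_trip = ET.tostring(dataset.toXML(), encoding='unicode')  # and back
print(round_trip)
```

Calling fromXML or toXML on the root object is enough; the calls on the children happen as a consequence, which is the behaviour the real parser is described as having.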

So... how to actually use the parser.

The online parser

Well, first the easy way: use the online parser. This is handy for testing that your CSML documents parse as expected. Note that this is not a true CSML validator, but it will show you how the parser "sees" your CSML.

The online parser is simply a web interface to the parser. You can't do anything further with the parsed document, but it is a useful way of verifying what the parser 'sees' when it parses your CSML document. If you don't have a CSML document, you can download one {HERE}. To use it, go to http://proj.badc.rl.ac.uk/cgi-bin/csml/parseTest.py, browse to your CSML file and submit it to see whether your file parses.

Parsing in python

Now, for real parsing. If you want to use CSML in your applications or create CSML of your own using the parser, you will need to be able to run the Python parser code. The parser itself doesn't have many dependencies, mainly cElementTree. However, if you are building applications that use some features of the parser (e.g. subsetting and creating a NetCDF file) then there will be more dependencies.

The parser is written in Python, so you will need Python [LINK] installed to use it. Once you have Python installed, download the parser code from {HERE}. You will need the entire 'parser' directory. Then try to run the file test.py by typing:

python test.py

If this doesn't work because an extra Python component (e.g. cElementTree) is missing, you will have to install it on your system.
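Before running test.py it can help to check which ElementTree implementation, if any, is importable. The following is a minimal diagnostic sketch, not part of the parser; load_elementtree is a hypothetical helper name. It only reports what is available: the parser code may still import cElementTree by name, in which case that package itself needs to be installed.

```python
import importlib


def load_elementtree():
    """Return (module, name) for the first ElementTree implementation that imports."""
    candidates = (
        'cElementTree',            # standalone package, as named in this document
        'xml.etree.cElementTree',  # bundled with some older Python versions
        'xml.etree.ElementTree',   # standard library implementation
    )
    for name in candidates:
        try:
            return importlib.import_module(name), name
        except ImportError:
            # Not installed under this name; try the next candidate.
            continue
    raise ImportError('No ElementTree implementation could be imported')


et_module, et_name = load_elementtree()
print('Using ' + et_name)
```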

Once you have everything installed, if you can run test.py you have already parsed a CSML file. Look at the code in test.py. You can see it is calling several parser methods.

# Imports as assumed here; check the top of test.py for the exact module names.
from cElementTree import ElementTree
from Parser import Dataset  # assumption: the module that defines Dataset
import parser_extra

tree = ElementTree(file='example.xml')
dataset = Dataset()

# Calling the fromXML method reads the CSML into memory.
dataset.fromXML(tree.getroot())

# This creates a new CSML document string from the CSML objects in memory.
# Hopefully the CSML output should be the same as the CSML it read in.
csml = dataset.toXML()

# Print it and see:
strCSML = parser_extra.PrettyPrint(csml)  # this just tidies up the formatting
strCSML = parser_extra.removeInlineNS(strCSML)  # this just tidies up the namespaces
print(strCSML)

So all we are doing is creating a Dataset parser object and saying 'parse all the XML from tree.getroot()', i.e. parse the entire CSML document.
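The comment in test.py says the output should match the input. One way to check that, rather than comparing by eye, is a structural comparison of the two XML trees. The sketch below is an illustration using the standard library only; same_structure is a hypothetical helper, not part of the parser, and the sample documents are made up. With the real parser you would compare tree.getroot() against the result of toXML(), assuming that result is an element (if it is a string, parse it first).

```python
import xml.etree.ElementTree as ET


def same_structure(a, b):
    """Return True if two elements match in tag, attributes, text and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Ignore whitespace-only differences introduced by pretty printing.
    if (a.text or '').strip() != (b.text or '').strip():
        return False
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        return False
    # Recurse through the hierarchy, pairing children in document order.
    return all(same_structure(x, y) for x, y in zip(children_a, children_b))


original = ET.fromstring('<Dataset id="d1"><name>test</name></Dataset>')
reformatted = ET.fromstring('<Dataset id="d1">\n  <name>test</name>\n</Dataset>')
changed = ET.fromstring('<Dataset id="d1"><name>other</name></Dataset>')

print(same_structure(original, reformatted))  # differs only in whitespace
print(same_structure(original, changed))      # differs in element text
```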

So now the object called 'dataset' is a representation of the CSML document in memory. You can navigate this document directly by using Python attributes, e.g.:

# Read the href attribute of the domainReference for a feature and print it:
print(dataset.featureCollection.members[3].profileSeriesDomain.domainReference.times.href)

Notice that to get to a feature you have to navigate the featureCollection. Individual CSML features are members of dataset.featureCollection.members[]. This is all very long-winded, so there is a higher-level API that wraps up a lot of this detail and makes interacting with features much simpler.
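Long attribute chains like the one above fail with an exception as soon as one step is missing, for instance when a document has fewer than four features. A minimal sketch of a defensive way to walk such a chain follows. Node and follow are hypothetical names invented for this example, the objects are stand-ins rather than real parser objects, and the href value is made up.

```python
class Node(object):
    """Bare attribute holder standing in for a parsed CSML object."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def follow(obj, *steps):
    """Follow attribute names (str) and list indices (int); None if a step is missing."""
    for step in steps:
        try:
            obj = obj[step] if isinstance(step, int) else getattr(obj, step)
        except (AttributeError, IndexError, TypeError):
            return None
    return obj


# Stand-in data shaped like the example above, with a single feature.
times = Node(href='#example_times')
feature = Node(profileSeriesDomain=Node(domainReference=Node(times=times)))
dataset = Node(featureCollection=Node(members=[feature]))

# The path exists for the first feature, so the href is returned.
print(follow(dataset, 'featureCollection', 'members', 0,
             'profileSeriesDomain', 'domainReference', 'times', 'href'))
# There is no members[3] here, so the result is None rather than an exception.
print(follow(dataset, 'featureCollection', 'members', 3, 'profileSeriesDomain'))
```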

The CSML API

As we have just seen, the parser itself provides an API of sorts via the object hierarchy, but it is clumsy to navigate. The most common things you will want to do with features have been wrapped up in a set of simple methods.