Version 1 (modified by domlowe, 13 years ago)

Added notes on MOLES parsing (unfinished)

MOLES parsing in python

There is a lightweight parsing tool for MOLES documents here.

This code can be used to create MOLES objects in python and generate a corresponding MOLES XML document, or it can be used to read in a MOLES XML document and create python objects. You can then manipulate these python objects and recreate the XML document.

This can be useful in many circumstances, for example when creating your MOLES involves more than one step. BADC has a csml2moles tool (http://proj.badc.rl.ac.uk/ndg/browser/TI02-CSML/trunk/csml2MolesStuff) under development. This process will create a basic MOLES document, but we will then want to add further elements, or edit existing ones, at a later stage. One way of doing this would be to use this MOLES parser to read in the unfinished MOLES document, make the changes we need via python (probably using some sort of tool), and then regenerate the XML from the python objects.
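The read, edit and regenerate workflow described above can be sketched with nothing but the standard library's xml.etree.ElementTree, working on the raw XML rather than through the MOLES parser. This is only an illustration of the idea: the unfinished fragment and the added dgDataGranule element are made-up placeholders, not real csml2moles output, and `add_granule` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up, unfinished MOLES-like fragment (placeholder content only).
UNFINISHED = """<dgMetadata>
  <dgMetadataRecord>
    <dgMetadataDescription/>
  </dgMetadataRecord>
</dgMetadata>"""


def add_granule(xml_text, local_id):
    """Hypothetical helper: read a document, add an element, regenerate XML."""
    root = ET.fromstring(xml_text)                      # step 1: read it in
    record = root.find("dgMetadataRecord")
    granule = ET.SubElement(record, "dgDataGranule")    # step 2: make changes
    model = ET.SubElement(granule, "dataModelID")
    ET.SubElement(model, "localIdentifier").text = local_id
    return ET.tostring(root, encoding="unicode")        # step 3: regenerate


if __name__ == "__main__":
    print(add_granule(UNFINISHED, "example.xml"))
```

With the MOLES parser the middle step would operate on python objects instead of raw elements, but the shape of the process is the same.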

One point worth noting is that this tool is described as 'lightweight' because it doesn't provide any sort of checking or restrictions on the MOLES you create; in that sense it is not as strict as the CSML parser. So you will still need to think about your MOLES!

Now we'll see how to use this tool in python.

Importing things

Note that you will need cElementTree and ElementTree installed. We will import the MOLES tool under the abbreviated name 'MRW':

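import cElementTree                       #C implementation of ElementTree (separate package)
import elementtree.ElementTree as etree   #pure-python ElementTree (separate package)
import datetime
import molesReadWrite as MRW              #the MOLES parsing tool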

How to create objects

First, instantiate a new MOLES document object (called M in this case):

M=MRW.MolesDoc()

Now you can begin creating MOLES elements. You need to create them 'in reverse', so the lowest-level elements are created first:

#create metadata description:
mdID=M.metadataDescriptionID(schemeIdentifier='1',repositoryIdentifier='2', localIdentifier='3')
dgMD=M.dgMetadataDescription(metadataDescriptionID=mdID)
#create metadata record
#(dgMID and dgDE, the dgMetadataID and dgDataEntity elements, must already have been created in the same way)
dgMR=M.dgMetadataRecord(dgMetadataID=dgMID, dgDataEntity=dgDE, dgMetadataDescription=dgMD)
#wrap the record in the top-level dgMetadata element
dgMeta=MRW.dgMetadata(dgMetadataRecord=dgMR)
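The 'in reverse' order is presumably needed because each constructor takes its already-built children as arguments. As a rough analogy only, the same metadata description can be assembled bottom-up with the standard library's xml.etree.ElementTree, which is not the MOLES tool; `leaf` and `parent` are hypothetical helpers, and the identifier values '1', '2' and '3' are the placeholders from the example above.

```python
import xml.etree.ElementTree as ET


def leaf(tag, text):
    """Create a childless element holding only text."""
    el = ET.Element(tag)
    el.text = text
    return el


def parent(tag, *children):
    """Create an element from children that must already exist."""
    el = ET.Element(tag)
    el.extend(children)
    return el


# Lowest level first: the identifiers, then their container, then its parent.
md_id = parent(
    "metadataDescriptionID",
    leaf("schemeIdentifier", "1"),
    leaf("repositoryIdentifier", "2"),
    leaf("localIdentifier", "3"),
)
dg_md = parent("dgMetadataDescription", md_id)

print(ET.tostring(dg_md, encoding="unicode"))
```

Because `parent` only accepts finished children, there is no way to build `dg_md` before `md_id`, which mirrors the ordering constraint described above.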

Rather than giving lots of examples, here is an explanation of how the parser works. MOLES classes exist only for elements that have child elements. If an element has no children, it will not be a class; instead it will be an attribute of its parent. So, in the following:

         <dgDataGranule>
            <dataModelID>
               <repositoryIdentifier>badc.nerc.ac.uk</repositoryIdentifier>
               <schemeIdentifier>NDG-A0</schemeIdentifier>
               <localIdentifier>example.xml</localIdentifier>
            </dataModelID>
         </dgDataGranule>

dgDataGranule will be a class because it has a child, dataModelID, and dataModelID will be a class because it has three children. The three identifiers, however, won't be classes; they will only ever be attributes of dataModelID.
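The rule can be illustrated with a small, hypothetical reader written against the standard library's ElementTree. The names `read_element` and `MolesElement` are invented for this sketch and are not part of molesReadWrite; the point is only to show elements with children becoming objects while childless elements become plain string attributes of their parent.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<dgDataGranule>
   <dataModelID>
      <repositoryIdentifier>badc.nerc.ac.uk</repositoryIdentifier>
      <schemeIdentifier>NDG-A0</schemeIdentifier>
      <localIdentifier>example.xml</localIdentifier>
   </dataModelID>
</dgDataGranule>"""


class MolesElement:
    """Hypothetical stand-in for a generated MOLES class."""

    def __init__(self, name):
        self.name = name


def read_element(el):
    """Map an element with children to an object; leaves become attributes."""
    obj = MolesElement(el.tag)
    for child in el:
        if len(child):                      # has children -> its own object
            value = read_element(child)
        else:                               # no children -> plain string
            value = (child.text or "").strip()
        # Simplification: a repeated child tag overwrites the earlier one.
        setattr(obj, child.tag, value)
    return obj


granule = read_element(ET.fromstring(SNIPPET))
print(granule.dataModelID.localIdentifier)  # example.xml
```

A real reader would also need to handle repeated child elements (for example by collecting them into a list), which this sketch deliberately skips.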

Each MOLES class (e.g. dgMetadataRecord) is essentially the same apart from its name. Each class can take any number of keyword arguments, and there are some rules about how these are processed.

So the dgDataGranule snippet above would be declared like this:

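#dataModelID has children, so it is a class; its three identifiers are plain strings
dModel=M.dataModelID(repositoryIdentifier='badc.nerc.ac.uk',schemeIdentifier='NDG-A0', localIdentifier='example.xml')
#dgDataGranule takes the dataModelID object as its child
DG=M.dgDataGranule(dataModelID=dModel)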
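The classes used above differ only in their names and accept any keyword arguments. One way to produce classes like that is to build them at runtime with Python's built-in three-argument `type()`. The sketch below is an assumption about how such a tool could work, not a description of molesReadWrite's internals; `MolesBase`, `MiniDoc` and the list of element names are hypothetical.

```python
class MolesBase:
    """Shared behaviour: store any keyword arguments as attributes."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in vars(self).items())
        return f"{type(self).__name__}({args})"


class MiniDoc:
    """Hypothetical document object exposing one generated class per name."""

    def __init__(self, names):
        for name in names:
            # Each generated class is identical apart from its name.
            setattr(self, name, type(name, (MolesBase,), {}))


doc = MiniDoc(["dgDataGranule", "dataModelID"])
dModel = doc.dataModelID(repositoryIdentifier="badc.nerc.ac.uk",
                         schemeIdentifier="NDG-A0",
                         localIdentifier="example.xml")
DG = doc.dgDataGranule(dataModelID=dModel)
print(DG)
```

Generating the classes from a list of names keeps all shared behaviour in one base class, so adding support for another element only means adding its name.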

There are three kinds of value you can pass as a keyword argument:

  • Another MOLES element
  • A string
  • A list

If the value of a keyword argument is another MOLES element, then that element becomes a child element (with children of its own). If the value is a string, it becomes a single child element. If the value is a list, the list can contain multiple strings or multiple MOLES elements.
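The three cases can be sketched as a small serializer built on the standard library's ElementTree. Everything here is hypothetical (the `Element` class and `to_xml` function are not the molesReadWrite API), and it assumes that each item in a list is written out as a repeated child element named after the keyword, which is one natural reading of the rule above. The `note` element is a made-up name used only to exercise the list case.

```python
import xml.etree.ElementTree as ET


class Element:
    """Hypothetical MOLES-like element holding its children as keywords."""

    def __init__(self, **kwargs):
        self.children = kwargs


def to_xml(tag, obj):
    """Serialise obj as an ElementTree element called tag."""
    el = ET.Element(tag)
    for key, value in obj.children.items():
        # Assumption: a list stands for several children sharing the same tag.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, Element):
                # Another element: a child with children of its own.
                el.append(to_xml(key, item))
            else:
                # A string: a single child element holding that text.
                ET.SubElement(el, key).text = item
    return el


granule = Element(
    dataModelID=Element(repositoryIdentifier="badc.nerc.ac.uk",
                        schemeIdentifier="NDG-A0",
                        localIdentifier="example.xml"),
    note=["first", "second"],  # hypothetical repeated string-valued element
)
print(ET.tostring(to_xml("dgDataGranule", granule), encoding="unicode"))
```

Treating a single value as a one-item list lets the string and element cases share one loop, so the list case needs no separate code path.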