
Version 4 (modified by domlowe, 13 years ago)

[M] added note about not passing strings

MOLES parsing in python

There is a lightweight parsing tool for MOLES documents here.

This code can be used to create MOLES objects in python and generate a corresponding MOLES XML document, or it can be used to read in a MOLES XML document and create python objects. You can then manipulate these python objects and recreate the XML document.

This is useful in many circumstances, for example when your MOLES creation involves more than one step. For instance, BADC has a csml2moles tool (http://proj.badc.rl.ac.uk/ndg/browser/TI02-CSML/trunk/csml2MolesStuff) under development. This process will create a basic MOLES document, but we will then want to add or edit elements at a later stage. One way of doing this is to use this moles parser to read in the unfinished MOLES document, make the changes we need via python (probably using some sort of tool), and then regenerate the XML from the python objects.

One point worth noting is that this tool is described as 'lightweight' because it performs no checking and imposes no restrictions on the MOLES you create. In that sense it is not as strict as the CSML parser, so you will still need to think about your MOLES!

Now we'll see how to use this tool in python

Importing things

Note that you will need cElementTree and ElementTree installed. We will import the MOLES tool under the abbreviated name 'MRW':

import cElementTree
import elementtree.ElementTree as etree
import datetime
import molesReadWrite as MRW
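
If the standalone cElementTree and elementtree packages are not installed, the copies bundled with python's standard library (the xml.etree package, python 2.5 and later) provide the same API. The following is a minimal sketch of a fallback import, assuming your own code only refers to the names cElementTree and etree. molesReadWrite itself may still require the standalone packages, depending on how it does its own imports.

```python
# Prefer the standalone packages used on this page; fall back to the
# standard-library copies if they are not installed.
try:
    import cElementTree
except ImportError:
    import xml.etree.ElementTree as cElementTree

try:
    import elementtree.ElementTree as etree
except ImportError:
    import xml.etree.ElementTree as etree

# Quick check that parsing works with whichever module was picked up.
root = cElementTree.fromstring('<dgMetadata><dgMetadataRecord/></dgMetadata>')
print(root.tag)
```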

How to create objects

First, instantiate a new moles document object (called M in this case):

M=MRW.MolesDoc()

Now you can begin creating MOLES elements. You need to create them 'in reverse', so the lowest level elements are created first:

dgMID=M.dgMetadataID(schemeIdentifier='1',repositoryIdentifier='2', localIdentifier='3')
dgMR=M.dgMetadataRecord(dgMetadataID=dgMID)
dgMeta=MRW.dgMetadata(dgMetadataRecord=dgMR)

So in the above there are 3 classes used: M.dgMetadataID, M.dgMetadataRecord and MRW.dgMetadata (the 'root' element). The resulting XML looks like this:

<dgMetadata>
   <dgMetadataRecord>
      <dgMetadataID>
        <schemeIdentifier>1</schemeIdentifier>
        <repositoryIdentifier>2</repositoryIdentifier>
        <localIdentifier>3</localIdentifier>
      </dgMetadataID>
   </dgMetadataRecord>
</dgMetadata>

So you can see that the hierarchical relationship expressed in python is represented in the XML. You don't need to remember all the class names; you can work them out by remembering that MOLES classes exist for elements that have child elements. If an element does not have children, it will not be a class, but will be an attribute of its parent. So, in the following:

         <dgDataGranule>
            <dataModelID>
               <repositoryIdentifier>badc.nerc.ac.uk</repositoryIdentifier>
               <schemeIdentifier>NDG-A0</schemeIdentifier>
               <localIdentifier>example.xml</localIdentifier>
            </dataModelID>
         </dgDataGranule>

dgDataGranule will be a class as it has a child dataModelID, and dataModelID will be a class because it has 3 children, but the 3 identifiers won't be classes, they will only ever be attributes of the dataModelID.
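
The rule above can be checked mechanically against any MOLES fragment. The following sketch does not use the MOLES parser at all: it walks the XML with the standard-library ElementTree and sorts element names into those that have children (classes) and those that do not (attributes). The function name split_classes_and_attributes is made up for this illustration.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<dgDataGranule>
   <dataModelID>
      <repositoryIdentifier>badc.nerc.ac.uk</repositoryIdentifier>
      <schemeIdentifier>NDG-A0</schemeIdentifier>
      <localIdentifier>example.xml</localIdentifier>
   </dataModelID>
</dgDataGranule>
"""

def split_classes_and_attributes(root):
    """Return (class_names, attribute_names) for the tree under root."""
    classes, attributes = [], []
    for elem in root.iter():
        # len(elem) is the number of direct child elements.
        target = classes if len(elem) > 0 else attributes
        if elem.tag not in target:
            target.append(elem.tag)
    return classes, attributes

classes, attributes = split_classes_and_attributes(ET.fromstring(SNIPPET))
print(classes)
print(attributes)
```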

Each MOLES class (e.g. dgMetadataRecord) is essentially the same apart from its name. Each class can take any number of keyword arguments and there are some rules about how these are processed.

So the dgDataGranule XML snippet above would be declared like this:

dModel=M.dataModelID(repositoryIdentifier='badc.nerc.ac.uk',schemeIdentifier='NDG-A0', localIdentifier='example.xml')
DG  = M.dgDataGranule(dataModelID=dModel)

There are three things you can pass as a keyword argument that affect how each MOLES element is processed. These are:

  • Another MOLES element
  • A string
  • A list

If the value of the keyword argument is another MOLES element, then that element becomes a child element (with children of its own). If the value is a string, it becomes a single child element whose name is the keyword. If the value is a list, the list can contain multiple strings or multiple MOLES elements, each of which becomes a child element.
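
To make these three rules concrete, here is a small stand-in written with the standard-library ElementTree. It is not the molesReadWrite code: the Node class, its explicit tag argument and its to_xml method are hypothetical, and only illustrate how keyword values of the three kinds could be turned into child elements.

```python
import xml.etree.ElementTree as ET

class Node(object):
    """Hypothetical stand-in for a MOLES class; not the real parser code."""

    def __init__(self, tag, **kwargs):
        self.tag = tag
        self.children = kwargs  # name -> Node, string, or list of those

    def to_xml(self):
        elem = ET.Element(self.tag)
        for name, value in self.children.items():
            # Treat a single value as a one-item list so all cases share a loop.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, Node):
                    # Another element: a child with children of its own.
                    elem.append(item.to_xml())
                else:
                    # A string: a single leaf child named after the keyword.
                    ET.SubElement(elem, name).text = item
        return elem

granules = [Node('dgDataGranule', att1='A'), Node('dgDataGranule', att1='C')]
entity = Node('dgDataEntity', dgDataGranule=granules,
              keyword=['x', 'y'], name='demo')
root = entity.to_xml()
print(ET.tostring(root).decode('ascii'))
```

The element names keyword and name are invented for the example; the point is only that a list yields repeated children with the same name.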

Using lists is necessary when you want a MOLES element to have multiple child elements with the same name. E.g. You may have a dgDataEntity with multiple dgDataGranules.

In this case you would do (in pseudo-python):

DG1 = M.dgDataGranule(att1 = 'A', att2='B')
DG2 = M.dgDataGranule(att1 = 'C', att2='D')
DG3 = M.dgDataGranule(att1 = 'E', att2='F')
dglist=[DG1, DG2, DG3]
DE = M.dgDataEntity(dgDataGranule=dglist)

This will produce:

<dgDataEntity>
   <dgDataGranule>
      <att1>A</att1>
      <att2>B</att2>
   </dgDataGranule>
   <dgDataGranule>
      <att1>C</att1>
      <att2>D</att2>
   </dgDataGranule>
   <dgDataGranule>
      <att1>E</att1>
      <att2>F</att2>
   </dgDataGranule>
</dgDataEntity>

Going to and from XML

Finally, you need to know how to convert between XML and python.

There are two methods, toXML() and fromXML(). You only need to call these methods on the root element. So to convert a MOLES XML document to python do this:

tree=cElementTree.ElementTree(file='moles.xml')
dgMeta=MRW.dgMetadata()
dgMeta.fromXML(tree.getroot())

And to convert to XML:

molestree=dgMeta.toXML()
cElementTree.dump(molestree)  # dump() writes the XML to stdout itself and returns None
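
dump() is intended for inspection. If you want the XML as a string, for example to save it or pass it on, ElementTree's tostring() function can be used instead. The sketch below uses a hand-built element in place of the result of dgMeta.toXML(), which is assumed here to be an ordinary ElementTree element, as the dump() call suggests.

```python
import xml.etree.ElementTree as ET

# Stand-in for: molestree = dgMeta.toXML()
molestree = ET.Element('dgMetadata')
record = ET.SubElement(molestree, 'dgMetadataRecord')
ET.SubElement(record, 'localIdentifier').text = '3'

# Serialise to a string rather than printing straight to stdout.
xml_text = ET.tostring(molestree).decode('ascii')
print(xml_text)

# The string can be parsed again, giving the same structure back.
reparsed = ET.fromstring(xml_text)
```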

If you are just using this parser to create MOLES records there may not be much advantage over using cElementTree directly, although it may be more or less readable depending on your preference. However, the main advantage comes in being able to manipulate and edit existing MOLES documents.

So you could call the fromXML method on the root element of a document and then do things like:

for parameterSummary in dgMeta.dgMetadataRecord.dgDataEntity.dgDataSummary.dgParameterSummary:
    # do something to all the parameter summaries
    pass
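
To show why attribute access like this is convenient, here is a rough sketch of how an XML tree can be turned into nested python objects. It is an illustration only, not the fromXML implementation: the Record class and from_xml function are hypothetical, and the name child element is invented for the example.

```python
import xml.etree.ElementTree as ET

class Record(object):
    """Hypothetical container; attributes are filled in from child elements."""

def from_xml(elem):
    """Turn an element into a string (leaf) or a Record (has children)."""
    if len(elem) == 0:
        return elem.text
    record = Record()
    for child in elem:
        value = from_xml(child)
        if child.tag in vars(record):
            # Repeated name: collect the values in a list.
            existing = getattr(record, child.tag)
            if not isinstance(existing, list):
                existing = [existing]
            existing.append(value)
            setattr(record, child.tag, existing)
        else:
            setattr(record, child.tag, value)
    return record

DOC = """
<dgDataSummary>
   <dgParameterSummary><name>temperature</name></dgParameterSummary>
   <dgParameterSummary><name>pressure</name></dgParameterSummary>
</dgDataSummary>
"""
summary = from_xml(ET.fromstring(DOC))
names = [p.name for p in summary.dgParameterSummary]
print(names)
```

In this sketch a name that occurs only once is stored as a single object rather than a one-item list. Whether the real parser does the same is not stated on this page, so check what you get back before looping over it.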

Additional Notes

You can only pass strings as leaf values to the parser. Don't try to pass it any other type (e.g. a float); convert such values to strings first.
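
One way to stay within this restriction is to convert every value before handing it to a MOLES class. The helper below is a suggestion rather than part of the tool: as_text is a made-up name, and the choice of ISO 8601 for dates is an assumption, so use whatever format your MOLES records require.

```python
import datetime

def as_text(value):
    """Return value as a string suitable for a leaf element."""
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if isinstance(value, datetime.date):
        return value.isoformat()
    return str(value)

print(as_text(12.5))
print(as_text(datetime.date(2007, 3, 1)))
```

A call such as M.dgMetadataID(localIdentifier=as_text(3)) would then always receive a string.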