wiki:CSMLReadMethods

Version 7 (modified by domlowe, 13 years ago)

formatting edit

Read methods needed to integrate a new data format into the CSML API

The CSML API uses a unified 'data interface' class (DI) to read various data formats. To see how this fits in with the CSML parser and the rest of the CSML tooling, here is a diagram showing an overview:

[Diagram: Overview of CSML tooling]

The basic read methods that need implementing for a new data format are:

  • DI.openFile(self, fileName) --- opens the file
  • DI.setAxis(self, axisName) --- 'sets' the axis you want to read (axis: e.g. latitude, time, pressure, depth, etc.)
  • DI.getDataForAxis(self) --- returns the entire set of values for that axis
  • DI.setVariable(self, variableName) --- 'sets' the variable you want to read (variable: e.g. Temperature, WindSpeed, etc.)
  • DI.getDataForVar(self) --- returns the entire set of values for that variable
  • DI.getSubsetOfDataForVar(self, **kwargs) --- returns a subset of values for that variable
  • DI.closeFile(self) --- closes the open file
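The methods are meant to be called in a fixed order: open the file, set an axis or variable, read its values, then close the file. The sketch below shows a caller written against that contract. It assumes only that `di` implements the methods listed above. The helper `read_axis_and_variable` and the in-memory `StubDI` are hypothetical and exist only for illustration.

```python
def read_axis_and_variable(di, filename, axis_name, var_name):
    """Read one axis and one variable through a data interface (hypothetical helper)."""
    di.openFile(filename)
    try:
        di.setAxis(axis_name)               # choose the axis first...
        axis_values = di.getDataForAxis()   # ...then read all of its values
        di.setVariable(var_name)            # same pattern for the variable
        var_values = di.getDataForVar()
    finally:
        di.closeFile()                      # release the file even if a read fails
    return axis_values, var_values


class StubDI:
    """In-memory stand-in for a real data interface (hypothetical)."""

    def __init__(self, axes, variables):
        self._axes = axes
        self._variables = variables
        self.is_open = False

    def openFile(self, fileName):
        self.is_open = True

    def setAxis(self, axisName):
        self._axis = axisName

    def getDataForAxis(self):
        return self._axes[self._axis]

    def setVariable(self, variableName):
        self._var = variableName

    def getDataForVar(self):
        return self._variables[self._var]

    def closeFile(self):
        self.is_open = False


di = StubDI({'latitude': [0.0, 5.0, 10.0]}, {'Temperature': [280.1, 281.4, 282.0]})
lat, temp = read_axis_and_variable(di, 'example.xyz', 'latitude', 'Temperature')
print(lat, temp, di.is_open)
```

The try/finally ensures closeFile runs even if one of the reads raises an exception.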

When the CSML API instantiates a DataInterface object (DI), what is actually returned is a data interface specific to the data format being read.

In the DataInterface class there is a bit of Python code that does something like this:

                if self.iface == 'nappy':
                        return NappyInterface()
                elif self.iface == 'cdunif':
                        return cdunifInterface()

So if you want to integrate your format, XYZFormat, the first thing to do is to create an XYZInterface class. The code above can then become:

                if self.iface == 'nappy':
                        return NappyInterface()
                elif self.iface == 'cdunif':
                        return cdunifInterface()
                elif self.iface == 'XYZ':
                        return XYZInterface()
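As more formats are added, the if/elif chain can also be written as a lookup table. The sketch below shows that alternative. It is not how the CSML code is written. The class bodies are placeholders, and `INTERFACES` and `make_interface` are hypothetical names.

```python
class NappyInterface:
    """Placeholder for the NASA Ames (NAPPY) interface."""


class cdunifInterface:
    """Placeholder for the cdms/cdunif interface."""


class XYZInterface:
    """Placeholder for the new XYZ format interface."""


# Map each iface name to the class that handles it.
INTERFACES = {
    'nappy': NappyInterface,
    'cdunif': cdunifInterface,
    'XYZ': XYZInterface,
}


def make_interface(iface):
    """Return a new data interface for iface, or raise if none is registered."""
    try:
        return INTERFACES[iface]()
    except KeyError:
        raise ValueError('no data interface registered for %r' % iface) from None


print(type(make_interface('XYZ')).__name__)  # XYZInterface
```

With this layout, supporting a new format takes one dictionary entry. An unknown name produces a clear error instead of silently returning None.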

So (in Python) you should create a class that looks like this:

class XYZInterface(AbstractDI):
    #Data Interface for XYZ File format

    def __init__(self):
        #this might change when CSML is revamped
        self.extractType = 'XYZExtract'
        self.extractPrefix = '_XYZextract_'

    def openFile(self, filename):
        #some code to open the file and keep a handle to it in self.file
        raise NotImplementedError

    def setAxis(self, axis):
        #some code to set an axis to be queried, may not need to do much, depending on your format
        raise NotImplementedError

    def getDataForAxis(self):
        #some code to return the values for the axis chosen in setAxis
        raise NotImplementedError

    def setVariable(self, varname):
        #some code to set a variable to be queried, may not need to do much, depending on your format
        raise NotImplementedError

    def getDataForVar(self):
        #some code to return all values for the variable chosen in setVariable
        raise NotImplementedError

    def getSubsetOfDataForVar(self, **kwargs):
        #takes keyword args defining subset eg
        #subset=getSubsetOfDataForVar(latitude=(0.,10.0), longitude=(90, 100.0), ...)
        #and returns a subset of data for that variable
        raise NotImplementedError

    def closeFile(self):
        #some code to close the file (AbstractDI already provides this for a file held in self.file)
        self.file.close()
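To make the skeleton concrete, here is one way it could be filled in for a deliberately simple, hypothetical "XYZ" text format. In this format, the first line holds whitespace-separated column names and each following line holds one row of numbers. `AbstractDI` here is a minimal stand-in that provides only the shared closeFile. The format, the column names and the data values are illustrative assumptions.

```python
import os
import tempfile


class AbstractDI:
    """Minimal stand-in for CSML's AbstractDI: only the shared closeFile."""

    def closeFile(self):
        self.file.close()


class XYZInterface(AbstractDI):
    """Toy interface for a hypothetical text format: a header line of
    column names followed by rows of whitespace-separated numbers."""

    def __init__(self):
        self.extractType = 'XYZExtract'
        self.extractPrefix = '_XYZextract_'

    def openFile(self, filename):
        self.file = open(filename)
        self.names = self.file.readline().split()
        rows = [[float(v) for v in line.split()] for line in self.file if line.strip()]
        # store each column as a list, keyed by its name
        self.columns = {name: [row[i] for row in rows] for i, name in enumerate(self.names)}

    def setAxis(self, axis):
        self.axisname = axis

    def getDataForAxis(self):
        return self.columns[self.axisname]

    def setVariable(self, varname):
        self.varname = varname

    def getDataForVar(self):
        return self.columns[self.varname]

    def getSubsetOfDataForVar(self, **kwargs):
        # keep a row only if every named column lies inside its (min, max) range
        values = self.columns[self.varname]
        keep = range(len(values))
        for name, (lo, hi) in kwargs.items():
            col = self.columns[name]
            keep = [i for i in keep if lo <= col[i] <= hi]
        return [values[i] for i in keep]


with tempfile.NamedTemporaryFile('w', suffix='.xyz', delete=False) as f:
    f.write('latitude Temperature\n0.0 280.1\n5.0 281.4\n10.0 282.0\n15.0 283.5\n')
    path = f.name

di = XYZInterface()
di.openFile(path)
di.setVariable('Temperature')
print(di.getSubsetOfDataForVar(latitude=(0.0, 10.0)))  # [280.1, 281.4, 282.0]
di.closeFile()  # inherited from AbstractDI
os.remove(path)
```

This toy reads the whole file in openFile. That is only reasonable for small files. A real format would more likely read lazily, as the NAPPY example below does.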

Example Data Interfaces

I think it is best to explain the XYZInterface() by showing how the interface differs for the cdms/cdunif and NAPPY data interfaces. The details of each API aren't important to understand; rather, it is the structure of the DataInterface that I am trying to illustrate.

So here are the methods for two different data interfaces, cdunifInterface() and NappyInterface(). First, the openFile method. This is pretty straightforward: we open the file and assign the open file object to self.file.

CDMS: openFile

        def openFile(self, filename):
                self.file=cdms.open(filename)

NAPPY: openFile

        def openFile(self, filename):
                self.file=nappy.openNAFile(filename)

The setAxis method differs between the two interfaces. The cdunif method is straightforward: it grabs an axis object directly from the file. The NAPPY method instead stores the position (index) of the axis in a variable called self.axisstub for later reference. To find that position it has to get the list of all axis names and strip the units from them. This is a confusing detail, but the basic idea is to store 'something' that will give you a handle back to the axis. That something is internal to the XYZInterface class.

CDMS: setAxis

    def setAxis(self,axis):
        self.axisobj=self.file.getAxis(axis)

NAPPY: setAxis

        def __stripunits(self,listtostrip):
                #strips units of measure from list
                #eg ['Universal time (hours)', 'Altitude (km)', 'Latitude (degrees)', 'Longitude (degrees)']
                #becomes ['Universal time', 'Altitude', 'Latitude', 'Longitude']
                cleanlist = []
                for item in listtostrip:
                        openbracket = item.find('(')
                        if openbracket != -1:
                                #if brackets exist, strip the units and any space before them
                                item = item[:openbracket].rstrip()
                        cleanlist.append(item)
                return cleanlist


        def __getListOfAxes(self):
                axes=self.file.XNAME
                axes=self.__stripunits(axes)
                return axes

        def setAxis(self,axis):
                axes = self.__getListOfAxes()
                self.axisstub=axes.index(axis)
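The unit stripping and index lookup can be tried outside NAPPY. Here is a standalone sketch of the same two steps. `strip_units` and `axis_index` are hypothetical helper names. The header list is the example from the __stripunits comment above.

```python
def strip_units(names):
    """Drop a trailing '(units)' from each name, e.g. 'Altitude (km)' -> 'Altitude'."""
    cleaned = []
    for name in names:
        bracket = name.find('(')
        cleaned.append(name[:bracket].rstrip() if bracket != -1 else name)
    return cleaned


def axis_index(xname, axis):
    """Position of axis in the header list: what setAxis keeps as self.axisstub."""
    return strip_units(xname).index(axis)


XNAME = ['Universal time (hours)', 'Altitude (km)', 'Latitude (degrees)', 'Longitude (degrees)']
print(strip_units(XNAME))             # ['Universal time', 'Altitude', 'Latitude', 'Longitude']
print(axis_index(XNAME, 'Latitude'))  # 2
```

Note that list.index raises ValueError for a name that is not in the header. setAxis will therefore fail loudly if it is asked for an axis the file doesn't have.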

Now the 'handle' to the axis is used internally within getDataForAxis.

CDMS: getDataForAxis

        def getDataForAxis(self):
                data = self.axisobj.getValue()
                return data

NAPPY: getDataForAxis

        def getDataForAxis(self):
                #this is a Nappy thing - it needs to call the readData() method if it hasn't already done so
                if self.file.X is None:
                        self.file.readData()
                #if more than one axis you need to specify which one you want (using self.axisstub)
                if isinstance(self.file.X[1], list):
                        data = self.file.X[self.axisstub]
                else:
                        data = self.file.X
                return data
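The read-on-first-use step and the "pick one axis out of several" step can be exercised without NAPPY by faking the file object. This sketch assumes only what the method above relies on: an `X` attribute that is `None` until `readData()` fills it. `FakeNAFile` and `data_for_axis` are hypothetical, and the values are illustrative.

```python
class FakeNAFile:
    """Hypothetical stand-in for an open NAPPY file object."""

    def __init__(self, x_after_read):
        self.X = None                    # nothing read yet
        self._x_after_read = x_after_read
        self.reads = 0

    def readData(self):
        self.reads += 1
        self.X = self._x_after_read


def data_for_axis(nafile, axisstub):
    """Same logic as NappyInterface.getDataForAxis, written as a free function."""
    if nafile.X is None:                 # read lazily, only on first use
        nafile.readData()
    if isinstance(nafile.X[1], list):    # several axes: one list per axis
        return nafile.X[axisstub]
    return nafile.X                      # single axis: X is the list of values


multi = FakeNAFile([[0.0, 1.0, 2.0], [10.0, 20.0]])
print(data_for_axis(multi, 1))               # [10.0, 20.0]
print(data_for_axis(multi, 0), multi.reads)  # [0.0, 1.0, 2.0] 1
single = FakeNAFile([0.0, 0.5, 1.0])
print(data_for_axis(single, 0))              # [0.0, 0.5, 1.0]
```

The second call on `multi` does not call readData again, because X is already populated.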

setVariable and getDataForVar work in a similar way to the axis methods just shown.

CDMS: setVariable

    def setVariable(self,varname):
        self.varobj=self.file.variables[varname]

NAPPY: setVariable

        def setVariable(self,varname):
                vlist=self.getListofVariables()
                self.varstub=vlist.index(varname)

CDMS: getDataForVar

    def getDataForVar(self):
        data = self.varobj.getValue()
        return data

NAPPY: getDataForVar

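        def getDataForVar(self):
                #as for axes, readData() must be called before V is populated
                if self.file.V is None:
                        self.file.readData()
                #if there is more than one variable, pick the one chosen in setVariable (self.varstub)
                if isinstance(self.file.V[0], list):
                        data = self.file.V[self.varstub]
                else:
                        data = self.file.V
                return data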

Getting subsets of data may or may not be complicated, depending on the format. CDMS has subsetting built in, so we just call the CDMS methods.

CDMS: getSubsetOfDataForVar

    def getSubsetOfDataForVar(self, **kwargs):
        #takes keyword args defining subset eg
        #subset=getSubsetOfDataForVar(latitude=(0.,10.0), longitude=(90, 100.0))
        sel=cdms.selectors.Selector(**kwargs)
        subset=self.file(self.varobj.name,sel)
        #data = subset.getValue()
        data = subset  #doesn't seem to matter which
        return data

NAPPY, on the other hand, doesn't, so we need to do the subsetting within the NappyInterface class itself.

NAPPY: getSubsetOfDataForVar

    def getSubsetOfDataForVar(self, **kwargs):
        #Hmmm I haven't implemented this yet.
        #Either the subsetting needs to happen in NAPPY (which it doesn't)
        #Or I need to write some code here to handle the subsetting
        raise NotImplementedError('subsetting is not yet supported for NASA Ames files')
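For the simplest case, a file with a single independent axis, one way to fill this in is to keep only the variable values whose axis value falls inside the requested range. This is a minimal sketch. It assumes the axis values are one flat list and the variable values are a list of the same length; inside NappyInterface those would come from getDataForAxis and getDataForVar. `subset_by_axis` is a hypothetical helper, not part of NAPPY or CSML, and it does not handle multi-axis files.

```python
def subset_by_axis(axis_name, axis_values, var_values, **kwargs):
    """Return the var_values whose axis value lies in kwargs[axis_name] = (min, max).

    Keyword arguments naming any other axis are rejected, since this sketch
    only handles files with one independent axis.
    """
    unknown = set(kwargs) - {axis_name}
    if unknown:
        raise ValueError('cannot subset on %s' % ', '.join(sorted(unknown)))
    if axis_name not in kwargs:
        return list(var_values)          # no constraint: return everything
    lo, hi = kwargs[axis_name]
    return [v for x, v in zip(axis_values, var_values) if lo <= x <= hi]


altitude = [0.5, 1.0, 1.5, 2.0, 2.5]     # illustrative values
ozone = [31.0, 33.5, 36.2, 38.9, 40.1]
print(subset_by_axis('Altitude', altitude, ozone, Altitude=(1.0, 2.0)))
# [33.5, 36.2, 38.9]
```

The range is treated as inclusive at both ends, matching the (min, max) tuples used in the CDMS example.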

And finally, the closeFile method. Both cdunifInterface(AbstractDI) and NappyInterface(AbstractDI) inherit the generic closeFile method from AbstractDI().

AbstractDI: closeFile

        def closeFile(self):
                #closes file
                self.file.close()
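Because closeFile lives in AbstractDI, any interface that keeps its open file in self.file gets closing for free. This sketch shows that arrangement with a hypothetical AbstractDI reduced to just closeFile. It adds a hypothetical TextInterface subclass that opens ordinary text files.

```python
import os
import tempfile


class AbstractDI:
    """Stand-in for CSML's AbstractDI, reduced to the shared closeFile."""

    def closeFile(self):
        #closes file
        self.file.close()


class TextInterface(AbstractDI):
    """Hypothetical interface: openFile is format-specific, closeFile is inherited."""

    def openFile(self, filename):
        self.file = open(filename)


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
di = TextInterface()
di.openFile(path)
di.closeFile()              # runs AbstractDI.closeFile; TextInterface does not define it
print(di.file.closed)       # True
os.remove(path)
```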

There is a bit more to all this in terms of integrating it into the main trunk of the CSML code, but the first stage is to work on these methods for your file format.
