Version 4 (modified by domlowe, 12 years ago)

Notes on using QXF data with CSML

The following issues arose out of a meeting between Siva and Dom on 19th April 2007.

General Notes

Siva and I attempted to hand-code some CSML v2 features for QXF data. We completed a point series feature and almost completed a ragged section feature.

The feature types themselves fitted perfectly with the data on a conceptual level and there were no problems there. However there are a few issues worth mentioning, some of which may need wider consideration.

Specific Issues

  1. Each phenomenon observed (temperature, salinity etc.) needs to be modelled as a separate feature, even if the phenomena share the same domain. A feature can have multiple phenomena, but only for composite phenomena, e.g. u, v winds. So a single dataset containing, say, 5 variables will be modelled as 5 features. This actually helps with 'gappy' data, where some variables are not measured at all times, because the individual features in the dataset can have different domains (i.e. different times/depths). The data granule (the CSML document) contains all 5 features, so at the UI level they are logically grouped together.
  1. Pointing to time in a CSML storage descriptor:

The QXF files we were looking at don't have a single NetCDF variable for the time dimension, so you can't point to a single time dimension with a storage descriptor like this:

<NetCDFExtract id="CRoKr942">
   <arraySize>1</arraySize>
   <fileName>blah.qxf</fileName>
   <variableName>time_variable</variableName>
</NetCDFExtract>

Instead, the times are stored in two variables. Note that these are not dimensions: there is just a primary dimension for the data, and all 'variables' are defined against it. So date, time, temperature etc. all share the primary dimension, rather than temperature being measured against a time (+ space) dimension as in CF.

As shown below, there is a variable for the date (AADYAA01) and one for the time of day (AAFDZZ01). The date is expressed in days since 1760 (I think that's the right date, Siva? Anyway, it doesn't really matter for these notes), and the time of day is also expressed in days, so midday would be 0.5. This differs from the CF approach.

dimensions:
        primary = UNLIMITED ; // (8784 currently)
variables:
        int AADYAA01(primary) ;
                AADYAA01:MIN = 77431 ;
                AADYAA01:MAX = 77796 ;
                AADYAA01:ABS = -1 ;
                AADYAA01:LFM = 500 ;
        float AAFDZZ01(primary) ;
                AAFDZZ01:MIN = 0.f ;
                AAFDZZ01:MAX = 0.9583333f ;
                AAFDZZ01:ABS = -1.f ;
                AAFDZZ01:LFM = 206 ;
        char FAAFDZZ01(primary) ;
                FAAFDZZ01:MIN = " " ;
                FAAFDZZ01:MAX = " " ;
        float ASLVZZ01(primary) ;
                ASLVZZ01:MIN = 2.834674f ;
                ASLVZZ01:MAX = 7.040967f ;
                ASLVZZ01:ABS = -99.f ;
                ASLVZZ01:LFM = 303 ;
        char FASLVZZ01(primary) ;
                FASLVZZ01:MIN = " " ;
                FASLVZZ01:MAX = " " ;
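Here is a quick consistency check on the header values above. It assumes the epoch is 1760-01-01 (as in the cdtime test later in these notes) and a proleptic Gregorian calendar. The 51 leap days between 1760 and 1971 are the 53 multiples of 4 from 1760 to 1968, less 1800 and 1900:

```latex
\begin{align*}
t &= D + F, \qquad D = \mathtt{AADYAA01}\ \text{(whole days since 1760-01-01)}, \quad F = \mathtt{AAFDZZ01} \in [0, 1)\\
\mathtt{AADYAA01{:}MIN} &= 77431 = 212 \times 365 + 51 \;\Rightarrow\; \text{1972-01-01}\\
\mathtt{AADYAA01{:}MAX} - \mathtt{AADYAA01{:}MIN} &= 77796 - 77431 = 365 \;\Rightarrow\; \text{1972-12-31 (1972 is a leap year)}\\
\mathtt{AAFDZZ01{:}MAX} &= 0.9583333 \approx 23/24 \;\Rightarrow\; \text{23:00}\\
8784 &= 366 \times 24
\end{align*}
```

So the file appears to hold one value per hour through 1972. That is an inference from the header attributes, not something we checked against the data itself.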

Anyway, the main point is that the times in the file are represented by two variables. So it isn't possible to dynamically point to a *single* variable or dimension in the file and extract a list of times of the type required by CSML, e.g. 2006-12-01T09:30:00.0 2006-12-01T10:30:00.0 ...

To use a storage descriptor to describe these times, it would be necessary to be able to point to a single variable or dimension. So the alternative is to store the full times inline in the CSML document. This means calculating these times from AADYAA01 + AAFDZZ01 when the CSML is created.
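Here is a minimal sketch of that calculation, in Python for illustration. It assumes the 1760-01-01 epoch from the cdtime test below, and it treats the ABS attributes (-1) as absent-data markers. `qxf_times_to_iso` is a hypothetical helper, not part of any existing CSML tooling. It rounds to the nearest second because AAFDZZ01 is a 32-bit float and cannot hold values like 1/24 exactly.

```python
from datetime import datetime, timedelta

# Assumptions: AADYAA01 counts whole days since 1760-01-01 and AAFDZZ01 is the
# fraction of a day; datetime uses the proleptic Gregorian calendar.
QXF_EPOCH = datetime(1760, 1, 1)
ABSENT_DAY = -1         # AADYAA01:ABS in the header above
ABSENT_FRACTION = -1.0  # AAFDZZ01:ABS in the header above


def qxf_times_to_iso(days, fractions):
    """Combine day numbers and day fractions into ISO 8601 strings for
    writing inline in a CSML document. Absent records yield None."""
    result = []
    for day, frac in zip(days, fractions):
        if day == ABSENT_DAY or frac == ABSENT_FRACTION:
            result.append(None)
            continue
        seconds = round(frac * 86400)  # nearest whole second of the day
        t = QXF_EPOCH + timedelta(days=int(day), seconds=seconds)
        result.append(t.strftime("%Y-%m-%dT%H:%M:%S") + ".0")
    return result


print(qxf_times_to_iso([77431, 77796], [0.5, 0.9583333]))
# ['1972-01-01T12:00:00.0', '1972-12-31T23:00:00.0']
```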

This is a valid approach; my only concern is that there are many values. There were 32,000 times in the example tide gauge dataset we looked at.

This in itself probably isn't a problem for the parser in terms of efficiency. It will have to read those 32k times anyway, either from the XML or from the file, and I don't think there's much between the two approaches. However, it may be a problem in terms of writing nice, concise CSML documents. Still, unless these limitations in the storage descriptor are resolved (unlikely in the NDG timescale), I think storing the times explicitly inline is the best (only) approach that will get QXF working under CSML now.

This in itself did raise a couple of other issues though (addressed below).
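To put the "nice, concise CSML documents" concern into rough numbers, this snippet sizes an inline list of 32,000 times written in the style quoted earlier. The start time and the hourly spacing are made-up assumptions; only the string format affects the size.

```python
from datetime import datetime, timedelta

# 32,000 ISO 8601 times in the "2006-12-01T09:30:00.0" style, separated by
# single spaces as in an inline XML list. Start and spacing are illustrative.
start = datetime(2006, 12, 1, 9, 30)
times = [(start + timedelta(hours=i)).strftime("%Y-%m-%dT%H:%M:%S") + ".0"
         for i in range(32000)]
inline = " ".join(times)
print(len(inline))  # 703999 characters, roughly 0.7 MB of XML text
```

So the cost is about 0.7 MB of text per inline time list. If each of the per-phenomenon features carried its own copy of the times, that would multiply accordingly.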

  1. Slicing. I've been working on the (flawed) assumption that we can use the dimensions in NetCDF to define subsets of data. That is, we can select temperatures in the time range tmin-tmax, where t is the time dimension and temperature is a variable on the time dimension, i.e. temperature(time).

This approach clearly won't work for QXF. Instead I'll need to 'slice' the temperature by position, e.g. to get temperature at positions 100 to 500 in the file. Andrew's been telling me we need to implement this anyway, but I'll have to bring it forward to fully support QXF.
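Here is a minimal sketch of what index-based slicing might look like once the times are held inline. Look up the requested time range in the sorted time list to get a range of positions along the primary dimension, then apply that same slice to the data variable. `time_range_to_index_slice` is a hypothetical name, and the values are dummy data.

```python
from bisect import bisect_left, bisect_right


def time_range_to_index_slice(times, tmin, tmax):
    """Map a requested time range onto positions along the QXF 'primary'
    dimension.

    `times` is the full, sorted list of ISO 8601 strings held inline in the
    CSML document (one per record). Because every string has the same
    fixed-width format, plain string comparison orders them by time.
    Returns a slice covering every record with tmin <= t <= tmax.
    """
    start = bisect_left(times, tmin)
    stop = bisect_right(times, tmax)
    return slice(start, stop)


# Dummy values for illustration: the slice indexes every variable that sits
# on the primary dimension in the same way.
times = ["1972-01-01T00:00:00.0", "1972-01-01T01:00:00.0",
         "1972-01-01T02:00:00.0", "1972-01-01T03:00:00.0"]
values = [10.0, 11.0, 12.0, 13.0]
s = time_range_to_index_slice(times, "1972-01-01T01:00:00.0",
                              "1972-01-01T02:30:00.0")
print(s, values[s])  # slice(1, 3, None) [11.0, 12.0]
```

The string comparison relies on every time using exactly the same fixed-width format. Parsing to datetime objects first would be more robust if that can't be guaranteed.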

  1. Weird cdms 'bug'.

We did some tests using cdtime to convert the QXF time values to full component times, and one of them produced this odd result:

>>> import cdtime
>>> t = cdtime.reltime(77431.04166666, "days since 1760-1-1")
>>> t.tocomp()
1972-1-1 0:59:60.0

Surely this should be 1972-1-1 1:00:00.0?

Dom to investigate.
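One plausible explanation, not checked against the cdtime source, is truncation plus display rounding. The value 0.04166666 is 1/24 truncated to eight decimal places, so the time is a fraction of a millisecond short of 01:00. The seconds field may then be rounded for display without carrying into the minutes. The code below repeats the arithmetic with Python's standard datetime module rather than cdtime, assuming the same epoch and a proleptic Gregorian calendar:

```python
from datetime import datetime, timedelta

# Same relative time as the cdtime test, computed with the standard library.
epoch = datetime(1760, 1, 1)
t = epoch + timedelta(days=77431.04166666)
print(t)  # just under 01:00, about 1972-01-01 00:59:59.9994

# How far short of exactly one hour the truncated fraction is, in seconds.
shortfall_s = (1 / 24 - 0.04166666) * 86400
print(round(shortfall_s * 1000, 3))  # ~0.576 ms

# Rounding to the nearest second before formatting gives the expected time.
rounded = (t + timedelta(microseconds=500000)).replace(microsecond=0)
print(rounded)  # 1972-01-01 01:00:00
```

If that is the cause, rounding to the nearest second when we generate the CSML times would avoid it.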

  1. I recall that cdms default time units are “days since 1979-1-1”.

Is it likely to be problematic that BODC is working from the 18th century as a reference point? Are there limitations in cdtime we should know about?

Roy comment: BODC has data going back to the 19th century, hence our choice of time base. Instinct tells me this might be a problem if the cdms time base is 1979. Or are negative times handled correctly/transparently?

Dom comment: Not sure; I'll look into this.
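On the arithmetic side, at least, a 1979 base just turns older dates into negative day counts. The check below uses the standard library, hypothetical helper names, and the 1760-01-01 BODC epoch from the cdtime test. It says nothing about how cdtime itself copes with negative relative times, which is the part still to be checked.

```python
from datetime import date, timedelta

# Assumed epochs: BODC day numbers count from 1760-01-01; the cdms default
# quoted above is "days since 1979-1-1".
BODC_EPOCH = date(1760, 1, 1)
CDMS_BASE = date(1979, 1, 1)
OFFSET_DAYS = (CDMS_BASE - BODC_EPOCH).days  # 79988


def bodc_day_to_days_since_1979(bodc_day):
    """Re-express a BODC day number relative to the 1979 base."""
    return bodc_day - OFFSET_DAYS


def days_since_1979_to_date(n):
    """Convert a (possibly negative) day count back to a calendar date."""
    return CDMS_BASE + timedelta(days=n)


# A 19th-century date ends up with a negative offset from 1979 ...
bodc_1850 = (date(1850, 1, 1) - BODC_EPOCH).days
n = bodc_day_to_days_since_1979(bodc_1850)
print(n)  # -47116
# ... but the round trip is exact, so the sign itself is not the issue here.
print(days_since_1979_to_date(n))  # 1850-01-01
```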

  1. Bounding boxes:

The GML GridEnvelope specifies integer as the data type for its low and high values, so the following is invalid:

<limits>
	<gml:GridEnvelope>
		<gml:low>-15.14235 25.91275</gml:low> 
		<gml:high>16.14235 45.91275</gml:high> 
	</gml:GridEnvelope>
</limits>

Is the extra accuracy needed for NDG or can we round the above to:

<limits>
	<gml:GridEnvelope>
		<gml:low>-16 25</gml:low>
		<gml:high>17 46</gml:high> 
	</gml:GridEnvelope>
</limits>

Roy comment: The prospect of bounding boxes 'growing' by up to 2 degrees through type transforms worries me.

Dom comment: The alternative could be to define a csml:GridEnvelope that accepts floats, but I'm not sure whether this would break the schema from a GML point of view. Andrew?
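To see how much the boxes actually grow, here is a sketch that rounds outward: floor for the low corner, ceil for the high corner, so no data falls outside the integer envelope. Note that floor(-15.14235) is -16. Truncating toward zero to -15 would shrink the box instead of growing it. Each axis grows by less than 2 units (at most 1 at each end), which matches the 'up to 2 degrees' figure. The helper name is hypothetical.

```python
import math


def outward_integer_envelope(low, high):
    """Round a floating-point envelope to integers without cutting off data:
    lower corner rounded down (floor), upper corner rounded up (ceil).
    Also returns how much each axis grows as a result."""
    int_low = [math.floor(v) for v in low]
    int_high = [math.ceil(v) for v in high]
    growth = [(ih - il) - (h - l)
              for il, ih, l, h in zip(int_low, int_high, low, high)]
    return int_low, int_high, growth


low, high = [-15.14235, 25.91275], [16.14235, 45.91275]
int_low, int_high, growth = outward_integer_envelope(low, high)
print(int_low, int_high)              # [-16, 25] [17, 46]
print([round(g, 5) for g in growth])  # [1.7153, 1.0]
```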

  1. What is low and what is high? I know we've discussed this before, but I'm not sure we've agreed on conventions in CSML for the directions of up and down relating to depths. (Edited from the original discussion note, where I had an incorrect grid envelope as an example.)

Roy comment: Oceanographers have depth bred into them, but for MOLES we agreed that depths should be considered as negative heights, and I suggest CSML follows the same convention. The trouble is what to do when the z co-ordinate is expressed in units of pressure...
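Under the negative-heights convention, 'low' is the deepest point and 'high' the shallowest. Here is a hypothetical helper showing that for depths in metres. Pressure coordinates are deliberately left out, since they would need a separate conversion first.

```python
def depths_to_height_envelope(depths_m):
    """Convert positive-down depths (metres) to heights using the MOLES
    convention (depth = negative height) and return (low, high) for the
    vertical axis. Pressure-based z coordinates are not handled here."""
    heights = [-d for d in depths_m]
    return min(heights), max(heights)


print(depths_to_height_envelope([5.0, 10.0, 250.0, 500.0]))  # (-500.0, -5.0)
```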