Version 1 (modified by astephen, 12 years ago)


DX Re-use – Components of the Data Extractor that might be useful in other tools

Here is a list of code objects from the Data Extractor that you might be able to re-use. Please ask me (Ag) if you want more information on any of this:

SessionObject.py: A persistent session manager. Saves the session into a dictionary-like object (using Python’s “shelve” module).
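As a rough illustration of the shelve approach (not the actual SessionObject code), a session could be saved and restored along these lines. The function names, the session keys and the temporary path are all made up for the example:

```python
import os
import shelve
import tempfile


def save_session(path, values):
    """Write the key/value pairs in 'values' to the shelf stored at 'path'."""
    with shelve.open(path) as shelf:
        for key, value in values.items():
            shelf[key] = value  # keys must be strings; values must be picklable


def load_session(path):
    """Return the stored session as an ordinary dictionary."""
    with shelve.open(path) as shelf:
        return dict(shelf)


# Hypothetical session contents, written to a temporary location.
session_path = os.path.join(tempfile.mkdtemp(), "session")
save_session(session_path, {"dataset": "example", "variables": ["var1", "var2"]})
restored = load_session(session_path)
```

Because the shelf lives on disk, the session survives between separate CGI requests as long as each request opens the same path.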

DateTimeManager.py: Useful class and functions for generating long lists of dates/times from a basic constructor such as:

createList((1999,1,1,0), (2007,2,1,18), (3, "hour"))

Also allows you to add a unit of time to an existing DateTime object such as:

dt.add(12, "days")
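For anyone without access to DateTimeManager, similar behaviour can be sketched with the standard datetime module. This is not the DX implementation: the name create_list is invented, only fixed-length units are handled, and it assumes the end time is included in the list, which may not match the original.

```python
from datetime import datetime, timedelta

# Map unit names to timedelta keyword arguments (fixed-length units only).
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def create_list(start, end, step):
    """Return datetimes from start to end (inclusive), spaced by step.

    start and end are (year, month, day, hour) tuples; step is (count, unit).
    """
    count, unit = step
    delta = timedelta(**{_UNITS[unit]: count})
    current, stop = datetime(*start), datetime(*end)
    result = []
    while current <= stop:
        result.append(current)
        current += delta
    return result


# Three-hourly steps over part of one day.
times = create_list((1999, 1, 1, 0), (1999, 1, 1, 9), (3, "hour"))

# Rough equivalent of dt.add(12, "days") on a standard datetime object.
later = times[0] + timedelta(days=12)
```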

FileNames.py: Generates a list of file names based on some configuration information such as Dataset Group, Dataset, Variable, Date, Time, Domain, Format etc.

OptionHandler.py: Evaluates what the session currently contains and then returns the appropriate “next” set of user options based on a pre-defined hierarchy of selections.
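The idea can be shown with a very small sketch. The hierarchy, the option values and the function name below are hypothetical, and the real OptionHandler presumably makes the choices at each level depend on the earlier selections, which this version does not:

```python
def next_options(session, hierarchy, options):
    """Return (level, choices) for the first level missing from the session.

    Returns None once every level in the hierarchy has been selected.
    """
    for level in hierarchy:
        if level not in session:
            return level, options[level]
    return None


# Hypothetical hierarchy and choices, for illustration only.
hierarchy = ["datasetGroup", "dataset", "variable"]
options = {
    "datasetGroup": ["groupA", "groupB"],
    "dataset": ["dataset1"],
    "variable": ["var1", "var2"],
}

# The user has picked a dataset group, so the dataset choices come next.
step = next_options({"datasetGroup": "groupA"}, hierarchy, options)
```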

dxvv (Virtual Variable handler): A method of providing the DX with definitions of “virtual variables” that are generated on the fly from existing variables. This requires data to be held in cdms-style objects.

Bits of functionality in the DX that might/will be needed in a replacement tool

The following list presents functionality that the DX has partially or totally solved. It might be worth looking to reuse this code:

  • newvar = difference(var1, var2)
  • (size, duration) = evaluateCostOfExtraction(session)
  • fork and queue task
  • chop up large task into numerous smaller output files (by time step)
  • mail user when complete
  • save/restore session (not currently working but framework there)
  • javascript GUI logic for sorting out interface
  • logging
  • NASA Ames output (link to NAppy)
  • GRIB output (wrapping ECMWF fortran executables)
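As an illustration of the "chop up large task" item in the list above, the time steps of a request can be split into fixed-size groups, one group per output file. This is only a sketch: the function name and the idea of a fixed number of steps per file are assumptions, not the DX logic.

```python
def chunk_time_steps(time_steps, steps_per_file):
    """Split a list of time steps into consecutive groups of a fixed size."""
    return [
        time_steps[i:i + steps_per_file]
        for i in range(0, len(time_steps), steps_per_file)
    ]


# Five time steps written two per file gives three output files.
chunks = chunk_time_steps(["t0", "t1", "t2", "t3", "t4"], 2)
```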

And some more, with one-line docs:


(year, month, day, hour, minute, second) = getDateTimeComponents(dateTimeString) # Takes in a time string in standard DateTime format and returns the items in it.
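A minimal sketch of this kind of parsing is shown below. It assumes the string looks like "YYYY-MM-DD HH:MM:SS" (or with a "T" separator) with whole-number seconds, which may not be exactly what the DX calls the standard format, and the function name is a stand-in for the real one:

```python
import re


def get_date_time_components(date_time_string):
    """Split 'YYYY-MM-DD HH:MM:SS' into six integers.

    Missing trailing time fields (for example the seconds) are returned as 0.
    """
    parts = [int(p) for p in re.split(r"[-T: ]", date_time_string.strip()) if p]
    parts += [0] * (6 - len(parts))
    return tuple(parts[:6])


components = get_date_time_components("1999-01-01 06:00:00")
```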

1|0 = keyPatternMatch(dct, pattern, mode="string match") # Returns 1 if one or more keys in the dictionary 'dct' match the pattern provided using string.find(). Returns 0 otherwise.

inRangeArray = getValuesInRange(start, end, array) # Takes a start and end value and returns the values in the array that are between them. If not in range and are the same value then returns [start].

sortedKeyList = getSortedKeysLike(dct, pattern, mode="string match") # Returns a list of all keys in the dictionary 'dct' that do a string match on 'pattern'.

dictSubset = getDictSubsetMatching(dct, pattern, mode="string match") # Returns a dictionary of all items in input dictionary 'dct' where keys match 'pattern'.

deletedCount = deleteDictSubsetMatching(dct, pattern, mode="string match") # Deletes any items in the dictionary subset matched when calling getDictSubsetMatching above. Returns the number of items deleted.

timeString = convertDateTimeStringToYYYYMMDDHH(timeString) # Takes in a long CF-compliant time string and returns a shorter YYYYMMDDHH string.
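The dictionary helpers described above (keyPatternMatch, getSortedKeysLike, getDictSubsetMatching and deleteDictSubsetMatching) are simple enough to sketch. The versions below cover only the substring matching mode, use invented snake_case names, and are not copied from the DX source:

```python
def get_sorted_keys_like(dct, pattern):
    """Return a sorted list of the keys that contain 'pattern' as a substring."""
    return sorted(key for key in dct if pattern in key)


def key_pattern_match(dct, pattern):
    """Return 1 if at least one key contains 'pattern', otherwise 0."""
    return 1 if get_sorted_keys_like(dct, pattern) else 0


def get_dict_subset_matching(dct, pattern):
    """Return a new dictionary holding only the items whose keys match."""
    return {key: dct[key] for key in get_sorted_keys_like(dct, pattern)}


def delete_dict_subset_matching(dct, pattern):
    """Delete the matching items from 'dct' in place and return how many went."""
    keys = get_sorted_keys_like(dct, pattern)
    for key in keys:
        del dct[key]
    return len(keys)


# Hypothetical session-style dictionary.
session = {"dataset_1": "a", "dataset_2": "b", "format_1": "NetCDF"}
subset = get_dict_subset_matching(session, "dataset")
deleted = delete_dict_subset_matching(session, "dataset")
```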

cleanedObject = deUnicodeObject(obj) # Returns the object identical except unicode strings are all converted to normal strings.

0|1 = compareCdmsAxes(ax1, ax2) # Takes 2 cdms axis objects returning 1 if they are essentially the same and 0 if not.

0|1 = compareGrids(grid1, grid2) # Takes 2 cdms grid objects returning 1 if they are essentially the same and 0 if not.

(newValue, newValue, rtMessage) = nudgeSingleValuesToAxisValues(value, axisValues, axisType) # Takes a value and checks if it is in the axisValues array. If not, it nudges the value to the nearest neighbour in axis. It returns the new value twice along with a message describing the change.
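The nudging step can be sketched with plain Python numbers rather than cdms axes. The function name, the message wording and the empty message when nothing changes are assumptions made for the example, not the DX behaviour:

```python
def nudge_value_to_axis_values(value, axis_values, axis_type):
    """Return (new_value, new_value, message) for the axis value nearest 'value'.

    If 'value' is already on the axis it is returned unchanged with an
    empty message.
    """
    if value in axis_values:
        return value, value, ""
    nearest = min(axis_values, key=lambda v: abs(v - value))
    message = "%s value %s nudged to nearest axis value %s" % (
        axis_type, value, nearest)
    return nearest, nearest, message


low, high, note = nudge_value_to_axis_values(12.3, [0, 10, 20], "latitude")
```

Returning the value twice mirrors the signature described above, presumably so that a single value can be handled like a (low, high) range.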

makeDirsAndPerms(basedir, dirs, permissions, owner, verbose="no") # A function for making directories recursively and setting permissions/ownership.
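A cut-down sketch of the same idea using only the standard library is given below. It assumes 'dirs' is a list of nested directory names under 'basedir', which may not be how the DX function interprets it, and it leaves out ownership changes because os.chown normally needs elevated privileges:

```python
import os
import tempfile


def make_dirs_and_perms(basedir, dirs, permissions):
    """Create basedir/dirs[0]/dirs[1]/... and set the mode on each level.

    Returns the path of the deepest directory created.
    """
    path = basedir
    for name in dirs:
        path = os.path.join(path, name)
        os.makedirs(path, exist_ok=True)
        os.chmod(path, permissions)  # explicit chmod is not affected by the umask
    return path


# Hypothetical output tree under a temporary directory.
base = tempfile.mkdtemp()
deepest = make_dirs_and_perms(base, ["output", "user1"], 0o750)
```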

class RedirectStdout: # Used to direct standard output away from the screen in CGI scripts.
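In current Python the same effect is available from the standard library, so a replacement tool may not need its own class. The sketch below uses contextlib.redirect_stdout; the helper name capture_stdout is invented for the example:

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return (result, text) where text is what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


# Anything printed inside the call ends up in 'text' instead of the response.
result, text = capture_stdout(print, "debug message")
```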