Changeset 3373


Timestamp:
11/02/08 19:10:28 (12 years ago)
Author:
astephen
Message:
 
File:
1 edited

  • nappy/trunk/nappy/2008 refactoring notes.txt

    r3359 r3373  
    1 = Changes needed to make Nappy fit for purpose = 
    2  
    3  * Need error checking throughout - making it more robust but also checking on read will make Nappy a useful NASA Ames checker replacement. 
    4  
    5  * cdms interface should be in a separate part of the stack, not in core classes AND SUB-CLASSES 
    6  
    7  * Need to make an egg of it, but in the meantime we also need it installed as: 
    8  
    9 ------------------- 
    10 We need to decide what is required in terms of: 
    11  
    12     * actual formats 
    13     * mappings to/from NetCDF 
    14     * mapping to/from other formats (if required).  
    15  
    16     AS and CK have been analysing the required changes to make Nappy suitable for automated conversion of NetCDF-CF in DCIP to NASA Ames / CSV variant. 
    17  
    18         Here is an initial list of issues 
    19  
    20     * Put in the option to order variables explicitly, rather than relying on nasa_ames_var_number attributes. 
    21  
    22     * Support for 2110 – NX for the quickest changing IV needs to be converted to an auxiliary variable which does not fit nicely into the structure. In principle the second independent variable length changes at each first independent variable value but we may need it fixed. 
    23  
    24     * Refactoring to make it clearer and functions shorter but not too short. 
    25     * Add an option for the learning column. 
    26     * Proper use of FFI selection. 
    27     * In cdms2na.py, stopping filling an axis once the start and increment are known may not be enough. May need to populate all of the axis. 
    28     * Column headings for Excel users, this will clash with the current #End of normal comments# lines. 
    29     * Is it still right that the 4010 class is the same as 2010? 
    30     * Does comma separation option work correctly – any consequences for RDATE and DATE which retains some space separation? 
    31     * float rounding in nappy to get difference between independent variable values - and degree of accuracy (8.3f etc) 
    32     * cdms2na() needs argument "variable_order" that is a list of the order you want them to appear in 2110. 
    33     * Need sensible handling of rotated grid data where found (might be wrapper outside of nappy 
    34     * FFI 2110 is best for 2D columns for Excel etc. - Nappy must support this  
    35  
    36 -------------------- 
    37  
    38  
    39 Dear Ag Stephans 
    40  
    41 I found your nappy library for reading Ames Files at http://home.badc.rl.ac.uk/astephens/software/nappy/ 
    42 For one of my projects I want to use that lib in a plugin for a wiki (http://moinmo.in) to give some feedback about submitted/attached data files. 
    43  
    44 Your web page states last modified at Wed, 06 Apr 2005 13:44:53 GMT 
    45  As I have seen on http://home.badc.rl.ac.uk/astephens/software/nappy/USAGE.txt 
    46 "[NOTE: We plan to implement a getVariableArray(var_name) to grab a specific variable from the above array.]" 
    47  
    48 I like to ask if there is probably a further version of that lib available?  
    49  
    50 cheers 
    51 Reimar Bauer 
    52  
    53 ---------------------- 
    54  
    55  
    56 === Adding annotation column === 
    57  
    58 via argument: annotated=True|False 
    59  
    60 Only on output. So only affects write methods.  
    61 NOTE: could do read as well by reading in file and then removing column 1 (as long as we know the delimiter is a comma). But don't do this now. 
    62  
    63 Ask Charles to come up with a list of definitions for each column in a config file. 
    64  
    65 Need to defined names in config file and a ways of mapping to each of the lines by tagging to self.A, self.X self.XNAMES etc 
    66  
    67 === Adding CSV === 
    68  
    69 Via delimiter/spacer argument. Would be nice to have a writeCSV() method. 
    70  
    71 Should 'csv' and 'delim' args be sent to write methods rather than __init__()? 
    72 --- 
    73  
    74 === 
    75  
    76 Do we need global DEBUG = True|False 
    77  
    78 Then all the prints could be: 
    79  
    80 if DEBUG == True: print "blah" 
    81  
    82 =============== 
    83 localRules.py --> REMOVE COMPLETELY 
    84 localRules/blah - remove this and put stuff in a top-level config file, or even remove altogether. 
    85  
    86 localRules/aircraft.py - Need to consider how this can be a sub-class of Cdms2NA so that we push all the odd code into separate modules. Need to encapstulate the differences into one or two methods that are small and can be overridden. 
    87  * decided to dump all aircraft stuff in  an unsupported dir without refactoring in - it will probably never be used! 
    88  
    89 ================= 
    90  
    91 import nappy  (via nappy_api.py module) 
    92 nappy.convertNAToNC(na_file, nc_file) 
    93 nappy.convertNCToNA(nc_file, na_file) 
    94  
    95 ================ 
    96  
    97 Have I broken textParser.py's main function - is it same in old and new - need a test for it! 
    98  
    99 ===== 
    100  
    101 _readData[12] are crying out for useful names 
    102  
    103 Should we leave in the interactive time units checker in na_to_cdms.py - ask Charles 
    104  
    105 Unit tests 
    106 ========== 
    107  
    108 1001 appears to be done. Others are there as stubs but all need writing. 
    109  
    110 We then need a set of nc_interface tests as well! Need NetCDF files to convert the other way as well. Need to be small. 
    111  
    112 ---- 
    113  
    114 === Current state of Unit tests === 
     1= Re-factoring !NAppy - NASA Ames Processing in Python - 20080210 = 
     2 
     3 
     4== This page == 
     5 
     6This page attempts to provide a semi-detailed view of the re-factoring process but is by no means complete. 
     7 
     8== Overview of re-factoring == 
     9 
     10This page presents a description of changes made to the !NAppy (hereafter ''nappy'') python package in the re-factoring process undertaken in February 2008 to improve NAppy as follows: 
     11 
     12 * Re-factor entire code-base to remove any over-sized modules/classes/functions. 
     13 * Re-structure code into more sensibly named and organised modules and packages. 
     14 * Move all the code used to interact with NetCDF files (via the external CDMS python package) into its own sub-package (rather than including as parent classes of main NASA Ames file classes). 
     15 * Remove all the ad-hoc code written to support FAAM aircraft data (which we do not believe is being used by anyone) and place in a ''contrib/'' package that is not supported. 
     16 * Re-name all variables within code to use common style convention. 
     17 * Push all non-python components into external directories. 
     18 * Create a ''nappy_api.py'' module (pronounced ''nappy appy'') that contains the simplest API that most users will want. 
     19 * Re-factor the two command-line scripts to allow more arguments to be passed and better encapsulation. 
     20 * Create unit tests for all the main read, write and convert functions that fully test the outputs are correct etc. 
     21 * Create enough error checking during file reading to allow NAppy to be used as a format-conformance checker. 
     22 * Allow writing of CSV files - i.e. replacing the existing (space or tab) spacer with commas. 
     23 * Allow writing of an ''annotated'' format which includes an additional column on the left-hand edge of the file that explains, in human-readable terms, what that line contains. 
     24 * Development of a nappy ''egg'' to allow easy installation (would be dependent on cdat_lite (another egg)). 
     25 
     26'''NOTE: We do NOT intend to make this version of nappy backwards-compatible with any previous version.''' 
     27 
     28 
     29=== More specific requirements from DCIP Project === 
     30 
     31Here are the key requirements from the DCIP project: 
     32 
     33 * Put in the option to order variables explicitly, rather than relying on nasa_ames_var_number attributes. 
     34 * Support for NetCDF to FFI 2110 – NX for the quickest changing IV needs to be converted to an auxiliary variable which does not fit nicely into the structure. In principle the second independent variable length changes at each first independent variable value but we may need it fixed. 
     35 * Support for conversion to/from NetCDF for 2130 and 2160? (not for DCIP). 
     36 * Addition of annotated first column. 
     37 * Proper use of FFI selection - is this needed really? 
     38 * In cdms2na.py, stopping filling an axis once the start and increment are known may not be enough. May need to populate all of the axis. 
     39 * Column headings for Excel users, this will clash with the current #End of normal comments# lines. 
     40 * Is it still right that the 4010 class is the same as 2010? 
     41 * Does comma separation option work correctly – any consequences for RDATE and DATE which retains some space separation? 
     42 * float rounding in nappy to get difference between independent variable values - and degree of accuracy (8.3f etc) 
     43 * Need sensible handling of rotated grid data where found (might be a wrapper outside of nappy). 
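The float-rounding item in the list above can be illustrated with a minimal sketch. The helper name `get_increment` and the default of three decimal places are assumptions made for illustration, not nappy's actual code. The idea is to round successive differences before comparing them, so that floating-point noise does not make a regular axis look irregular.

```python
def get_increment(values, ndigits=3):
    """Return the common step between values, or None if the steps differ.

    Hypothetical helper: differences are rounded to `ndigits` decimal
    places before comparison to absorb floating-point noise.
    """
    steps = {round(b - a, ndigits) for a, b in zip(values, values[1:])}
    if len(steps) == 1:
        return steps.pop()
    return None


axis = [0.1 * i for i in range(5)]      # contains 0.30000000000000004
print(get_increment(axis))              # regular axis: one common step
print(get_increment([0.0, 0.1, 0.3]))   # irregular axis: no common step
```

The number of decimal places would presumably need to match the output format in use (8.3f etc.), otherwise the increment written to the header could disagree with the values written in the data section.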
     44 
     45=== Other (perceived?) user requirements === 
     46 
     47Other users have also asked questions such as: 
     48 
     49 * Users also want: getVariableArray(var_name) to grab a specific variable. Would need a cdms-like variable class. User has been asked to give more details. 
     50 
     51=== Unit tests === 
     52 
     53To make sure this software actually works we need a unit test suite. This means writing a test class for all the major pieces of functionality exposed by nappy. 
     54 
     55 
     56'''Current state of Unit tests''' 
    11557 
    11658Successful: 
     59 
    11760 * test_na_file_1001.py 
    11861 * test_na_file_1010.py 
     
    12265 
    12366Failed: 
     67 
    12468 * test_na_file_1020.py 
    12569 * test_na_file_2160.py 
     
    12771 * test_na_file_4010.py 
    12872 
    129 Did they work before in old nappy. 
     73Did the code work before in old nappy? 
    13074 
    13175Old nappy is available at: 
     
    13579Get it with: 
    13680 
     81{{{ 
    13782svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/tags/nappy_pre_refactor_feb2008/nappy 
    138  
    139  
    140 All need to compare exactly to old output from old nappy! 
     83}}} 
     84 
     85Extending the unit tests: 
     86 
     87 * any tests doing file reads/writes should diff the input and output files and compare them: 
     88   * note that some differences might be only cosmetic and the content is essentially the same 
     89 
     90 * we need a unit test for all conversions to CDMS objects, NetCDF, and CSV. 
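One way to compare input and output files while tolerating cosmetic differences is sketched below. The function names are hypothetical, and treating whitespace runs and trailing blank lines as the only "cosmetic" differences is an assumption; the real tests may need a different definition.

```python
import difflib


def normalise(text):
    """Collapse runs of whitespace on each line and drop trailing blank lines."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return lines


def content_diff(original, rewritten):
    """Return unified-diff lines for differences that are more than cosmetic."""
    return list(difflib.unified_diff(normalise(original), normalise(rewritten),
                                     fromfile="input", tofile="output",
                                     lineterm=""))


original = "1  2   3\n4 5 6\n"
rewritten = "1 2 3\n4 5 6\n\n"
print(content_diff(original, rewritten))  # whitespace-only changes are ignored
```

A test could then assert that `content_diff()` returns an empty list, and print the returned diff lines on failure.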
     91 
     92 
     93=== Adding annotation column === 
     94 
     95Suggest that we add the annotation column in the following way: 
     96 
     97 * new argument added to NA file classes: 
     98   * annotated=True|False 
     99 * only provide annotation column on output to avoid any confusion with trying to read in first column where delimiter is not comma. Comma is easy to do but space or tab would be almost impossible to implement sensibly. 
     100 * if output only then only need to re-factor write methods (header and data). 
     101 * NOTE: could do read as well by reading in file and then removing column 1 (as long as we know the delimiter is a comma). But don't do this now. 
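A rough sketch of the annotation idea follows. Both function names are hypothetical and are not part of nappy; the sketch also assumes the annotation text itself contains no comma, which is what makes stripping it on read straightforward.

```python
def format_line(values, annotation=None, delimiter=","):
    """Join values with the delimiter, optionally prefixing an annotation.

    Hypothetical helper: when `annotation` is given it becomes the
    first column of the output line.
    """
    items = [str(v) for v in values]
    if annotation is not None:
        items.insert(0, annotation)
    return delimiter.join(items)


def strip_annotation(line, delimiter=","):
    """Remove column 1 from an annotated line (only safe for a known delimiter)."""
    return line.split(delimiter, 1)[1]


annotated = format_line([1, 2, 3], annotation="Data")
print(annotated)
print(strip_annotation(annotated))
```

With a space or tab delimiter the annotation could not be separated from the data this way, which is why the notes restrict annotation to output.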
     102 
     103We need a definition of what each row means in a simple configuration file that maps each item name to its annotation text: 
     104 
     105{{{ 
     106[common_header] 
     107DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n) 
     108}}} 
     109 
     110Need to define names in the config file and a way of mapping them to each of the lines by tagging to self.A, self.X, self.XNAMES etc. 
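The mapping from config names to attributes might look something like the sketch below, which uses Python's standard `configparser`. The `FakeNAFile` class and `load_annotations` function are stand-ins invented for illustration; only the `[common_header]` section and the `DX` entry come from the notes above.

```python
import configparser

CONFIG_TEXT = """
[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
"""


class FakeNAFile:
    """Stand-in for an NA file object; only the DX attribute is modelled."""
    DX = [0.5]


def load_annotations(text, section="common_header"):
    """Return a {name: description} dict for one section of the config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep "DX" upper-case so it matches attribute names
    parser.read_string(text)
    return dict(parser[section])


annotations = load_annotations(CONFIG_TEXT)
na_file = FakeNAFile()
for name, description in annotations.items():
    # Tag each description to the attribute of the same name (self.DX etc.)
    print(description, getattr(na_file, name))
```

Keeping the option names case-sensitive means the config keys can be used directly as attribute names, so no separate lookup table is needed.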
     111 
     112=== Adding output to CSV === 
     113 
     114We need to allow file-writing to CSV format, as follows: 
     115 
     116 * rename "spacer" argument to "delimiter" throughout code. 
     117 * needs to be implemented in both header and body for consistency. 
     118 * BUT, it would be nice to have a writeCSV() method as well: 
     119   * The API should probably expose simple methods like naToCSV() and ncToCSV() etc. 
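A possible shape for CSV output is sketched here using the standard `csv` module. The function name `to_csv` and its arguments are hypothetical and not nappy's API. One reason to use `csv.writer` instead of a plain join is that free-text header items containing commas are quoted automatically.

```python
import csv
import io


def to_csv(header_lines, data_rows):
    """Write header lines and data rows as CSV text.

    Hypothetical sketch: each header line is a list of items, each data
    row a list of numbers. csv.writer quotes any item containing a comma.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(header_lines)
    writer.writerows(data_rows)
    return buffer.getvalue()


text = to_csv([["Temperature, air (K)"], [2, 1001]], [[0, 273.15], [1, 274.0]])
print(text)
```

Using the same writer for header and body keeps the delimiter consistent across the whole file, as required above.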
     120 
     121=== Minor changes required === 
     122 
     123The config file should include a global DEBUG = True|False. Then any print statements lying around and the message stuff in cdms_to_na.py can all be controlled by that. 
     124 
     125E.g. 
     126 
     127{{{ 
     128if DEBUG: print("blah") 
     129}}} 
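As an alternative to guarding each print statement, the same switch could drive Python's standard `logging` module. This is only a sketch: the logger name is made up, and `DEBUG` is hard-coded here where it would be read from the config file.

```python
import logging

DEBUG = True  # assumed to come from the config file in practice

logging.basicConfig()  # attach a default handler to the root logger
log = logging.getLogger("nappy_sketch")  # hypothetical logger name
log.setLevel(logging.DEBUG if DEBUG else logging.WARNING)

log.debug("blah")  # emitted only when DEBUG is True
```

This keeps a single on/off point without repeating the `if DEBUG` test at every call site.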
     130 
     131=== Re-factoring: detailed notes === 
     132 
     133 * The following modules were removed and the information added to the main "nappy.ini" configuration file, accessed by nappy.utils.getConfigDict(): 
     134   * localRules.py 
     135   * version.py 
     136 
     137 * localRules package removed and information put in config file. 
     138 * localRules/aircraft.py - moved out to contrib/aircraft/ - no longer supported! 
     139 
     140 
     141=== Making a clean API === 
     142 
     143We have added a top level API module called nappy_api.py which is imported automatically when you import nappy. This provides the "public" interface to the package. It might typically be used as follows: 
     144 
     145{{{ 
     146import nappy  # via the nappy_api.py module 
     147nappy.convertNAToNC(na_file, nc_file) 
     148nappy.convertNCToNA(nc_file, na_file) 
     149}}} 
     150 
     151 
     152=== Broken? === 
     153 
     154Have I broken textParser.py's main function - is it same in old and new - need a test for it! 
     155 
     156=== Questions === 
     157 
     158 * Should we leave in the interactive time units checker in na_to_cdms.py? Seems a bit silly to have interactive code in middle of conversion script (potentially called by other process). 
     159 
     160=== Where is the new code? === 
     161 
     162Need to move to DCIP repository and update NDG page about that. 
    141163 
    142164---- 
     
    146168svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/trunk 
    147169 
    148 ------- 
    149  
    150 = Changes needed to make Nappy fit for purpose = 
    151  
    152  
    153  * Need error checking throughout - making it more robust but also checking on read will make Nappy a useful NASA Ames checker replacement. 
    154  
    155  * cdms interface should be in a separate part of the stack, not in core classes AND SUB-CLASSES 
    156  
    157  * Need to make an egg of it, but in the meantime we also need it installed as: 
     170=== New structure === 
    158171 
    159172   * nappy-0.2.3 
     
    165178     * nappy/na_file 
    166179     * nappy/contrib/aircraft 
    167      
    168  
    169  * GET REST FROM WIKI! 
    170  
    171  
    172 =============== 
    173 localRules.py --> REMOVE COMPLETELY 
    174 localRules/blah - remove this and put stuff in a top-level config file, or even remove altogether. 
    175  
    176 localRules/aircraft.py - Need to consider how this can be a sub-class of Cdms2NA so that we push all the odd code into separate modules. Need to encapstulate the differences into one or two methods that are small and can be overridden. 
     180    
    177181 
    178182====== 
     
    203207CDMS stuff is most of the mess 
    204208============================== 
     209 
     210 
     211Renamed some of: 
    205212 
    2062131. naToCdms.py holds: 
     
    253260================= 
    254261 
     262'''naToCdms.py''' 
     263 
     264The naToCdms.py module has been re-factored by  
    255265naToCdms.py 
    256266=========== 
     
    274284================ 
    275285 
    276 Have I broken textParser.py's main function - is it same in old and new. 
    277  
    278 ===== 
    279  
    280 NAFile2010: 
    281 _readData[12] are crying out for useful names 
    282  
    283286GREP 
    284287==== 
     
    292295Global find and replace: 
    293296 
    294 floatFormat 
    295 naDict 
    296 ===================== 
    297  
    298 In naToCdms.py.NAToCdms.toCdmsAxis() there is a line naming the id (if too long) as: 
    299  
    300 naAuxVariable.... 
    301  
    302  - need to find where else this string is used and replace all with "naIndVariable" with map-back! 
    303  
    304 =================== 
    305  
    306297Should we leave in the interactive time units checker in na_to_cdms.py - ask Charles 
    307298 
    308299===== 
    309300cdms_map is not all done in the config file dict. 
    310  
    311 Unit tests 
    312 ========== 
    313  
    314 1001 appears to be done. Others are there as stubs but all need writing. 
    315  
    316 We then need a set of nc_interface tests as well! Need NetCDF files to convert the other way as well. Need to be small. 
    317  
    318 =============== 
    319  
    320 = Changes needed to make Nappy fit for purpose = 
    321  
    322  
    323  * Need error checking throughout - making it more robust but also checking on read will make Nappy a useful NASA Ames checker replacement. 
    324  
    325  * cdms interface should be in a separate part of the stack, not in core classes AND SUB-CLASSES 
    326  
    327  * Need to make an egg of it, but in the meantime we also need it installed as: 
    328  
    329    * nappy-0.2.3 
    330      * nappy 
    331      * bin 
    332      * nappy/nc_interface 
    333      * nappy/cdms_utils/ 
    334      * nappy/utils 
    335      * nappy/na_file 
    336      * nappy/contrib/aircraft 
    337      
    338  
    339  * GET REST FROM WIKI! 
    340  
    341  
    342 =============== 
    343 localRules.py --> REMOVE COMPLETELY 
    344 localRules/blah - remove this and put stuff in a top-level config file, or even remove altogether. 
    345  
    346 localRules/aircraft.py - Need to consider how this can be a sub-class of Cdms2NA so that we push all the odd code into separate modules. Need to encapstulate the differences into one or two methods that are small and can be overridden. 
    347  
    348 ====== 
    349 bin/scanFAAM.py - put in contrib 
    350  
    351 ====== 
    352  
    353 version.py - put in config file. 
    354  
    355 ====== 
    356  
    357 general.py --> call it utils/xxxxx.py 
    358  
    359 textParser --> utils/text_parser.py 
    360  
    361 naError.py --> na_error/na_error.py 
    362  
    363 naCore.py --> na_file/na_core.py 
    364  
    365 listManipulator -_> utils/list_manipulator.py 
    366  
    367 cdmsMap.py --> put in config file given simplicity 
    368  
    369 Need utils/parse_config.py 
    370  
    371 ====== 
    372  
    373 CDMS stuff is most of the mess 
    374 ============================== 
    375  
    376 1. naToCdms.py holds: 
    377  
    378 AbstractNAToCdms CLASS 
    379 toCdmsFile 
    380 createCdmsVariables 
    381 toCdmsVariable 
    382 createCdmsAuxVariables 
    383 auxToCdmsVariable 
    384 createCdmsAxes 
    385 toCdmsAxes 
    386  
    387 2. na2cdms.py: 
    388  
    389 Command-line script 
    390  
    391 3. bin/na2nc: 
    392  
    393 Same as na2cdms.py ??? 
    394  
    395 4. cdms2na.py is the mother of all modules: 
    396  
    397 compareAxes --> areAxesIdentical(a,b) cdms_utils 
    398 compareVariables --> areDomainsIdentical(v1, v2) cdms_utils 
    399 isAuxAndVar --> isAuxVarAndVar  
    400 arrayToList utils 
    401 listOfListsCreator utils 
    402 getBestName cdms_utils - need some advice and compare with Dom 
    403 getMissingValue cdms_utils 
    404 fixHeaderLengthNowDefunct # Can destroy 
    405 flatten2DTimeData aircraft 
    406 modifyNADictCopy - needs a better name as it is specific 
    407 cdms2na - 200 lines of code to do main conversion, needs to be split out into other stuff. 
    408  * getVariableCollections(f and varlist) --> (ordered_vars, other_vars) 
    409  * buildNADicts() 
    410  * writeToOutputFiles() 
    411  
    412 class CdmsToNABuilder --> NAContentCollector: (naDict, varIDs, varBin) 
    413 __init__ --> sets everything up and runs it move some to --> analyse() 
    414 analyseVariables 
    415 defineNAVars 
    416 defineNAAuxVars 
    417 getAxisDefinition 
    418 defineNAGlobals 
    419 defineNAComments 
    420 defineGeneralHeader 
    421 _useLocalRule --> Remove this and put it all in aircraft contrib bit 
    422  
    423 ================= 
    424  
    425 naToCdms.py 
    426 =========== 
    427  
    428 This is a sub-class of all NAFile objects. Bad idea. What we need is to: 
    429  
    430 import convertor 
    431 convertor.writeToNC(blah) 
    432 convertor.convertToCdms(blah): (vars, global_atts) 
    433  
    434 class NAToCdms 
    435  
    436 toCdmsFile 
    437 createCdmsVariables - does all 
    438 toCdmsVariable - does each in turn 
    439 CreateCdmsAuxVariables - does all 
    440 auxToCdmsVariables - does each in turn 
    441 createCdmsAxes - does all 
    442 toCdmsAxes - does each in turn 
    443  
    444 ================ 
    445  
    446 Have I broken textParser.py's main function - is it same in old and new. 
    447  
    448 ===== 
    449  
    450 NAFile2010: 
    451 _readData[12] are crying out for useful names 
    452  
    453 GREP 
    454 ==== 
    455  
    456 Need to do a lot of grepping for inconsistencies. 
    457  
    458  * _normalizedX should be True|False not "yes","no" 
    459  
    460 Global find and replace: 
    461  
    462 floatFormat 
    463 naDict 