source: nappy/trunk/2008 refactoring notes.txt @ 3396

Subversion URL: http://proj.badc.rl.ac.uk/svn/ndg/nappy/trunk/2008 refactoring notes.txt@3546
Revision 3396, 9.8 KB checked in by astephen, 12 years ago
= Re-factoring !NAppy - NASA Ames Processing in Python - 20080210 =


== This page ==

This page attempts to provide a semi-detailed view of the re-factoring process but is by no means complete.

== Overview of re-factoring ==

This page describes the changes made to the !NAppy (hereafter ''nappy'') Python package during the re-factoring undertaken in February 2008. The aims were to:

 * Re-factor the entire code-base to remove any over-sized modules/classes/functions.
 * Re-structure the code into more sensibly named and organised modules and packages.
 * Move all the code used to interact with NetCDF files (via the external CDMS Python package) into its own sub-package (rather than including it as parent classes of the main NASA Ames file classes).
 * Remove all the ad-hoc code written to support FAAM aircraft data (which we do not believe is being used by anyone) and place it in a ''contrib/'' package that is not supported.
 * Re-name all variables within the code to use a common style convention.
 * Push all non-Python components into external directories.
 * Create a ''nappy_api.py'' module (pronounced ''nappy appy'') that contains the simplest API that most users will want.
 * Re-factor the two command-line scripts to allow more arguments to be passed and better encapsulation.
 * Create unit tests for all the main read, write and convert functions that fully test that the outputs are correct.
 * Add enough error checking during file reading to allow nappy to be used as a format-conformance checker.
 * Allow writing of CSV files, i.e. replacing the existing (space or tab) delimiter with commas.
 * Allow writing of an ''annotated'' format which includes an additional column on the left-hand edge of the file that explains, in human-readable terms, what that line contains.
 * Develop a nappy ''egg'' to allow easy installation (this would depend on cdat_lite, another egg).

'''NOTE: We do NOT intend to make this version of nappy backwards-compatible with any previous version.'''


=== More specific requirements from DCIP Project ===

Here are the key requirements from the DCIP project:

 * Put in the option to order variables explicitly, rather than relying on nasa_ames_var_number attributes.
 * Support for NetCDF to FFI 2110: NX for the quickest changing IV needs to be converted to an auxiliary variable, which does not fit nicely into the structure. In principle the length of the second independent variable changes at each value of the first independent variable, but we may need it fixed.
 * Support for conversion to/from NetCDF for 2130 and 2160? (not for DCIP).
 * Addition of annotated first column.
 * Proper use of FFI selection - is this really needed?
 * In cdms2na.py, stopping filling an axis once the start and increment are known may not be enough. We may need to populate the whole axis.
 * Column headings for Excel users; this will clash with the current #End of normal comments# lines.
 * Is it still right that the 4010 class is the same as 2010?
 * Does the comma separation option work correctly? Are there any consequences for RDATE and DATE, which retain some space separation?
 * Float rounding in nappy to get the difference between independent variable values, and the degree of accuracy (8.3f etc.).
 * Need sensible handling of rotated grid data where found (might be a wrapper outside of nappy).
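
The float-rounding concern in the list above can be illustrated with a small sketch. It is not nappy code: the function name `constant_increment` and the default of 3 decimal places (chosen to echo the 8.3f format mentioned above) are assumptions made for illustration. It rounds the differences between successive independent variable values before comparing them, so that floating-point noise does not make a regular axis look irregular.

```python
def constant_increment(values, decimals=3):
    """Return the common step between successive values, or None if it varies.

    Differences are rounded to `decimals` places before comparison so that
    floating-point noise does not hide a regularly spaced axis.
    NOTE: hypothetical helper, not part of nappy.
    """
    if len(values) < 2:
        return None
    steps = {round(b - a, decimals) for a, b in zip(values, values[1:])}
    return steps.pop() if len(steps) == 1 else None


# Steps here differ only by floating-point noise (e.g. 0.1 * 3 != 0.3 exactly).
regular = [0.1 * i for i in range(5)]
# Steps here genuinely differ.
irregular = [0.0, 0.1, 0.25, 0.3]

print(constant_increment(regular))
print(constant_increment(irregular))
```

If no constant step is found, the whole axis would have to be written out rather than just a start value and an increment.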

=== Other (perceived?) user requirements ===

Other users have also asked questions such as:

 * Users also want getVariableArray(var_name) to grab a specific variable. This would need a cdms-like variable class. The user has been asked to give more details.

=== Unit tests ===

To make sure this software actually works we need a unit test suite. This means writing a test class for each major piece of functionality exposed by nappy.


'''Current state of unit tests'''

Successful:

 * test_na_file_1001.py
 * test_na_file_1010.py
 * test_na_file_2010.py
 * test_na_file_2110.py
 * test_na_file_2310.py

Failed:

 * test_na_file_1020.py
 * test_na_file_2160.py
 * test_na_file_3010.py
 * test_na_file_4010.py

Did the code work before in old nappy?

Old nappy is available at:

http://proj.badc.rl.ac.uk/ndg/browser/nappy/tags/nappy_pre_refactor_feb2008/nappy

Get it with:

{{{
svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/tags/nappy_pre_refactor_feb2008/nappy
}}}

Extending the unit tests:

 * any test doing file reads/writes should diff the input and output files:
   * note that some differences might be only cosmetic, with the content essentially the same

 * we need a unit test for all conversions to CDMS objects, NetCDF, and CSV.
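
One possible shape for such a read/write comparison is sketched below. It treats differences in spacing within a line as cosmetic and everything else as a real difference. That definition of "cosmetic" is an assumption, and the helper names `normalised_lines` and `same_content` are hypothetical rather than part of nappy.

```python
import os
import tempfile
import unittest


def normalised_lines(path):
    """Read a text file as a list of token lists, ignoring spacing within lines."""
    with open(path) as handle:
        return [line.split() for line in handle]


def same_content(path_a, path_b):
    """True if the two files differ at most in the whitespace between items."""
    return normalised_lines(path_a) == normalised_lines(path_b)


class RoundTripTest(unittest.TestCase):
    """Stand-in for a real test that reads an NA file and writes it back out."""

    def _write(self, text):
        # Create a temporary file and make sure it is removed after the test.
        handle, path = tempfile.mkstemp(suffix=".na")
        with os.fdopen(handle, "w") as out:
            out.write(text)
        self.addCleanup(os.remove, path)
        return path

    def test_cosmetic_difference_is_ignored(self):
        original = self._write("1 2 3\n4 5 6\n")
        rewritten = self._write("1    2    3\n4\t5\t6\n")
        self.assertTrue(same_content(original, rewritten))

    def test_real_difference_is_detected(self):
        original = self._write("1 2 3\n")
        rewritten = self._write("1 2 4\n")
        self.assertFalse(same_content(original, rewritten))
```

In a real test the second file would be produced by nappy's own write methods rather than written by hand. The tests can be run with `python -m unittest` against the module that contains them.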


=== Adding annotation column ===

Suggest that we add the annotation column in the following way:

 * new argument added to NA file classes:
   * annotated=True|False
 * only provide the annotation column on output, to avoid any confusion when trying to read in a first column where the delimiter is not a comma. Comma is easy to do, but space or tab would be almost impossible to implement sensibly.
 * if output only, then we only need to re-factor the write methods (header and data).
 * NOTE: we could do read as well by reading in the file and then removing column 1 (as long as we know the delimiter is a comma). But don't do this now.

We need a definition of what each row means, in a simple configuration file that maps the first item on each line to a description:

{{{
[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
}}}

We need to define the names in the config file and a way of mapping them to each of the lines by tagging to self.A, self.X, self.XNAMES etc.
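
A minimal sketch of how such a mapping might drive the annotation column is given below. It uses the `[common_header]` section and `DX` entry quoted above; the function names `load_annotations` and `annotate` are hypothetical, and the config text is embedded as a string only to keep the example self-contained. It uses the standard `configparser` and `csv` modules of current Python rather than whatever nappy itself uses.

```python
import configparser
import csv
import io

CONFIG_TEXT = """
[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
"""


def load_annotations(config_text, section="common_header"):
    """Return a dict mapping header item names (e.g. "DX") to descriptions."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys such as "DX" in upper case
    parser.read_string(config_text)
    return dict(parser.items(section))


def annotate(tag, values, annotations, delimiter=","):
    """Build one output line with the description for `tag` as first column.

    The csv module quotes the description when it contains the delimiter,
    as the DX description does.
    """
    label = annotations.get(tag, tag)  # fall back to the bare tag if unknown
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="")
    writer.writerow([label] + list(values))
    return buffer.getvalue()


annotations = load_annotations(CONFIG_TEXT)
print(annotate("DX", [0.5, 1.0], annotations))
```

Note that the DX description itself contains commas, so a comma-delimited annotated file needs quoting (as here) or descriptions that avoid the delimiter.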

=== Adding output to CSV ===

We need to allow file-writing to CSV format, as follows:

 * use a single argument name, "delimiter", throughout the code.
 * it needs to be implemented in both header and body for consistency.
 * BUT, it would be nice to have a writeCSV() method as well:
   * The API should probably expose simple methods like naToCSV() and ncToCSV() etc.
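
The idea of one delimiter applied to header and body alike, with a CSV method as a thin wrapper, could look roughly like the sketch below. The function names are hypothetical and the four-space default delimiter is an assumption; this is not the nappy implementation.

```python
def format_line(values, delimiter="    "):
    """Join the items of one line using the given delimiter."""
    return delimiter.join(str(value) for value in values)


def write_lines(rows, delimiter="    "):
    """Format header and data rows alike, so both use the same delimiter."""
    return "\n".join(format_line(row, delimiter) for row in rows) + "\n"


def write_csv(rows):
    """Thin wrapper: the CSV case is the generic writer with a comma."""
    return write_lines(rows, delimiter=",")


rows = [
    [3, 1001],          # stand-in for a header line
    [1.5, 2.5, 3.5],    # stand-in for a data line
]
print(write_csv(rows), end="")
```

Keeping the delimiter as the only difference between the two paths means a writeCSV()-style method adds no separate formatting logic to maintain.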

=== Minor changes required ===

The config file should include a global DEBUG = True|False. Any print statements lying around, and the message output in cdms_to_na.py, can then all be controlled by that flag.

E.g.

{{{
if DEBUG:
    print("blah")
}}}
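
As a slightly fuller sketch, the flag could be read from the configuration once and wrapped in a small helper. The section name `[main]` and the helper name `debug` are assumptions made for illustration, and the config text is inlined to keep the example self-contained.

```python
import configparser

CONFIG_TEXT = """
[main]
DEBUG = True
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# getboolean() accepts values such as True/False, yes/no, on/off and 1/0.
DEBUG = parser.getboolean("main", "DEBUG")


def debug(message):
    """Print a diagnostic message only when the global DEBUG flag is set."""
    if DEBUG:
        print(message)


debug("blah")
```

Routing stray print statements through one helper like this means they can all be silenced by changing a single config value.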

=== Re-factoring: detailed notes ===

 * The following modules were removed and the information added to the main "nappy.ini" configuration file, accessed by nappy.utils.getConfigDict():
   * localRules.py
   * version.py
   * cdmsMap.py

 * localRules package removed and information put in config file.
 * localRules/aircraft.py - moved out to contrib/aircraft/ - no longer supported!
 * bin/scanFAAM.py - put in nappy/contrib/aircraft/
 * general.py - moved to nappy/utils/common_utils.py
 * textParser.py - moved to nappy/utils/text_parser.py
 * naError.py - moved to nappy/na_error/na_error.py
 * listManipulator.py - moved to nappy/utils/list_manipulator.py

'''Re-factoring cdms2na.py'''

 * compareAxes function - moved to nappy/cdms_utils/axis_utils.py#areAxesIdentical
 * compareVariables function - moved to nappy/cdms_utils/axis_utils.py#areAxesIdentical
 * arrayToList function - moved to nappy/utils/list_manipulator.py
 * listOfListsCreator function - moved to nappy/utils/list_manipulator.py
 * getMissingValue function - moved to nappy/cdms_utils/var_utils.py
 * getBestName function - moved to nappy/cdms_utils/var_utils.py
 * fixHeaderLength - obsolete, removed!
 * flatten2DTimeData - moved out to nappy/contrib/aircraft/
 * modifyNADictCopy - needs a better name as it is specific to NA
 * cdms2na function - turned into a class:
   * getVariableCollections(f and varlist) --> (ordered_vars, other_vars)
   * buildNADicts()
   * writeToOutputFiles()
 * CdmsToNABuilder class - renamed to NAContentCollector in na_content_collector.py module.
 * Removed all stuff about ''rules'' as this was all unnecessary.

The following have been changed from strings with "yes"|"no" values to booleans:
 * _normalizedX
 * time_warning

=== Making a clean API ===

We have added a top-level API module called nappy_api.py, which is imported automatically when you import nappy. This provides the "public" interface to the package. It might typically be used as follows:

{{{
import nappy  # the public functions come from the nappy_api.py module
nappy.convertNAToNC(na_file, nc_file)
nappy.convertNCToNA(nc_file, na_file)
}}}


=== Broken? ===

 * Have I broken textParser.py's main function? Is it the same in old and new? Need a test for it!
 * getBestName uses Ag's made-up rules for getting the best long name. Need to ask colleagues for the best suggested method.

=== Questions ===

 * Should we leave in the interactive time units checker in na_to_cdms.py? It seems a bit silly to have interactive code in the middle of a conversion script (potentially called by another process).

=== Where is the new code? ===

Might want to move to DCIP repository and update NDG page about that.

Get new version from:

{{{
svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/trunk
}}}

=== New structure ===

   * nappy-<version>
     * doc - containing any documentation including copy of Gaines and Hipskind specification
     * data_files - containing example NA and NC files to test with
     * bin - containing command-line python scripts
     * test_outputs - ready to receive outputs from the unit tests
     * nappy - main python package
       * na_file/ - main NA file class stack
       * nc_interface/ - conversion to/from NetCDF code
       * cdms_utils/ - utility functions for CDMS (NetCDF-handling) code
       * utils/ - general utility functions used in various parts of nappy
       * unit_tests/ - a set of unit tests to test all major functionality
       * na_error/ - exception stack for nappy
       * contrib/ - contributed stuff (such as aircraft modules) - UNSUPPORTED!
       * nappy_api.py - a clean API module for doing all the top-level stuff - this can be usefully and succinctly documented!