Subversion URL: http://proj.badc.rl.ac.uk/svn/ndg/nappy/trunk/nappy/2008 refactoring notes.txt@3373
Revision 3373, 10.2 KB, checked in by astephen, 12 years ago

= Re-factoring !NAppy - NASA Ames Processing in Python - 20080210 =


== This page ==

This page provides a semi-detailed view of the re-factoring process, but it is by no means complete.

== Overview of re-factoring ==

This page describes the changes made to the !NAppy (hereafter ''nappy'') Python package during the re-factoring undertaken in February 2008. The aims were to:

 * Re-factor the entire code-base to remove any over-sized modules, classes and functions.
 * Re-structure the code into more sensibly named and organised modules and packages.
 * Move all the code used to interact with NetCDF files (via the external CDMS Python package) into its own sub-package, rather than including it as parent classes of the main NASA Ames file classes.
 * Remove all the ad-hoc code written to support FAAM aircraft data (which we do not believe anyone is using) and place it in an unsupported ''contrib/'' package.
 * Re-name all variables within the code to follow a common style convention.
 * Push all non-Python components into external directories.
 * Create a ''nappy_api.py'' module (pronounced ''nappy appy'') that contains the simplest API that most users will want.
 * Re-factor the two command-line scripts to accept more arguments and improve encapsulation.
 * Create unit tests for all the main read, write and convert functions that fully test that the outputs are correct.
 * Add enough error checking during file reading to allow nappy to be used as a format-conformance checker.
 * Allow writing of CSV files, i.e. replacing the existing (space or tab) spacer with commas.
 * Allow writing of an ''annotated'' format, which includes an additional column on the left-hand edge of the file that explains, in human-readable terms, what each line contains.
 * Develop a nappy ''egg'' to allow easy installation (it would depend on cdat_lite, another egg).

'''NOTE: We do NOT intend to make this version of nappy backwards-compatible with any previous version.'''


=== More specific requirements from the DCIP project ===

Here are the key requirements from the DCIP project:

 * Add an option to order variables explicitly, rather than relying on nasa_ames_var_number attributes.
 * Support for conversion from NetCDF to FFI 2110: NX for the fastest-changing independent variable (IV) needs to be converted to an auxiliary variable, which does not fit nicely into the structure. In principle, the length of the second independent variable changes at each value of the first independent variable, but we may need it to be fixed.
 * Support for conversion to/from NetCDF for 2130 and 2160? (not for DCIP).
 * Addition of an annotated first column.
 * Proper use of FFI selection: is this really needed?
 * In cdms2na.py, stopping filling an axis once the start and increment are known may not be enough; we may need to populate the whole axis.
 * Column headings for Excel users: these will clash with the current #End of normal comments# lines.
 * Is it still right that the 4010 class is the same as 2010?
 * Does the comma-separation option work correctly? Are there any consequences for RDATE and DATE, which retain some space separation?
 * Float rounding in nappy when getting the difference between independent variable values, and the degree of accuracy (8.3f etc.).
 * Need sensible handling of rotated grid data where found (this might be a wrapper outside of nappy).
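
As a rough sketch of two of the points above, the Python below shows one possible approach to each. The first is explicit variable ordering. The second is float rounding when working out the interval between independent variable values. The helper names order_variables and get_interval are hypothetical and are not existing nappy functions. Variable attributes are modelled as plain dicts rather than cdms attributes. Returning 0.0 for uneven spacing assumes the NASA Ames convention that DX = 0 marks non-uniformly spaced values, which is also the case where the whole axis would need populating.

```python
# Hypothetical helpers for illustration; not part of the nappy API.
from typing import Dict, List, Optional, Sequence


def order_variables(var_attrs: Dict[str, dict],
                    explicit_order: Optional[Sequence[str]] = None) -> List[str]:
    """Return variable names in output order.

    Names in explicit_order come first, as given. Any remaining variables
    follow, sorted by their nasa_ames_var_number attribute; variables
    without that attribute go last, in name order.
    """
    explicit = list(explicit_order or [])
    unknown = [name for name in explicit if name not in var_attrs]
    if unknown:
        raise ValueError("Unknown variables in explicit order: %s" % unknown)

    def sort_key(name: str):
        number = var_attrs[name].get("nasa_ames_var_number")
        # (False, n, ...) sorts before (True, ...): numbered variables first.
        return (number is None, number if number is not None else 0, name)

    remaining = [name for name in var_attrs if name not in explicit]
    return explicit + sorted(remaining, key=sort_key)


def get_interval(values: Sequence[float], ndigits: int = 3) -> float:
    """Return the constant increment between values, or 0.0 if not uniform.

    Differences are rounded to ndigits decimal places (cf. "8.3f") before
    being compared, so float noise such as 0.30000000000000004 - 0.2 does
    not make a regular axis look irregular.
    """
    if len(values) < 2:
        return 0.0
    diffs = {round(b - a, ndigits) for a, b in zip(values, values[1:])}
    return float(diffs.pop()) if len(diffs) == 1 else 0.0
```

Here, explicit ordering wins over the attribute-based ordering rather than replacing it. This means partially specified orders still produce a complete variable list.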

=== Other (perceived?) user requirements ===

Other users have also asked for features such as:

 * A getVariableArray(var_name) method to grab a specific variable. This would need a cdms-like variable class. The user has been asked to give more details.
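
Here is a minimal sketch of what getVariableArray() might look like. It assumes the NA file object keeps variable names in VNAME and the matching data arrays in V, in the same order. The mixin name NAVariableLookup is hypothetical. The method returns the stored sequence rather than the cdms-like variable class mentioned above.

```python
# Hypothetical mixin for illustration; assumes VNAME/V attributes exist.
from typing import List, Sequence


class NAVariableLookup:
    """Look up a primary variable's data by its name."""

    VNAME: List[str]              # variable names, in file order
    V: List[Sequence[float]]      # data for each variable, same order

    def getVariableArray(self, var_name: str) -> Sequence[float]:
        try:
            index = self.VNAME.index(var_name)
        except ValueError:
            raise KeyError("No variable named %r; available: %s"
                           % (var_name, ", ".join(self.VNAME))) from None
        return self.V[index]
```

NASA Ames variable names often include units, e.g. "Temperature (K)". An exact-match lookup like this may therefore be too strict in practice.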

=== Unit tests ===

To make sure this software actually works, we need a unit test suite. This means writing a test class for all the major pieces of functionality exposed by nappy.


'''Current state of unit tests'''

Successful:

 * test_na_file_1001.py
 * test_na_file_1010.py
 * test_na_file_2010.py
 * test_na_file_2110.py
 * test_na_file_2310.py

Failed:

 * test_na_file_1020.py
 * test_na_file_2160.py
 * test_na_file_3010.py
 * test_na_file_4010.py

Did this code work in the old (pre-refactor) nappy?

The old nappy is available at:

http://proj.badc.rl.ac.uk/ndg/browser/nappy/tags/nappy_pre_refactor_feb2008/nappy

Get it with:

{{{
svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/tags/nappy_pre_refactor_feb2008/nappy
}}}

Extending the unit tests:

 * Any test that reads/writes files should diff the input and output files and compare them:
   * note that some differences might be only cosmetic, with the content essentially the same.

 * We need a unit test for all conversions to CDMS objects, NetCDF and CSV.
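
One way to make "diff the input and output files" tolerant of purely cosmetic changes is to normalise whitespace before comparing. The sketch below does this, using difflib on normalised lines. The helper and test names are hypothetical. A real test would read the original file, write it back out via nappy and pass both texts in. File handling is left out here.

```python
# Hypothetical helpers for round-trip tests; not existing nappy test code.
import difflib
import re
import unittest
from typing import List


def normalise_lines(text: str) -> List[str]:
    """Collapse runs of spaces/tabs, strip each line and drop blank lines.

    This only hides layout differences: "1.0" and "1.00" still differ.
    """
    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if line:
            lines.append(line)
    return lines


def content_diff(expected: str, actual: str) -> List[str]:
    """Unified diff of the normalised texts; an empty list means same content."""
    return list(difflib.unified_diff(normalise_lines(expected),
                                     normalise_lines(actual),
                                     "input", "output", lineterm=""))


class RoundTripComparisonTest(unittest.TestCase):
    def assertSameContent(self, expected: str, actual: str) -> None:
        diff = content_diff(expected, actual)
        self.assertEqual(diff, [], "\n".join(diff))

    def test_cosmetic_differences_ignored(self):
        # In a real test these strings would be the input file and the file
        # written back out by nappy.
        original = "14 1001\n1  1\n"
        rewritten = "14\t1001 \n\n1 1\n"
        self.assertSameContent(original, rewritten)
```

Printing the diff as the assertion message means a failing test shows exactly which lines changed. Run it with python -m unittest.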


=== Adding an annotation column ===

We suggest adding the annotation column as follows:

 * A new argument is added to the NA file classes:
   * annotated=True|False
 * Only provide the annotation column on output, to avoid the confusion of trying to read in a first column where the delimiter is not a comma. A comma is easy to handle, but space or tab would be almost impossible to implement sensibly.
 * If it is output-only, we only need to re-factor the write methods (header and data).
 * NOTE: we could support reading as well, by reading in the file and then removing column 1 (as long as we know the delimiter is a comma). But don't do this now.
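
A minimal sketch of the output-only approach above follows. The function annotated_lines is hypothetical. It formats header or data rows and, when annotated=True, prepends one human-readable annotation per row, refusing any delimiter other than a comma.

```python
# Hypothetical helper for illustration; not part of the nappy API.
from typing import List, Sequence


def annotated_lines(rows: Sequence[Sequence[object]], annotations: Sequence[str],
                    delimiter: str = ",", annotated: bool = True) -> List[str]:
    """Format rows as text lines, optionally prefixed by an annotation column.

    Output-only: the annotation column is never read back in, and it is only
    allowed with a comma delimiter.
    """
    if annotated and delimiter != ",":
        raise ValueError("the annotation column requires a comma delimiter")
    if annotated and len(rows) != len(annotations):
        raise ValueError("need exactly one annotation per row")
    lines = []
    for i, row in enumerate(rows):
        items = [str(item) for item in row]
        if annotated:
            # A comma inside the annotation would create an extra column,
            # so replace it (quoting via the csv module is an alternative).
            items.insert(0, annotations[i].replace(",", ";"))
        lines.append(delimiter.join(items) + "\n")
    return lines
```

For example, annotated_lines([[14, 1001]], ["Number of header lines, FFI"]) gives ["Number of header lines; FFI,14,1001\n"]. The returned list can be passed straight to a file object's writelines().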

We need a definition of what each row means, in a simple configuration file that maps the first item on each line to a description, e.g.:

{{{
[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
}}}

We need to define the names in the config file and a way of mapping them to each of the lines by tagging them to self.A, self.X, self.XNAMES etc.
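
Assuming the annotation file uses the INI layout of the [common_header] example above, the standard-library configparser can load it into a name-to-description dict. The function names are hypothetical, and so is the extra NIV entry. Case is preserved so that keys match attribute names such as self.DX.

```python
# Hypothetical loader for illustration; the NIV line is an invented example entry.
import configparser
from typing import Dict

EXAMPLE_CONFIG = """
[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
NIV = Number of independent variables
"""


def load_annotations(text: str, section: str = "common_header") -> Dict[str, str]:
    """Map header item names (e.g. "DX") to human-readable descriptions."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep "DX" upper-case; the default lower-cases keys
    parser.read_string(text)
    return dict(parser.items(section))


def annotation_for(attr_name: str, annotations: Dict[str, str]) -> str:
    """Return the annotation for an NA attribute name such as "DX".

    Unknown names fall back to the name itself, so writing never fails.
    """
    return annotations.get(attr_name, attr_name)
```

Falling back to the attribute name keeps output working while the config file is still incomplete.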

=== Adding output to CSV ===

We need to allow files to be written in CSV format, as follows:

 * Rename the "spacer" argument to "delimiter" throughout the code.
 * The delimiter needs to be applied in both the header and the body for consistency.
 * However, it would be nice to have a writeCSV() method as well:
   * The API should probably expose simple functions such as naToCSV() and ncToCSV().
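
The plan above can be sketched as a single delimiter attribute shared by the header and data writers, plus a writeCSV() convenience method. The class NAWriterSketch is hypothetical. It returns strings rather than writing files, and the "%g" float format and single-space default are assumptions.

```python
# Hypothetical writer for illustration; not nappy's real NA file classes.
from typing import Iterable, List, Sequence


class NAWriterSketch:
    """One `delimiter` (formerly `spacer`) is used for header and data lines."""

    def __init__(self, header_lines: List[Sequence[object]],
                 data_rows: List[Sequence[float]],
                 delimiter: str = " ", float_format: str = "%g"):
        self.header_lines = header_lines
        self.data_rows = data_rows
        self.delimiter = delimiter
        self.float_format = float_format

    def _join(self, items: Iterable[str]) -> str:
        return self.delimiter.join(items) + "\n"

    def write_header(self) -> str:
        return "".join(self._join(str(i) for i in line) for line in self.header_lines)

    def write_data(self) -> str:
        return "".join(self._join(self.float_format % v for v in row)
                       for row in self.data_rows)

    def write(self) -> str:
        return self.write_header() + self.write_data()

    def writeCSV(self) -> str:
        """Same output with commas as the delimiter; restores the old delimiter."""
        saved, self.delimiter = self.delimiter, ","
        try:
            return self.write()
        finally:
            self.delimiter = saved
```

Free-text header lines (comments, names) that themselves contain commas would still need quoting, for example via the csv module. This is the same concern raised above for RDATE and DATE.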

=== Minor changes required ===

The config file should include a global DEBUG = True|False setting. Any stray print statements, and the messaging code in cdms_to_na.py, can then all be controlled by it.

For example:

{{{
DEBUG = True  # in practice, read from the config file
if DEBUG: print("blah")
}}}

=== Re-factoring: detailed notes ===

 * The following modules were removed and their information added to the main "nappy.ini" configuration file, which is accessed via nappy.utils.getConfigDict():
   * localRules.py
   * version.py

 * The localRules package was removed and its information put in the config file.
 * localRules/aircraft.py was moved out to contrib/aircraft/ and is no longer supported!
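
As an illustration of the kind of {section: {option: value}} structure a function like nappy.utils.getConfigDict() could return, here is a hypothetical stand-in. It takes the INI text directly. The real function's signature, and the real sections of nappy.ini, are not shown in these notes. The [main] version entry is therefore an assumption (the version number is taken from the new structure below).

```python
# Hypothetical stand-in for getConfigDict(); section/option names are illustrative.
import configparser
from typing import Dict

EXAMPLE_INI = """
[main]
version = 0.2.3

[common_header]
DX = Interval between coordinate variable values (for coordinate variables 1, 2,...n)
"""


def get_config_dict(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI text into {section: {option: value}} (all values as strings)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option-name case
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

A packaged version would more likely read nappy.ini from the package directory than take the text as an argument.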


=== Making a clean API ===

We have added a top-level API module called nappy_api.py, which is imported automatically when you import nappy. It provides the "public" interface to the package and might typically be used as follows:

{{{
import nappy  # importing nappy also imports the nappy_api.py module

nappy.convertNAToNC("input.na", "output.nc")  # illustrative file names
nappy.convertNCToNA("output.nc", "roundtrip.na")
}}}


=== Broken? ===

Have I broken the main function in textParser.py? Is it the same in the old and new code? We need a test for it!

=== Questions ===

 * Should we leave the interactive time-units checker in na_to_cdms.py? It seems a bit silly to have interactive code in the middle of a conversion script that may be called by another process.
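
One non-interactive alternative is to validate the units string and emit a warning, controlled by a time_warning flag, instead of prompting. This is a sketch only. The regular expression is a simplified "<unit> since <date>" pattern, not a full udunits parser, and check_time_units is a hypothetical name.

```python
# Hypothetical checker for illustration; the pattern is a simplification.
import re
import warnings

_TIME_UNITS = re.compile(
    r"^\s*(seconds?|minutes?|hours?|days?)\s+since\s+"
    r"\d{4}-\d{1,2}-\d{1,2}([ T]\d{1,2}:\d{2}(:\d{2}(\.\d+)?)?)?\s*$",
    re.IGNORECASE)


def check_time_units(units: str, time_warning: bool = True) -> bool:
    """Return True if units look like "<unit> since <date>[ <time>]".

    Instead of prompting the user (which would block a non-interactive
    caller), an unrecognised value triggers a warning if time_warning is True.
    """
    ok = bool(_TIME_UNITS.match(units))
    if not ok and time_warning:
        warnings.warn("Unrecognised time units: %r" % units, UserWarning)
    return ok
```

Callers that need strict behaviour can turn the warning into an error with the warnings module's filters, without any change to the conversion code.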

=== Where is the new code? ===

The code needs to move to the DCIP repository, and the NDG page needs updating to say so.

----

Get the new version from:

svn co svn+ssh://proj.badc.rl.ac.uk/svn/ndg/nappy/trunk

=== New structure ===

   * nappy-0.2.3
     * nappy
     * bin
     * nappy/nc_interface
     * nappy/cdms_utils
     * nappy/utils
     * nappy/na_file
     * nappy/contrib/aircraft


======
bin/scanFAAM.py --> put in contrib

======

version.py --> put in config file.

======

general.py --> call it utils/xxxxx.py

textParser --> utils/text_parser.py

naError.py --> na_error/na_error.py

naCore.py --> na_file/na_core.py

listManipulator --> utils/list_manipulator.py

cdmsMap.py --> put in config file, given its simplicity

Need utils/parse_config.py

======

The CDMS code is most of the mess
==============================


Renamed some of the following:

1. naToCdms.py holds:

AbstractNAToCdms CLASS
toCdmsFile
createCdmsVariables
toCdmsVariable
createCdmsAuxVariables
auxToCdmsVariable
createCdmsAxes
toCdmsAxes

2. na2cdms.py:

Command-line script

3. bin/na2nc:

Same as na2cdms.py?

4. cdms2na.py is the mother of all modules:

compareAxes --> areAxesIdentical(a, b) in cdms_utils
compareVariables --> areDomainsIdentical(v1, v2) in cdms_utils
isAuxAndVar --> isAuxVarAndVar
arrayToList --> utils
listOfListsCreator --> utils
getBestName --> cdms_utils (need some advice; compare with Dom)
getMissingValue --> cdms_utils
fixHeaderLengthNowDefunct --> can be deleted
flatten2DTimeData --> aircraft
modifyNADictCopy --> needs a better name, as it is specific
cdms2na --> 200 lines of code doing the main conversion; needs to be split out into:
 * getVariableCollections(f, varlist) --> (ordered_vars, other_vars)
 * buildNADicts()
 * writeToOutputFiles()
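
As a sketch of how the renamed comparison helpers and the first split-out step might fit together, the Python below uses small dataclasses as stand-ins for cdms axes and variables (not the cdms API). Only the three function names come from the notes. Here getVariableCollections takes a plain list of variables rather than (f, varlist), and the "same domain as the highest-rank variable" rule is an assumption.

```python
# Hypothetical stand-ins for cdms objects; only the function names come
# from the refactoring notes.
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Axis:
    id: str
    values: Tuple[float, ...]
    units: str = ""


@dataclass
class Variable:
    id: str
    axes: List[Axis] = field(default_factory=list)


def areAxesIdentical(a: Axis, b: Axis) -> bool:
    """Axes match if their ids, units and values are all equal (exact floats)."""
    return a.id == b.id and a.units == b.units and tuple(a.values) == tuple(b.values)


def areDomainsIdentical(v1: Variable, v2: Variable) -> bool:
    """Two variables share a domain if they have identical axes in the same order."""
    return len(v1.axes) == len(v2.axes) and all(
        areAxesIdentical(a, b) for a, b in zip(v1.axes, v2.axes))


def getVariableCollections(variables: Sequence[Variable]
                           ) -> Tuple[List[Variable], List[Variable]]:
    """Split variables into (ordered_vars, other_vars).

    ordered_vars share the domain of the highest-rank variable, so they could
    go into one NASA Ames file; other_vars need separate handling.
    """
    if not variables:
        return [], []
    # sorted() is stable (also with reverse=True), so equal-rank variables
    # keep their input order.
    ranked = sorted(variables, key=lambda v: len(v.axes), reverse=True)
    reference = ranked[0]
    ordered_vars = [v for v in ranked if areDomainsIdentical(v, reference)]
    other_vars = [v for v in ranked if not areDomainsIdentical(v, reference)]
    return ordered_vars, other_vars
```

Comparing axis values exactly is deliberately strict. A tolerance-based comparison (see the rounding point in the DCIP list) may be more forgiving for real data.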

class CdmsToNABuilder --> NAContentCollector: (naDict, varIDs, varBin)
__init__ --> sets everything up and runs it; move some of this to --> analyse()
analyseVariables
defineNAVars
defineNAAuxVars
getAxisDefinition
defineNAGlobals
defineNAComments
defineGeneralHeader
_useLocalRule --> remove this and put it all in the aircraft contrib package
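
The proposed split, where __init__ only stores its inputs and analyse() does the work, might look roughly like this. NAContentCollector and the (naDict, varIDs, varBin) outputs follow the notes. Everything else is a hypothetical stand-in: variables are dicts with "id" and "shape" keys, and the institution-to-ORG mapping is an assumed example.

```python
# Hypothetical sketch; variables are plain dicts, not cdms variables.
from typing import Dict, List, Optional, Sequence


class NAContentCollector:
    """__init__ only stores inputs; analyse() does the work."""

    def __init__(self, variables: Sequence[dict],
                 global_attributes: Optional[Dict[str, str]] = None):
        self.variables = list(variables)
        self.global_attributes = dict(global_attributes or {})
        # Outputs named as in the notes: (naDict, varIDs, varBin).
        self.naDict: Dict[str, object] = {}
        self.varIDs: List[str] = []
        self.varBin: List[str] = []
        self._analysed = False

    def analyse(self) -> Dict[str, object]:
        """Run the analysis steps once; repeated calls return the same dict."""
        if not self._analysed:
            self.analyseVariables()
            self.defineNAGlobals()
            self._analysed = True
        return self.naDict

    def analyseVariables(self) -> None:
        """Keep variables with the first variable's shape; bin the others."""
        if not self.variables:
            return
        shape = self.variables[0]["shape"]
        for var in self.variables:
            if var["shape"] == shape:
                self.varIDs.append(var["id"])
            else:
                self.varBin.append(var["id"])
        self.naDict["NV"] = len(self.varIDs)    # number of primary variables
        self.naDict["NIV"] = len(shape)         # number of independent variables

    def defineNAGlobals(self) -> None:
        """Copy global attributes into header fields (assumed mapping)."""
        self.naDict["ORG"] = self.global_attributes.get("institution", "")
```

Keeping __init__ side-effect free makes the collector easier to unit test: a test can construct it, inspect the inputs, and then call analyse() explicitly.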

=================

'''naToCdms.py'''

The naToCdms.py module has been re-factored by
naToCdms.py
===========

This is currently a parent class of all NAFile objects, which is a bad idea. What we need is:

import convertor
convertor.writeToNC(blah)
convertor.convertToCdms(blah) --> (vars, global_atts)

class NAToCdms

toCdmsFile
createCdmsVariables - does all
toCdmsVariable - does each in turn
createCdmsAuxVariables - does all
auxToCdmsVariables - does each in turn
createCdmsAxes - does all
toCdmsAxes - does each in turn
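
The composition approach described above, where the converter wraps an NA file object instead of being its parent class, can be sketched as follows. Both class names are hypothetical. Plain dicts stand in for cdms objects. The XNAME/X/VNAME/V/ONAME attribute names follow the NASA Ames header, and the "originator" global attribute key is an assumption.

```python
# Hypothetical sketch; plain dicts stand in for cdms variables and axes.
from typing import Dict, List, Sequence, Tuple


class NAFileSketch:
    """Stand-in NA file object: holds header/data, knows nothing of CDMS."""

    def __init__(self, xname: str, x: Sequence[float],
                 vnames: Sequence[str], v: Sequence[Sequence[float]],
                 oname: str = ""):
        self.XNAME, self.X = [xname], [list(x)]
        self.VNAME, self.V = list(vnames), [list(col) for col in v]
        self.ONAME = oname


class NAToCdmsSketch:
    """Converter built around an NA file object rather than inherited by it."""

    def __init__(self, na_file: NAFileSketch):
        self.na_file = na_file

    def createCdmsAxes(self) -> List[Dict[str, object]]:
        # "does all"
        return [self.toCdmsAxes(i) for i in range(len(self.na_file.X))]

    def toCdmsAxes(self, i: int) -> Dict[str, object]:
        # "does each in turn"
        return {"id": self.na_file.XNAME[i], "values": self.na_file.X[i]}

    def createCdmsVariables(self) -> List[Dict[str, object]]:
        return [self.toCdmsVariable(i) for i in range(len(self.na_file.V))]

    def toCdmsVariable(self, i: int) -> Dict[str, object]:
        return {"id": self.na_file.VNAME[i], "data": self.na_file.V[i],
                "axes": self.createCdmsAxes()}

    def convertToCdms(self) -> Tuple[List[Dict[str, object]], Dict[str, str]]:
        """Mirror convertor.convertToCdms(...) --> (vars, global_atts)."""
        global_atts = {"originator": self.na_file.ONAME}
        return self.createCdmsVariables(), global_atts
```

Because the NA file class no longer inherits from the converter, reading and writing NASA Ames files does not require CDMS to be importable.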

================

GREP
====

We need to do a lot of grepping for inconsistencies.

The following need to be set to True or False (not yes/no):
 * _normalizedX
 * time_warning
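
To help with this grepping, a small regular-expression helper can find, and optionally rewrite, places where these flags are assigned the strings "yes" or "no". It is a hypothetical aid, not part of nappy. It only handles assignments and keyword arguments, so comparisons such as == "yes" still need to be found and fixed by hand.

```python
# Hypothetical grep/fix helper; flag names are the two listed above.
import re
from typing import Iterable, List, Tuple

FLAGS = ("_normalizedX", "time_warning")

# Matches e.g. self._normalizedX = "yes" or time_warning='no' (not ==).
_YES_NO = re.compile(
    r"\b(?P<name>%s)(?P<sep>\s*=\s*)(?P<quote>['\"])(?P<value>yes|no)(?P=quote)"
    % "|".join(re.escape(flag) for flag in FLAGS),
    re.IGNORECASE)


def find_yes_no_flags(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, flag, value) for each yes/no string assignment."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in _YES_NO.finditer(line):
            hits.append((number, match.group("name"), match.group("value")))
    return hits


def fix_line(line: str) -> str:
    """Rewrite flag = "yes" to flag = True (and "no" to False), keeping spacing."""
    return _YES_NO.sub(
        lambda m: "%s%s%s" % (m.group("name"), m.group("sep"),
                              m.group("value").lower() == "yes"),
        line)
```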

Global find and replace:

Should we leave the interactive time-units checker in na_to_cdms.py? Ask Charles.

=====
Not all of cdms_map is handled in the config file dict.