source: TI03-DataExtractor/branches/old_stuff/dx-webservice/PLANS.txt @ 793

Subversion URL: http://proj.badc.rl.ac.uk/svn/ndg/TI03-DataExtractor/branches/old_stuff/dx-webservice/PLANS.txt@793
Revision 793, 16.6 KB, checked in by astephen, 13 years ago

Put all the old code in the old_stuff branch.

  • Property svn:executable set to *
  • Property svn:mime-type set to application/octet-stream
DEVELOPMENT PLANS FOR THE DX

* dx should be able to take a complete query from B-metadata selection (such as search term, geographical and temporal constraints) and a result set (of URIs). It should be flexible enough to work out what it doesn't have and get the user to provide that before delivering.
* allow the dx to take more than 2 datasets.
* allow more than one variable to be selected.
* print a summary on a ConfirmationPage before the results.
* a progress meter would be useful.
* create a status page that shows who is running a job and how long it has run for.
* have an option for users to cancel their own jobs from the status page.
* geosplat should know when two datasets have been differenced (maybe via a variable attribute, or just a varname 'xxx_minus_yyy') so it can say this is a differenced variable.
* plots should say "Dataset: ERA-40 Pressure level data" rather than "SOURCE: BADC...". Move this source info to the bottom in a smaller font.
* look into the file lock error: "Downloading 10yr block of era40 forecast data for my 80N-20N 80W-80E caused the system to crash with a 'filelock error'."
* Get 1957 ERA-40 data into the metadata.
* look into this report: "mean sea level pressure fields from ECMWF 40 years reanalysis dataset. Northern hemisphere between 40 and 90 degrees latitude. One-day file weighs 7 kb. From BADC I tried to download the same (daily means for sea level pressure) data for the same area (-180E; +180W; 90N 40N). The length of the dataset was roughly 1800 days (a bit shorter). Then, the approximate size of extracted file should have been about 1800 x 7 kb = 12,600 kb, or 12.6 Mb. Instead, the resulting file was something like 260 Mb or so."
* The last date in HadCM3 data is not the monthly mean on the 16th; you can select all days. Need to fix this.
* sort the lnsp and z data in the ecmwf archives by including a new level type of length 1.
* The HiGEM land-sea mask has been taken out; work out how to put it back in.
* "No time axis defined in the time section" - i.e. if start_time and end_time are both None then just don't show the time options.
* Note that if you compare 2.5 deg and 1.0 deg ECMWF Op for 20020101 you don't get zeros, because the ECMWF routines did the n80->1.0 interpolation whereas cdat does the 2.5 to 1.0 interpolation! Can I find a useful way to document this?
* geosplat should know the length of the variable name and move the title across a bit on the plot if required.
* move the las/dx web pages to /help/software/dx and /help/software/las so they live in one place only.
* talk to ASH about linking My BADC to the dx.
* check for security loopholes - make sure checks are at the file level for all parts of the code.
* replace SimpleCookie with Cookie.
* Set up logs to record information:
** /stats/by_variable.log   <datetime>|<vars>|<dsg>|<ds>|<req_id>
** /stats/by_user.log       <datetime>|<user>|<dsg>|<ds>|<vars>|<req_id>
** /stats/by_datasetGroup.log
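The pipe-delimited record format above could be written by one small helper; a sketch in modern Python (the log_stats name, the /tmp path and the field values are illustrative, not part of the dx code):

```python
import datetime

def log_stats(logfile, *fields):
    # Append one pipe-delimited, timestamped record to the given stats log.
    record = "|".join([datetime.datetime.now().isoformat()] + [str(f) for f in fields])
    with open(logfile, "a") as log:
        log.write(record + "\n")

# e.g. for /stats/by_user.log: <datetime>|<user>|<dsg>|<ds>|<vars>|<req_id>
log_stats("/tmp/by_user.log", "astephen", "ERA-40", "e40-1.0-am", "msl", "req0001")
```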
* set up a timer along the lines of:

tlog = open(os.path.join(basedir, 'logs', 'timing.txt'), 'a')
tlog.write("%s %s %s\n" % (estimatedDuration, actualDuration, request["dataset_1"]))

and

def analyseTimingsLog(logfile=os.path.join(basedir, 'logs', 'timing.txt')):
    loglines = open(logfile).readlines()
    dsets = {}
    for line in loglines:
        (estimate, actual, dset) = line.split()
        (estimatedDuration, actualDuration) = (float(estimate), float(actual))
        if dset not in dsets:
            dsets[dset] = [[], []]
        dsets[dset][0].append(estimatedDuration)
        dsets[dset][1].append(actualDuration)

    # Calculate the averages and the scaling factor between them
    print """Comparisons of average estimated and actual timings follow, with a recommended scaling factor by which you should multiply the timing estimates in config.py"""
    print "\nDataset\tEstimate\tActual\tScaling factor"
    print "==============================================="
    for dset in dsets.keys():
        print dset + "\t",
        averages = []
        for duration in range(2):
            total = 0
            for value in dsets[dset][duration]:
                total = total + value
            averages.append(total / len(dsets[dset][duration]))
            print averages[duration],
        scale_factor = averages[1] / averages[0]
        print scale_factor
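The scaling factor here is just the ratio of mean actual to mean estimated duration per dataset. A minimal self-contained check of that calculation (the sample log lines are invented for illustration):

```python
sample_lines = [
    "10.0 20.0 era40",
    "30.0 60.0 era40",
    "5.0 5.0 hadcm3",
]

dsets = {}
for line in sample_lines:
    estimate, actual, dset = line.split()
    dsets.setdefault(dset, [[], []])
    dsets[dset][0].append(float(estimate))
    dsets[dset][1].append(float(actual))

scale_factors = {}
for dset, (estimates, actuals) in dsets.items():
    # Scaling factor = average actual duration / average estimated duration
    scale_factors[dset] = (sum(actuals) / len(actuals)) / (sum(estimates) / len(estimates))

print(scale_factors)  # era40 estimates need doubling; hadcm3 is spot on
```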
* introduce long versions of the templates in case they don't exist, and rename them to something more useful and generic.
* think of a smart solution for showing the start and end date ranges (and interval).
* check that variable names come out the same for ERA-40, ERA-15 etc.
* get ECMWF Op under dx.
* use cdms.createGlobalMeanGrid() to allow global meaning.
* get COAPEC PP data under dx.
* get UKMO NWP pp data under dx.
* get COAPEC virtual variables under dx.
* Flag virtual variables somewhere in the dx - link these to the help page with an explanation of 'VD' and something like 'it might take longer to respond.'
* Javascript to make sure the end of the month is a valid day!
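The same end-of-month check is easy server-side too; a sketch using the standard calendar module to clamp a selected day to the last valid day of the month (the clamp_day function name is an assumption, not existing dx code):

```python
import calendar

def clamp_day(year, month, day):
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return min(day, last_day)

print(clamp_day(2003, 2, 31))  # -> 28
print(clamp_day(2004, 2, 31))  # -> 29 (leap year)
```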
* do SCRIP interpolations if possible.
* what happens if you try to interpolate to a point? My interp.py (bi-linear) module might be needed.
* sort out the HadCM3 issue of CDAT not keeping the proleptic calendar metadata.
* see if we can improve performance on CDMLs that don't use the file template by using one (and maybe aliasing to a set of new filenames in order to get a sensible date/time/var in the filename). Possible hiccup: no variable name in the filename - how does it respond? CDAT might be sensible (this time ;-).
* When you step back a few stages the dx should delete the information from future stages to avoid impossible combinations (resulting in key errors).
* create a validation class to check that user selections are doable (i.e. check that the end time is after or equal to the start time, and then step the user back if necessary with an appropriately located information message on the page). Might be able to do all of this with Javascript (which, as Marta said, keeps it quick and client-side), but that needs multi-browser testing.
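A minimal sketch of such a validation class (the class name, method names and request keys are assumptions, not existing dx code):

```python
class RequestValidator:
    """Check that user selections are consistent before the request is run."""

    def __init__(self, request):
        self.request = request
        self.errors = []

    def validate(self):
        start = self.request.get("start_time")
        end = self.request.get("end_time")
        # Times held as (year, month, day, ...) tuples, so plain ordering works.
        if start is not None and end is not None and end < start:
            self.errors.append("End time must be the same as or after the start time.")
        return self.errors == []

v = RequestValidator({"start_time": (1999, 1, 1), "end_time": (1998, 1, 1)})
print(v.validate(), v.errors)
```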
* analyse data in geosplat for feature instances.
* allow 2 variables to go into geosplat so that scatter-plot and vector-plot capabilities can be exploited.
* design command-line and web services versions.
* multiple output files need a file-name template to build new filenames - keep a simple one-to-one mapping of timestep:file at the start. Also need to set a limit on the size of individual files.
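The one-to-one timestep:file mapping might look like this (the template string and helper name are illustrative, not a settled design):

```python
def build_filenames(prefix, var, timesteps, template="{prefix}_{var}_{time}.nc"):
    # One output filename per timestep, e.g. ec-e40_msl_19990101.nc
    return [template.format(prefix=prefix, var=var, time=t) for t in timesteps]

names = build_filenames("ec-e40", "msl", ["19990101", "19990102"])
print(names)
```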
* Also need to provide gzip and tar functionality for multiple files - maybe this can wait, as the Data Browser already does this.
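Bundling the multiple output files is only a few lines with the standard tarfile module; a sketch (the helper name and throwaway paths are illustrative):

```python
import os
import tarfile
import tempfile

def bundle_outputs(filenames, archive_path):
    # Write a gzipped tar containing each output file, stored by basename.
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in filenames:
            tar.add(name, arcname=os.path.basename(name))
    return archive_path

# Demonstrate with two throwaway files standing in for extraction outputs
tmpdir = tempfile.mkdtemp()
files = []
for n in ("out1.nc", "out2.nc"):
    path = os.path.join(tmpdir, n)
    open(path, "w").write("dummy data")
    files.append(path)
archive = bundle_outputs(files, os.path.join(tmpdir, "request.tar.gz"))
print(archive)
```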
* Will the zonal mean operation take all zones and mean them, or does it need only one zone as input? Can be another operation.
* share code with J-Marie and work out what he can work on.
* just include both datasets (rather than the difference) - this will be provided by the OperationsPage. OperationsPage can include 'output_format', 'operation' and 'multiple_output_file_switch' (including output_filename_prefix, output_filename_template, output_file_name???)
* change the CDML files to include CF metadata global attributes.
* UserInterface classes required are:
** CGIInterface
** WebServiceInterface
** CommandLineInterface (local - no cookies)
** NetClientInterface (remote - uses cookie validation)
   with methods like:
*** presentDomainOptions()
*** presentVariablesOptions()
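The interface hierarchy above could hang off a shared base class; a sketch (the method bodies and option formatting are placeholders, not dx code):

```python
class UserInterface:
    """Base class: each front end renders the same option sets its own way."""

    def presentDomainOptions(self, options):
        raise NotImplementedError

    def presentVariablesOptions(self, options):
        raise NotImplementedError

class CommandLineInterface(UserInterface):
    # Local use: no cookies needed, options rendered as a numbered text menu.
    def presentDomainOptions(self, options):
        return "\n".join("[%d] %s" % (i, opt) for i, opt in enumerate(options))

    def presentVariablesOptions(self, options):
        return self.presentDomainOptions(options)

cli = CommandLineInterface()
print(cli.presentVariablesOptions(["msl", "t2m"]))
```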
* DXDGML (Data Extractor Dataset Grouping Mark-up Language):

<dxdgml>
  <datasetGroup>
    <id>ERA-40</id>
    <longName>ECMWF ERA-40 Re-analysis Project</longName>
    <allowedGroups>era ear4t ecmwfera</allowedGroups>
    <allowedUsers>astephen spepler</allowedUsers>
    <filenamePrefix>ec-e40</filenamePrefix>
    <dxDatasets>
      <dxDataset>
        <localID>e40-1.0-am</localID>
        <dataType>Forecast|Analysis|Measurement</dataType>
        <levelType>Model levels</levelType>
        <horizontalResolution>1.0 degree x 1.0 degree</horizontalResolution>
        <filenameAddition>analmodellevs</filenameAddition>
        <virtualFlag>virtual</virtualFlag>
      </dxDataset>
    </dxDatasets>
  </datasetGroup>
</dxdgml>
* Web service version thoughts:
** the client needs the following methods (can an optionHandler provide them all?):
*** getDatasetGroupOptions()
*** getDatasetOptions()
*** getVariablesOptions()
*** getVertical/HorizontalDomainOptions()
*** getOperationOptions()
*** etc
*** setVariables(var)
*** setDatasetGroups(dsg)
*** setDatasets(ds)
*** setVertical/HorizontalDomain()
*** setTemporal()
*** setOperation()
*** setMultipleOutputFiles(on|off)
*** viewCurrentRequest()
*** processRequest()
*** etc etc
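On the client side, the get*/set* pairs amount to reading option lists and accumulating a request dictionary; a sketch (all names here are illustrative - the real web-service API is still to be designed):

```python
class DXClient:
    """Accumulate selections step by step, mirroring the method list above."""

    def __init__(self, option_handler):
        self.options = option_handler   # answers the get*Options() queries
        self.request = {}

    def getDatasetGroupOptions(self):
        return self.options["datasetGroups"]

    def setDatasetGroups(self, dsg):
        self.request["datasetGroup"] = dsg

    def setVariables(self, var):
        self.request["variables"] = var

    def viewCurrentRequest(self):
        return dict(self.request)

client = DXClient({"datasetGroups": ["ERA-40", "HadCM3"]})
client.setDatasetGroups("ERA-40")
client.setVariables(["msl"])
print(client.viewCurrentRequest())
```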

* geosplat should allow multiple plots per page.
* understand how to plot trajectories etc. in vcs, and how easily we could put that functionality (currently in IDL) under geosplat.

Finally, some nappy slogans: "re-usable software, not disposable" or "washable software for your baby".