source: ndgCommon/trunk/ndg/common/src/tools/granulator-README.txt @ 5188

Revision 5188, 3.5 KB checked in by cbyrom, 11 years ago (diff)

Delete original granulator codebase and associated code and create new
granulator script in the ndgcommon tools package, where it more
logically sits. Add new README with up to date details and update

Granulator Overview

The granulator command line tool is used to attach data granules to data entities.  In order to do this, several pieces of
information are required; these are defined in a config file (known as the 'granulite file') that is provided as the
sole input when running the script.  The granulator script converts this data into a granule 'atom' (based on the Atom web
feeds standard) and stores it, along with any associated CDML or CSML files (referenced in the granulite file), in the
chinook eXist DB.  The granule atom data is then included as a data granule in the cedarmoles postgres DB and associated
with the data entity specified in the granulite file.

Usage: python [OPTION] <granulite_filename>, <granulite_filename>,...
 - where:
   <granulite_filename>,.. is a list of granulite files with granule configuration data - NB, this is
   optional - if not specified, the script will process all files with a '.granulite' suffix in the
   current working directory.
 -x - delete mode - remove the granule data specified in the granulite from eXist
 -a - aggregate coverage mode - if set, only coverage data that extends the existing
   atom coverage data will be added
 -r - replace mode - if the granule data already exists in eXist, automatically
   overwrite it with the current data - NB, if not set, and existing data is found,
   an exception is thrown
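
The option handling described above could be sketched roughly as follows. This is an illustration only, not the script's actual code: the function name parse_args and the flag names 'delete', 'aggregate' and 'replace' are assumptions, and stripping the commas shown in the usage line is a guess at how a comma-separated list might be tolerated.

```python
import getopt


def parse_args(argv):
    """Split an argument list into mode flags and granulite filenames.

    Hypothetical helper: the real script may parse its arguments differently.
    """
    # "xar" declares -x, -a and -r as simple on/off switches
    opts, filenames = getopt.getopt(argv, "xar")
    names = {"-x": "delete", "-a": "aggregate", "-r": "replace"}
    flags = {"delete": False, "aggregate": False, "replace": False}
    for opt, _value in opts:
        flags[names[opt]] = True
    # The usage line separates filenames with commas; drop any trailing ones
    filenames = [f.rstrip(",") for f in filenames if f.rstrip(",")]
    return flags, filenames


flags, files = parse_args(["-r", "-a", "one.granulite,", "two.granulite"])
print(flags)
print(files)
```

An empty filename list would then be the signal to fall back to scanning the current working directory.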

The granulite file consists of a number of sections, delimited by titles with a '::' suffix.  Most of these sections
can accept multiple values and several of the sections require data in triple format - specified using a '|' delimiter.
An example granulite file is provided as a guideline.  Comments can be added to the granulite file by starting a line with '#'.
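
A minimal sketch of how a file in this format might be read is given below. It relies only on the three rules stated above ('::' section titles, '|' separated fields, '#' comment lines); the function name parse_granulite and the section names in the sample are hypothetical, and the real granulite module may behave differently.

```python
def parse_granulite(text):
    """Return {section title: [list of fields for each value line]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        if line.endswith("::"):
            # A title with a '::' suffix starts a new section
            current = line[:-2].strip()
            sections.setdefault(current, [])
        elif current is not None:
            # Each value line is split on '|'; a plain value gives one field
            sections[current].append([part.strip() for part in line.split("|")])
    return sections


# Hypothetical content: these section names are illustrative only
SAMPLE = """\
# a comment line
example_title::
My granule
example_triples::
first | second | third
another | pair of | values
"""

parsed = parse_granulite(SAMPLE)
print(parsed)
```

Every value is returned as a list of fields, so a section holding plain values and a section holding triples can be handled by the same calling code.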

If a CSML file is specified in the granulite, this is parsed and datasetID, title, coverage and parameter data are extracted.
Alternatively, a CDML file can be specified; this is then used with the csmlscan script to produce a CSML file - and
processing then continues as it does when a CSML file is specified.

The script can be run with a list of granulite files; if no files are specified, it
will look for files in the current directory with the suffix '.granulite' and run against
them.  NB, the script is a wrapper to the module in ../lib - this can also
be used to run the granulating process.
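
The fallback to the current directory can be illustrated with a short sketch. The function name find_granulites and its 'directory' parameter are assumptions made for this example, not part of the script.

```python
import glob
import os


def find_granulites(filenames=None, directory="."):
    """Return the given filenames, or every '*.granulite' file in directory."""
    if filenames:
        return list(filenames)
    # Sorted so that files are processed in a predictable order
    return sorted(glob.glob(os.path.join(directory, "*.granulite")))


print(find_granulites())
```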

Config details for the eXist database to be used by the script are stored in the exist.config file; this
should be set to access permission 0600 and owned by the user running the script - to ensure maximum security -
and valid user ID/password info should be added.
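
The permission and ownership requirements can be checked before running the script. The sketch below is one way to do so on a Unix-like system; the function name config_is_private is an assumption, and the script itself is not known to perform this check.

```python
import os
import stat


def config_is_private(path):
    """True if path has mode 0600 and is owned by the current user."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    return mode == 0o600 and info.st_uid == os.getuid()


# Only report when a config file is present in the working directory
if os.path.exists("exist.config"):
    print("exist.config private:", config_is_private("exist.config"))
```

If the check fails, 'chmod 600 exist.config' run by the owning user sets the expected permission.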

Error Handling

Currently the ingest of data into the eXist DB automatically overwrites old data - so if errors occur at this
point, the script stops operation and doesn't do any tidying up - in the expectation that, once things are fixed,
rerunning the script will bring everything up to date.

NB, a granule can be associated with multiple data entities; the script keeps a tally of these associations
and highlights any problems, should these occur, once it has finished running.
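
The tallying behaviour described above might look something like the following sketch. All names here (associate_all, the 'associate' callback, the entity IDs) are hypothetical; the point is only that failures are collected and reported at the end rather than stopping the run.

```python
from collections import Counter


def associate_all(granule_id, entity_ids, associate):
    """Try every association, recording successes and failures."""
    tally = Counter()
    failures = []
    for entity_id in entity_ids:
        try:
            associate(granule_id, entity_id)
            tally["ok"] += 1
        except Exception as exc:
            # Remember the problem and carry on with the remaining entities
            tally["failed"] += 1
            failures.append((entity_id, str(exc)))
    return tally, failures


def demo_associate(granule_id, entity_id):
    """Stand-in for the real DB call; rejects one made-up entity ID."""
    if entity_id == "missing_entity":
        raise ValueError("no such entity")


tally, failures = associate_all("granule_1", ["entity_a", "missing_entity"], demo_associate)
print("associated: %d, failed: %d" % (tally["ok"], tally["failed"]))
for entity_id, reason in failures:
    print("  problem with %s: %s" % (entity_id, reason))
```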

Example Granulite Files

Example input granulite files can be found in /ndg/common/unittests/testdata

Running via the atom editor web app

The granulite module is used by the atom editor in the MILK stack to provide the same
functionality as this script.  From the atom home page, use the link
'Create data granule atom using granulite file' to access it.