source: TI01-discovery/trunk/ingestAutomation/OAIBatch/README @ 251

Subversion URL: http://proj.badc.rl.ac.uk/svn/ndg/TI01-discovery/trunk/ingestAutomation/OAIBatch/README@2378
Revision 251, 1.7 KB checked in by mguiterr, 15 years ago

This directory contains a Python script (and some Java) that processes DIF records after harvesting.

The following structure should be maintained under this directory:

./data
        - /DATACENTRE/
                - discovery/            Records with the namespace and schema declarations removed, i.e. after the script has been run. Ready to ingest into the discovery service.
                - oai/difYYYYMMDD/      Records as harvested from OAI.

where DATACENTRE is replaced by the name of each data provider.

- cd OAIBatch/data/DATACENTRE/oai
- mkdir difYYYYMMDD  (This directory only keeps a backup copy of the DIFs, in case the script rewrites something incorrectly.)

HARVEST_HOME = the harvested_records directory

cp  HARVEST_HOME/*.xml OAIBatch/data/DATACENTRE/oai/difYYYYMMDD
cp  HARVEST_HOME/*.xml OAIBatch/data/DATACENTRE/discovery

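The backup and copy steps above could also be scripted. The following is a minimal Python sketch, not part of OAIBatch: the function name stage_harvest and its parameters are hypothetical, and it assumes the directory layout described in this README.

```python
import shutil
from pathlib import Path


def stage_harvest(harvest_home, batch_root, datacentre, datestamp):
    """Copy harvested *.xml files into the backup and discovery directories.

    Hypothetical helper that mirrors the mkdir/cp steps of this README.
    datestamp is the YYYYMMDD string used in the difYYYYMMDD directory name.
    Returns the sorted list of copied file names.
    """
    base = Path(batch_root) / "data" / datacentre
    backup_dir = base / "oai" / f"dif{datestamp}"
    discovery_dir = base / "discovery"
    # Create both target directories if they do not exist yet.
    backup_dir.mkdir(parents=True, exist_ok=True)
    discovery_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for xml_file in sorted(Path(harvest_home).glob("*.xml")):
        # One untouched copy for safekeeping, one copy for the script to rewrite.
        shutil.copy2(xml_file, backup_dir / xml_file.name)
        shutil.copy2(xml_file, discovery_dir / xml_file.name)
        copied.append(xml_file.name)
    return copied
```

It would be called with the harvest directory, the OAIBatch directory, the data provider name and the date, for example stage_harvest(HARVEST_HOME, "OAIBatch", "DATACENTRE", "YYYYMMDD") with real values substituted.
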
The file config.properties contains the name=value pair used to parse the filenames. Set the property host_OAI to the string that should be removed from each filename.

cat config.properties

        #### config.properties #######
# Define host_OAI as the string that OAI adds to the filenames after harvesting
# String added by OAI for BODC, SOC, NCAR
# BODC = oai%3Agrid.bodc.nerc.ac.uk%3A
# SOC = oai%3Aoai.noc.soton.ac.uk%3A
# NCAR = oai%3Aucar.ncar.scd.cdp%3A

host_OAI=oai%3Agrid.bodc.nerc.ac.uk%3A

                ###########

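To illustrate how the host_OAI value relates to the filenames, here is a minimal Python sketch. It is not taken from oaiProc.py: the helper names read_properties and strip_oai_prefix are hypothetical, the filename ending in 12345.xml is made up, and the sketch assumes the string always appears at the start of the filename.

```python
def read_properties(text):
    """Parse simple name=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[name.strip()] = value.strip()
    return props


def strip_oai_prefix(filename, prefix):
    """Remove the OAI host prefix from a filename if it is present."""
    if prefix and filename.startswith(prefix):
        return filename[len(prefix):]
    return filename


config_text = """
# BODC = oai%3Agrid.bodc.nerc.ac.uk%3A
host_OAI=oai%3Agrid.bodc.nerc.ac.uk%3A
"""
props = read_properties(config_text)
# Hypothetical harvested filename, used only for illustration.
print(strip_oai_prefix("oai%3Agrid.bodc.nerc.ac.uk%3A12345.xml", props["host_OAI"]))
# prints: 12345.xml
```

The properties are parsed by hand because the file has no section headers, which Python's configparser module would require.
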
Execute the script:

python oaiProc.py ./data/DATACENTRE/discovery/*

The script reads the files in OAIBatch/data/DATACENTRE/discovery and writes its output files to the same directory.
It removes the prefix that OAI adds to the filenames (for example "oai%3Aucar.ncar.scd.cdp%3A" for NCAR)
and leaves <DIF> as the root element of each record.

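How oaiProc.py rewrites the records is not shown in this README. As a rough illustration only, the following Python sketch uses the standard xml.etree.ElementTree module to pull the DIF element out of a harvested record and drop its namespace and schema declarations. The function names local_name and extract_dif are hypothetical, and the sketch assumes the DIF element is nested somewhere inside the harvested XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element or attribute name without its '{namespace}' part."""
    return tag.rsplit("}", 1)[-1]


def extract_dif(xml_text):
    """Return the first DIF element, with namespaces removed, as an XML string."""
    root = ET.fromstring(xml_text)
    # root.iter() also yields the root itself, so a bare <DIF> document works too.
    dif = next((el for el in root.iter() if local_name(el.tag) == "DIF"), None)
    if dif is None:
        raise ValueError("no DIF element found")
    dif.tail = None  # drop any text that followed the DIF element in the wrapper
    for el in dif.iter():
        el.tag = local_name(el.tag)
        # Drop namespaced attributes such as xsi:schemaLocation.
        for attr in [a for a in el.attrib if a.startswith("{")]:
            del el.attrib[attr]
    return ET.tostring(dif, encoding="unicode")
```

Because every tag is reduced to its local name before serialising, the output carries no xmlns declarations and starts directly with <DIF>.
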
Once the script has finished, remove the original harvested files, whose names still start with "oai":

- cd /usr/local/WSClients/OAIBatch/data/DATACENTRE/discovery
- rm oai*