1 | This directory contains a Python script (and some Java) to process DIFs after harvesting. |
---|
2 | |
---|
3 | Under this directory the following structure should be maintained: |
---|
4 | |
---|
5 | |
---|
6 | ./data |
---|
7 | - /DATACENTRE/ |
---|
8 | · - discovery/: Records with the namespace and schema declarations removed by the script, ready to ingest into the discovery service. |
---|
9 | · - oai/difYYYYMMDD/: Records as harvested from OAI. |
---|
10 | |
---|
11 | where DATACENTRE is the name of the data provider (for example BODC, SOC or NCAR). |
---|
12 | |
---|
13 | - cd OAIBatch/data/DATACENTRE/oai |
---|
14 | - mkdir difYYYYMMDD (this directory only keeps a backup copy of the DIFs in case the script rewrites something incorrectly) |
---|
15 | |
---|
16 | HARVEST_HOME = The harvested_records directory |
---|
17 | |
---|
18 | cp HARVEST_HOME/*.xml OAIBatch/data/DATACENTRE/oai/difYYYYMMDD |
---|
19 | cp HARVEST_HOME/*.xml OAIBatch/data/DATACENTRE/discovery |
---|
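The two copy steps above can also be sketched in Python. This is only an illustration, not part of oaiProc.py: the function name stage_harvest and its parameters (harvest_home, datacentre_dir, stamp) are hypothetical, standing for the harvested_records directory, OAIBatch/data/DATACENTRE and the YYYYMMDD date.

```python
import shutil
from pathlib import Path


def stage_harvest(harvest_home, datacentre_dir, stamp):
    """Copy harvested *.xml files into the backup and working directories.

    All three parameters are hypothetical names for this sketch:
    harvest_home   -> the harvested_records directory (HARVEST_HOME)
    datacentre_dir -> OAIBatch/data/DATACENTRE
    stamp          -> the YYYYMMDD part of the difYYYYMMDD directory name
    """
    harvest_home = Path(harvest_home)
    backup_dir = Path(datacentre_dir) / "oai" / f"dif{stamp}"
    discovery_dir = Path(datacentre_dir) / "discovery"

    # Create both target directories if they do not exist yet.
    backup_dir.mkdir(parents=True, exist_ok=True)
    discovery_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for xml_file in sorted(harvest_home.glob("*.xml")):
        # Untouched backup copy, kept in case the script rewrites something wrong.
        shutil.copy2(xml_file, backup_dir / xml_file.name)
        # Working copy that the script will process.
        shutil.copy2(xml_file, discovery_dir / xml_file.name)
        copied.append(xml_file.name)
    return copied
```

Calling stage_harvest with the three locations returns the names of the files it copied, which makes it easy to check that nothing was missed.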
20 | |
---|
21 | The file config.properties contains the name=value pair used to parse the filenames. Under the property name host_OAI, define the string to remove from the filenames. |
---|
22 | |
---|
23 | cat config.properties |
---|
24 | |
---|
25 | #### config.properties ####### |
---|
26 | # Define host_OAI as the string that OAI adds to the filenames after harvesting |
---|
27 | # String added by OAI for BODC, SOC, NCAR |
---|
28 | # BODC = oai%3Agrid.bodc.nerc.ac.uk%3A |
---|
29 | # SOC = oai%3Aoai.noc.soton.ac.uk%3A |
---|
30 | # NCAR = oai%3Aucar.ncar.scd.cdp%3A |
---|
31 | |
---|
32 | host_OAI=oai%3Agrid.bodc.nerc.ac.uk%3A |
---|
33 | |
---|
34 | ########### |
---|
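As an illustration of how a file in this format can be read, here is a minimal Python sketch. It is an assumption about the parsing, not the code used by oaiProc.py, and the function name read_properties is hypothetical.

```python
def read_properties(text):
    """Parse name=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


# Sample text mirroring the config.properties shown above.
sample = """\
# BODC = oai%3Agrid.bodc.nerc.ac.uk%3A
host_OAI=oai%3Agrid.bodc.nerc.ac.uk%3A
"""
props = read_properties(sample)
print(props["host_OAI"])  # oai%3Agrid.bodc.nerc.ac.uk%3A
```

The commented provider lines are ignored, so only the active host_OAI entry is returned.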
35 | |
---|
36 | |
---|
37 | Execute the script |
---|
38 | python oaiProc.py ./data/DATACENTRE/discovery/* |
---|
39 | |
---|
40 | The script reads the files from OAIBatch/data/DATACENTRE/discovery and writes the processed files to the same directory. |
---|
41 | The output filenames no longer carry the string that OAI adds (for example "oai%3Aucar.ncar.scd.cdp%3A" for NCAR), |
---|
42 | and each record is left with <DIF> as the root element. |
---|
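The filename part of this step can be sketched as follows. This is a hedged illustration only: it assumes the OAI string appears at the start of the filename, and the function name strip_oai_prefix and the example filename record1.xml are hypothetical, not taken from oaiProc.py.

```python
def strip_oai_prefix(filename, host_oai):
    """Return filename without the leading host_OAI string, if present."""
    if filename.startswith(host_oai):
        return filename[len(host_oai):]
    # Files without the prefix are returned unchanged.
    return filename


# Hypothetical harvested filename for NCAR.
harvested = "oai%3Aucar.ncar.scd.cdp%3Arecord1.xml"
print(strip_oai_prefix(harvested, "oai%3Aucar.ncar.scd.cdp%3A"))  # record1.xml
```

Because the original files keep their oai prefix, they remain in the directory next to the renamed output until they are removed by hand.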
43 | |
---|
44 | Once the script has finished: |
---|
45 | |
---|
46 | - cd /usr/local/WSClients/OAIBatch/data/DATACENTRE/discovery |
---|
47 | - rm oai* |
---|