Changeset 1889 for TI01-discovery


Timestamp:
18/12/06 19:16:44
Author:
selatham
Message:

getting auto ingest working

Location:
TI01-discovery/trunk/ingestAutomation/OAIBatch
Files:
5 edited

  • TI01-discovery/trunk/ingestAutomation/OAIBatch/bodc_config.properties

    r1768 r1889  
    @@ -8,8 +8,8 @@
     #
     #Define groups - portal groups for limiting searches by 'group of datacentres'.
    -groups NERC-DDC NERC MDIP
    +groups NERC-DDC MDIP
     #
     #Define which format is harvested from the data centre (one only)
    -format dif
    +format DIF
     #
     #Define the data providers namespace
  • TI01-discovery/trunk/ingestAutomation/OAIBatch/neodc_config.properties

    r1769 r1889  
    @@ -8,8 +8,8 @@
     #
     #Define groups - portal groups for limiting searches by 'group of datacentres'.
    -groups NERC NERC-DDC
    +#
     #
     #Define which format is harvested from the data centre (one only)
    -format dif
    +format DIF
     #
     #Define the data providers namespace
  • TI01-discovery/trunk/ingestAutomation/OAIBatch/nocs_config.properties

    r1768 r1889  
    @@ -8,8 +8,8 @@
     #
     #Define groups - portal groups for limiting searches by 'group of datacentres'.
    -groups NERC
    +#
     #
     #Define which format is harvested from the data centre (one only)
    -format dif
    +format DIF
     #
     #Define the data providers namespace
  • TI01-discovery/trunk/ingestAutomation/OAIBatch/oai_ingest.py

    r1880 r1889  
    @@ -1,18 +1,15 @@
     #!/usr/bin/env python
     """ Script oai_ingest.py takes parameter <datacentre>.
    -The /usr/local/WSClients/OAIBatch directory contains this python script, a DataProvider specific config file
    -and the oaiClean.py class which cleans up discovery records after harvesting.
    -The pre-processed files are then ingested to the eXist XML db.
    -
    - Under this directory the following structure should be maintained:
    -
    +The /usr/local/WSClients/OAIBatch directory contains:-
    + - this python script,
    + - a DataProvider specific config file,
    + - the d2b.jar moles creator class which creates moles discovery records,
    + - the python module for extracting spatiotemporal information and adding to postgres db.
    +Under this directory the following structure should be maintained:
      ./data
      - /DATACENTRE/
    -                - discovery/:         Records with namespace, schema declaration deleted - after having run
    -                                      the oaiClean script. Ready to ingest in the discovery service.
    -                - oai/difYYYYMMDD/    Records as harvested from OAI
    -
    +                - discovery/:         Re-named documents ready to ingest in the discovery service.
    +                - oai/difYYYYMMDD/    Documents as harvested from OAI
      Where  /DATACENTRE  varies for the different data providers
    -
     """
     #History:
     
    @@ -48,6 +45,7 @@
     date_string = commands.getoutput ("date +'%y%m%d_%H%M'")
     os.putenv ('EXIST_HOME', '/usr/local/exist-client')
    -os.putenv ('PATH', ':/usr/java/jdk1.5.0_03/jre:/usr/java/jdk1.5.0_03:/usr/java/jdk1.5.0_03/lib/tools.jar:/usr/local/WSClients/OAIBatch:/usr/local/exist-client/bin:/bin:/usr/bin:.')
    -os.putenv ('CLASSPATH','.:/usr/java/j2sdk1.4.2_04/bin:/usr/local/WSClients/OAIBatch')
    +os.putenv ('JAVA_HOME', '/usr/java/jdk1.5.0_03')
    +os.putenv ('PATH', ':/usr/java/jdk1.5.0_03/bin:/usr/java/jdk1.5.0_03:/usr/java/jdk1.5.0_03/lib/tools.jar:/usr/local/WSClients/OAIBatch:/usr/local/exist-client/bin:/bin:/usr/bin:.')
    +os.putenv ('CLASSPATH','.:/usr/java/jdk1.5.0_03/lib/tools.jar')
     
     # Get the harvested records directory and groups for this datacentre from the datacentre specific config file
     
    @@ -140,5 +138,5 @@
         sys.exit("Failed at copying config file stage")
     
    -#Change os directory to that with the other code in it. (need this?)
    +#Change os directory to that with the code in it.
     os.chdir('/usr/local/WSClients/OAIBatch')
     
     
    @@ -156,5 +154,5 @@
                     #print "original file = %s, newfile = %s" %(original_filename, new_filename)
                     commandline = "cp "+original_filename+ " " +new_filename
    -                print "Executing : " + commandline
    +                #print "Executing : " + commandline
                     status = os.system(commandline)
                     if status !=0:
     
    @@ -174,6 +172,6 @@
     # Then run the minimum moles creator  which will run over all records in the supplied collection
     # creates a directory ./DIF2MOLES to pass back records with original filename
    -commandline = "java -jar d2b.jar repositoryID " +datacentre_namespace+" repositoryLocalID "+datacentre+" format "+datacentre_format+" repository xmldb:exist://glue.badc.rl.ac.uk:8080/exist/xmlrpc userpw xxxxxx targetCollection /db/discovery/original/"+datacentre_format+"/"+datacentre_namespace
    -print commandline
    +commandline = "java -jar D2B/d2b.jar repositoryID " +datacentre_namespace+" repositoryLocalID "+datacentre+" format "+datacentre_format+" repository xmldb:exist://glue.badc.rl.ac.uk:8080/exist/xmlrpc userpw xxxxxx targetCollection /db/discovery/original/"+datacentre_format+"/"+datacentre_namespace
    +print "Executing command to run d2b.jar"
     status= os.system(commandline)
     if status!=0:
     
    @@ -182,5 +180,5 @@
     
     # ingest the created discovery minimum molesrecords into eXist db.
    -commandline = "$EXIST_HOME/bin/client.sh -c ./DIF2MOLES -u admin -P xxxxxx -p ./DIF2MOLES"
    +commandline = "$EXIST_HOME/bin/client.sh -c /db/discovery/moles -u admin -P xxxxxx -p ./DIF2MOLES"
     print "Executing : actual command to ingest into exist db"
     status = os.system(commandline)
     
    @@ -229,5 +227,5 @@
         sys.exit("Failed at creating backup directory %s" %this_backupdir)
     
    -commandline = "ls -1 ./DIF2MOLES | xargs -i cp ./DIF2MOLES{\} " + this_backupdir
    +commandline = "ls -1 ./DIF2MOLES | xargs -i cp ./DIF2MOLES/{\} " + this_backupdir
     print "Executing : " + commandline
     status = os.system(commandline)
  • TI01-discovery/trunk/ingestAutomation/OAIBatch/pml_config.properties

    r1768 r1889  
    @@ -10,5 +10,5 @@
     #
     #Define which format is harvested from the data centre (one only)
    -format dif
    +format DIF
     #
     #Define the data providers namespace