Changeset 4957 for TI01-discovery


Timestamp:
11/02/09 15:26:50 (11 years ago)
Author:
sdonegan
Message:

Add details for enhanced operations.

File:
1 edited

  • TI01-discovery/tags/stable-TI01-ingestAutomation_Proglue_upgradesAndReporting/temp/OAIBatch/README.txt

    r3998 r4957  
    2222        -v - 'verbose' mode - prints out logging of level INFO and above 
    2323        -d - 'debug' mode - prints out all logging 
     24        -i - 'individual file ingestion mode' 
    2425         
    2526NB, the default level is WARNING and above or, if run via the run_all_ingest script, INFO and above. 
     
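The logging flags described in this hunk can be read as a simple mapping onto the standard Python logging levels. The following is a minimal sketch only: the function name `select_log_level` and its parameters are hypothetical and do not come from the ingester scripts, and giving "-d" precedence over "-v" is an assumption.

```python
import logging


def select_log_level(verbose=False, debug=False, via_run_all=False):
    """Return a logging level matching the README's flag descriptions.

    Hypothetical helper, not part of the ingester code.
    """
    if debug:
        # -d: 'debug' mode prints out all logging
        return logging.DEBUG
    if verbose or via_run_all:
        # -v, or a run started from run_all_ingest: INFO and above
        return logging.INFO
    # default: WARNING and above
    return logging.WARNING


if __name__ == "__main__":
    logging.basicConfig(level=select_log_level(verbose=True))
    logging.info("shown when -v is given")
    logging.debug("only shown when -d is given")
```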
    6364NB, any updates to the original doc will cause a trigger to record the original row data in the original_document_history 
    6465table - to allow basic audit tracking of changes. 
     66 
     67 
     68Ingesting a single file 
     69------------------------- 
     70The oai_document_ingester.py script can now ingest an individual document rather than every document in the directory listed in the config file.  Use the "-i" option and give the full or relative path, including the filename, after the datacentre name (the script still uses the datacentre config details for other information required at ingest). 
     71 
     72 
     73Ingestion reporting 
     74-------------------- 
     75The ingester records both successful and unsuccessful ingestion attempts to a named <datacentre>_summary.txt file in the data directory.  This records problem files and is invaluable for summarising the overall ingestion on a per data centre basis. 
     76When the "run_all_ingest.py" script is used for batch ingestion from all datacentres, a report is generated from the individual data centre ingestion attempts.  This report, which contains the number of successful and unsuccessful ingestions, the time, and lists of the files that failed to ingest, is emailed to the email address hardcoded in the script. 
     77 
     78 
     79Deleting documents from the database 
     80------------------------------------ 
     81Provision has now been made to delete individual records from the postgres database (all tables).  Use DeleteRecord.py and specify a single argument comprising either the "oai" filename on the system, the local filename (the value in the "original_document_filename" column) or the value in the "discovery_id" column within the database.  The script will then delete this record from the database.  This script requires the updated ingest_procedures.sql script.  
    6582 
    6683 
     
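The batch report described under "Ingestion reporting" in the hunk above combines per-datacentre results into one summary. The following is a rough sketch of that aggregation, not the code in run_all_ingest.py: the `IngestResult` class, the `build_report` function, the report layout and the sample file name are all hypothetical, and the format of the real <datacentre>_summary.txt files is not assumed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IngestResult:
    """Hypothetical per-datacentre outcome of one ingest run."""
    datacentre: str
    succeeded: int = 0
    failed_files: List[str] = field(default_factory=list)
    seconds: float = 0.0


def build_report(results):
    """Combine per-datacentre results into one plain-text report."""
    total_ok = sum(r.succeeded for r in results)
    total_failed = sum(len(r.failed_files) for r in results)
    lines = ["Ingest summary: %d succeeded, %d failed" % (total_ok, total_failed)]
    for r in results:
        lines.append("%s: %d succeeded, %d failed, %.1fs"
                     % (r.datacentre, r.succeeded, len(r.failed_files), r.seconds))
        # List each problem file under its datacentre
        for name in r.failed_files:
            lines.append("  FAILED: " + name)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        IngestResult("badc", succeeded=10, failed_files=["bad_record.xml"], seconds=12.5),
        IngestResult("neodc", succeeded=4, seconds=3.0),
    ]
    print(build_report(sample))
```

Keeping the report as a plain string means the same text could be written to a file or passed to an email step without change.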
    108125Utilities.py 
    109126 
    110 2. Whilst testing the scripts, it was noted that the various MDIP transforms do not currently work 
     1272. DONE: Whilst testing the scripts, it was noted that the various MDIP transforms do not currently work 
    111128for the majority of the badc datasets; as such, MDIP format has been 
    112129commented out of the PostgresRecord class; this will need to be fixed asap! 
    113130 
    114 3. The system should handle the deletion of files from the DB.  Not sure how this is handled by the harvester - i.e. are all files 
     1313. DONE: The system should handle the deletion of files from the DB.  Not sure how this is handled by the harvester - i.e. are all files 
    115132harvested always - so we need to do a check in the DB so that the contents directly match - and then delete any extras - or is there 
    116133another way of determining deletions?  Once this is established, use the PostgresDAO.deleteOriginalDocument() method to do the clearing 
     
    147164  parameters = self.dgMeta.dgMetadataRecord.dgDataEntity.dgDataSummary.dgParameterSummary.dgStdParameterMeasured.dgValidTerm 
    148165 
    149 9. dif2moles xquery transform seems to lose the end_date info in the Temporal_Coverage elements     
     1669.DONE? dif2moles xquery transform seems to lose the end_date info in the Temporal_Coverage elements     
    150167 
    15116810. A significant gain to the system could be implemented potentially without much effort: add a contact email address 