Version 1 (modified by mjuckes, 8 years ago)


Reproducible Batch Processing System

When processing large numbers of files, it is important to ensure that the workflow is well characterised and that errors are not passed over. It is also important to be able to trace the provenance of input datasets and processing code, and to have a clear means of updating data if necessary.

The Exarch batch processing system starts by ensuring that comprehensive provenance information is recorded in the file metadata or, if that information becomes too bulky to be expressed cleanly as metadata, in a log file which is easily identified and accessible. Version information for both the input data and the processing software is included in the data file.
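
The provenance step described above can be sketched roughly as follows. This is an illustrative sketch only, not the Exarch implementation: the function names (`build_provenance`, `attach_provenance`), the attribute names, the 4096-byte threshold and the `.provenance.json` sidecar naming are all assumptions made for the example. It collects version information and input checksums, then returns the attributes to store in the data file, falling back to a clearly named log file when the record is too large to sit cleanly in the metadata.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical size limit above which provenance moves to a log file.
MAX_INLINE_BYTES = 4096


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(input_paths, input_version, software_name, software_version):
    """Collect version information and checksums for one processing step."""
    return {
        "software": {"name": software_name, "version": software_version},
        "input_version": input_version,
        "inputs": [
            {"file": Path(p).name, "sha256": sha256_of(p)} for p in input_paths
        ],
    }


def attach_provenance(output_path, provenance, max_inline_bytes=MAX_INLINE_BYTES):
    """Return the metadata attributes to store in the output data file.

    Version information always goes into the attributes. The full record
    is stored inline if it is small enough; otherwise it is written to a
    log file next to the output and referenced by name.
    """
    text = json.dumps(provenance, sort_keys=True)
    attrs = {
        "software_version": provenance["software"]["version"],
        "input_version": provenance["input_version"],
    }
    if len(text.encode("utf-8")) <= max_inline_bytes:
        attrs["provenance"] = text
    else:
        log_path = Path(str(output_path) + ".provenance.json")
        log_path.write_text(text, encoding="utf-8")
        attrs["provenance_log"] = log_path.name
    return attrs
```

The returned dictionary would then be written as global attributes by whatever library produces the data file. Keeping the version fields in the attributes in both branches means the data file identifies its inputs and software even when the full record lives in the log.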