ExArch: Climate analytics on distributed exascale data archives
The demands that climate science places on data management are growing rapidly as climate models increase both the precision with which they represent spatial structures and the completeness with which they describe a vast range of physical processes.
For the fifth phase of the Coupled Model Intercomparison Project (CMIP5), a distributed archive is being constructed to provide access to what is expected to exceed 10 petabytes of global climate change projections. The data will be held at 30 or more computing centres and data archives around the world, but users will see a single archive described by one catalogue. In addition, the usability of the data will be enhanced by a three-step validation process and the publication of Digital Object Identifiers (DOIs) for all the data. For many users, the spatial resolution of the global climate models (around 150 km) is inadequate: the Coordinated Regional Climate Downscaling Experiment (CORDEX) will provide data downscaled to around 10 km. Because the evaluation of climate impacts often centres on extremes and complex combinations of impact factors, large volumes of data must be stored. At the same time, uncertainty about the optimal configuration of the models requires that each scenario be explored with multiple models.
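As a rough, illustrative estimate of why finer resolution drives data volume (assuming a regular grid and counting only horizontal grid cells, with the number of vertical levels, output time steps and variables held fixed), moving from around 150 km to around 10 km grid spacing multiplies the number of grid cells per unit area by:

```latex
\frac{N_{10\,\mathrm{km}}}{N_{150\,\mathrm{km}}}
  \approx \left(\frac{150\ \mathrm{km}}{10\ \mathrm{km}}\right)^{2}
  = 15^{2} = 225
```

Because regional downscaling covers limited domains rather than the whole globe, this factor applies per unit area, not to the total size of the archive.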
This project will explore the challenges of developing a software infrastructure for data management that will scale to the multi-exabyte archives of climate data likely to be crucial to major policy decisions by the end of the decade. Support for automated processing of the archived data and metadata will be essential. In the short term, strategies will be evaluated by applying them to the CORDEX project data.
University of Toronto