Climate Diagnostics Benchmark (CDB) version 0.2

Frédéric Laliberté and Paul Kushner, 2011-2012, University of Toronto, Ontario, Canada.

From version 0.2 onward, the CDB will be split into three separate tools:

  • cdb_query: Code for querying local and remote archives to obtain the optimal set of models for a diagnostic. Coded in Python and available on PyPI.
  • cdb_driver: A wrapper for cdb_query that handles simple distributed processing. Coded in Python and to be made available on PyPI.
  • cdb_diags: A set of diagnostics formatted to be easily handled by cdb_query and cdb_driver. To be made available on this website.
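The model-selection idea behind cdb_query can be illustrated with a minimal Python sketch. It assumes a hypothetical inventory format (a mapping from model name to the set of (experiment, variable) pairs found in an archive) and treats "optimal" simply as "every pair the diagnostic requires is available". The function names, model names and inventory layout are illustrative only and are not cdb_query's actual API.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record: (experiment, variable), e.g. ("historical", "ta")
Pair = Tuple[str, str]
# Hypothetical inventory: model name -> pairs available for that model
Inventory = Dict[str, Set[Pair]]


def merge_inventories(*inventories: Inventory) -> Inventory:
    """Combine several inventories (e.g. local and remote) without mutating them."""
    merged: Inventory = {}
    for inventory in inventories:
        for model, pairs in inventory.items():
            merged.setdefault(model, set()).update(pairs)
    return merged


def find_complete_models(inventory: Inventory, required: Iterable[Pair]) -> List[str]:
    """Return, sorted by name, the models that provide every required pair."""
    needed = set(required)
    return sorted(model for model, available in inventory.items() if needed <= available)


if __name__ == "__main__":
    local = {"MODEL-A": {("historical", "ta"), ("historical", "ua")},
             "MODEL-C": {("historical", "ta")}}
    remote = {"MODEL-A": {("rcp85", "ta")},
              "MODEL-B": {("historical", "ta"), ("historical", "ua"), ("rcp85", "ta")}}
    diagnostic = [("historical", "ta"), ("historical", "ua"), ("rcp85", "ta")]
    print(find_complete_models(merge_inventories(local, remote), diagnostic))
    # → ['MODEL-A', 'MODEL-B']
```

In this example MODEL-A qualifies only once its local and remote holdings are combined, illustrating why querying both kinds of archive can matter.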

Frédéric Laliberté, November 14 2012

Climate Diagnostics Benchmark (CDB) version 0.1

Prototype in support of ExArch Work Package 3: "Climate Diagnostics"

Draft Documentation

Frédéric Laliberté and Paul Kushner, 2011-2012, University of Toronto, Ontario, Canada.

The CDB was tested on the CICLAD compute server at the IPSL and on the login nodes at BADC.

Overview of scope and structure.

The CDB provides a simple framework to write ESG-oriented climate diagnostics scripts. It aims to:

  • Simplify the development of climate diagnostics on ESG servers
  • Benchmark a server-side computing framework
  • Help in the timely delivery of model intercomparisons

The CDB facilitates the processing of large datasets, following the principle that "a dataset is useful only as long as it is used". Doing science with datasets in petabyte- and exabyte-scale archives requires new tools, and the prototype CDB provided here is intended as one possible answer to this challenge.

The first release of the CDB requires climate researchers to have ssh access to an ESG archive node. In its simplest use, the CDB is run remotely to create a subset of the ESG archive. This subset can then be transferred to a local compute node and, because the CDB is portable, processed further there with the same tool.

With its more advanced features, the CDB can be used remotely from beginning to end. For example, it can generate PBS submit scripts to simplify the distributed processing of ESG data on a cluster directly connected to an ESG node, such as the CICLAD server at IPSL.
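As a rough illustration of the PBS feature, the Python sketch below builds one submit script per model so that each model runs as an independent job. The hypothetical `./<diagnostic>.sh <model>` entry point, the file naming and the resource requests are assumptions; the scripts the CDB actually generates may differ. The `#PBS` directives shown (`-N`, `-l walltime`, `-l nodes`, `-j oe`) follow common PBS/Torque usage and should be checked against the target cluster.

```python
from pathlib import Path
from typing import List


def make_pbs_script(job_name: str, command: str, walltime: str = "02:00:00") -> str:
    """Return the text of a PBS submit script that runs a single command."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",           # job name shown by qstat
        f"#PBS -l walltime={walltime}",  # maximum run time (assumed default)
        "#PBS -l nodes=1:ppn=1",         # one core per job (assumption)
        "#PBS -j oe",                    # merge stdout and stderr into one file
        'cd "$PBS_O_WORKDIR"',           # run from the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


def write_pbs_scripts(models: List[str], diagnostic: str, out_dir: Path) -> List[Path]:
    """Write one submit script per model and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for model in models:
        # Hypothetical entry point: a shell script named after the diagnostic.
        command = f"./{diagnostic}.sh {model}"
        path = out_dir / f"{diagnostic}_{model}.pbs"
        path.write_text(make_pbs_script(f"{diagnostic}_{model}", command))
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(make_pbs_script("jet_MODEL-A", "./jet.sh MODEL-A"), end="")
```

Each generated file could then be submitted with `qsub`, letting the scheduler distribute the per-model jobs across the cluster.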

In short, the CDB is a portable framework that makes it easier for climate scientists to use the vast amount of data on an ESG node. A simplified CDB flowchart is presented at the end of this page.


  1. Installation instructions
  2. Tutorial
  3. Features
    1. Finding available models
    2. Fixing an incomplete dataset
    3. Using an output in another computation
  4. Quality Assurance issues
  5. Notes on performance and benchmarking
  6. Features not yet implemented

This wiki is being maintained by Frédéric Laliberté.