
Dissemination Events in 2013

The JISC MRD PIMMS project comes to an end at the end of March, in two weeks' time! But that is not the end of PIMMS activities: here is a list of the upcoming dissemination events planned for 2013.

  • NCAS Chemistry Climate Interactions Workshop March 21-22 Cambridge
  • JISC Managing Research Data Workshop March 25-26 Aston
  • EGU April 7-12 Vienna
  • ERMITAGE June 26-27 Brussels
  • NCAS Staff Meeting July 16-17 Birmingham
  • NCAS Climate Modelling Summer School September 8-20 Oxford

PIMMS at EGU

The PIMMS project has had its abstract accepted by the Metadata, Data Models and Semantics session at the EGU (European Geosciences Union) General Assembly in April.

We have an oral presentation on Tuesday 9th April at 14:45 in room R14. Come and check out our session, and also find out about Making Breakfast with 5 oz of Cinnamon Porridge and 150 gr of Sweet Oatmeal. I kid ye not!

PIMMS workshop: Web Engineering for Research Management

Many thanks to Emma Tonkin for her insightful blog post about the PIMMS workshop in Bristol. She has really captured something of the atmosphere of the day as well as the techie stuff. Go check it out: Emma's Blog

PIMMS Dissemination Workshops

In February and March the PIMMS team ran two dissemination workshops in Bristol and in Reading.

The Bristol participants were mainly palaeoclimate modellers; they generally all use the same model but with different input characteristics. In Reading we had a mixture of climate process modellers and Integrated Assessment Model (IAM) modellers from the ERMITAGE project.

In both workshops we began with an overview of PIMMS, followed by a walk-through of how to use PIMMS to generate metadata. Lunch was a chance to get to know the participants, which helped guide the afternoon activities; these centred on specialising PIMMS to suit the particular requirements of the participants. For instance, the Bristol palaeoclimate modellers will use the standard climate modelling controlled vocabulary that PIMMS inherited from the METAFOR initiative for CMIP5, specialising it with unique experiment information. In Reading the mode of usage is somewhat different, with users specialising experiments and also extending the controlled vocabulary, as is the case for the CASCADE model. ERMITAGE will have a completely new controlled vocabulary that can be used for their economic modelling. In essence, the ERMITAGE experiments are less about understanding the science through models and more about how to get different models to couple together.

The Bristol palaeoclimate modellers are interested in how they can use the PIMMS metadata manager to describe their experiments in a consistent way. Another important requirement of the Bristol team is that it must be possible to restrict access to the CIM documents produced by PIMMS. Unfortunately, the development of tools for consuming CIM content is out of the scope of the PIMMS project, but we have passed these requirements on to the ES-DOC developers to ensure that the CIM portal will be able to restrict access to metadata.

We have been working closely with Reading researchers to develop new controlled vocabulary for convection and turbulence, configuring the PIMMS metadata manager to suit the documentation requirements of the CASCADE model. The Reading workshop also included a contingent from the ERMITAGE project, who plan to use the PIMMS metadata manager to create metadata documentation for their experiments coupling together components of an integrated assessment model.

PIMMS is comprehensive and as such is able to accommodate the documentation requirements of each of these communities.

  • In Bristol we focused on creating experiment documents for palaeoclimate experiments
  • In Reading we did extensive work on how to extend controlled vocabularies
  • For ERMITAGE we focused on how we can use PIMMS to describe the coupling of model components.

But why go to all this effort to document what we do in standardised formats?
Well, aside from the fact that it is easier to compare the results of different models if we use a standardised system, there is a whole world of graphical data representation out there that we can hook into when we make use of known metadata standards. The ES-DOC team plan to use D3.js (http://d3js.org/) to visualise CIM content. People documenting their work with PIMMS will be able to piggyback on this work to visualise their own metadata. Below are some examples of the kinds of visualisation that are possible.
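As a rough illustration of that hook-up, D3's tree-style layouts start from a nested hierarchy, and `d3.hierarchy()` reads child nodes from a `children` property by default. The sketch below assumes a model description has already been reduced to nested components; the `ModelComponent` class and the component names are hypothetical simplifications, not the ES-DOC CIM schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ModelComponent:
    """Hypothetical, much-simplified stand-in for a CIM model component."""
    name: str
    subcomponents: list = field(default_factory=list)


def to_d3_hierarchy(component: ModelComponent) -> dict:
    """Convert a component tree into nested {"name", "children"} objects,
    the shape d3.hierarchy() consumes by default. Leaves get no "children"."""
    node = {"name": component.name}
    if component.subcomponents:
        node["children"] = [to_d3_hierarchy(c) for c in component.subcomponents]
    return node


if __name__ == "__main__":
    # Illustrative component names only.
    atmosphere = ModelComponent("Atmosphere", [
        ModelComponent("Dynamical Core"),
        ModelComponent("Convection"),
        ModelComponent("Turbulence"),
    ])
    print(json.dumps(to_d3_hierarchy(atmosphere), indent=2))
```

In the browser, the resulting JSON could then be passed to `d3.hierarchy(data)` and laid out with, for example, `d3.tree()`.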

Announcing Two PIMMS Workshops

Demonstration of the PIMMS metadata infrastructure for simulation documentation
 http://pimms.ceda.ac.uk

PIMMS will be holding two dissemination workshops:

  • February 14th at the University of Bristol
  • February 19th at the University of Reading

The workshops will run from 11am until 3pm, rather than a full day, to allow participants time to travel.
To register a place on one of these workshops and to book yourself lunch please email charlotte.pascoe at stfc.ac.uk.

More About PIMMS
PIMMS (Portable Infrastructure for the Metafor Metadata System) provides a method for consistent and comprehensive documentation of modelling activities that enables the sharing of simulation data and model configuration information. The aim of PIMMS is to package the metadata infrastructure developed by Metafor for CMIP5 so that it can be used by climate modelling groups in UK Universities.

PIMMS tools capture information about simulations from the design of experiments to the implementation of experiments via simulations that run models. PIMMS uses the Metafor methodology, which consists of a Common Information Model (CIM), Controlled Vocabularies (CV) and software tools. PIMMS enables both the creation and consumption of CIM content via a web services infrastructure and portal developed by the ES-DOC community.

There are three paradigms of PIMMS metadata collection:

1. Model Intercomparison Projects (MIPs), where a standard set of questions is asked of all models, which perform standard sets of experiments.
2. Disciplinary-level metadata collection, where a standard set of questions is asked of all models but experiments are specified by users.
3. Bespoke metadata creation, where users define questions about both models and experiments.

Examples will be shown of how PIMMS has been configured to suit each of these three paradigms. In each case PIMMS allows users to provide additional metadata beyond that which is asked for in an initial deployment.
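One way to summarise the three paradigms is by who sets the model questions and who specifies the experiments. The encoding below is our own illustrative shorthand, a sketch rather than anything in the PIMMS software.

```python
from enum import Enum


class Paradigm(Enum):
    """PIMMS metadata collection paradigms, encoded (hypothetically) as
    (who sets the model questions, who specifies the experiments)."""
    MIP = ("mip", "mip")                   # standard questions, standard experiments
    DISCIPLINARY = ("discipline", "user")  # standard questions, user experiments
    BESPOKE = ("user", "user")             # users define both

    @property
    def questions_set_by(self) -> str:
        return self.value[0]

    @property
    def experiments_set_by(self) -> str:
        return self.value[1]


if __name__ == "__main__":
    for p in Paradigm:
        print(f"{p.name}: questions by {p.questions_set_by}, "
              f"experiments by {p.experiments_set_by}")
```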

The primary target for PIMMS is the UK climate modelling community where it is common practice to reuse model configurations from other researchers. Usually a model configuration is provided by a researcher in the same research group or by a previous collaborator with whom there is an existing scientific relationship. However, the consistent and comprehensive documentation enabled by PIMMS will facilitate the wider sharing of climate model data and configuration information.

The PIMMS methodology assumes an initial effort to document standard model configurations. Once these descriptions have been created users need only describe the specific way in which their model configuration is different from the standard. Thus the documentation burden on the user is specific to the experiment they are performing and fits easily into the workflow of doing their science.
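The idea of documenting only the difference from a standard configuration can be sketched with plain dictionaries. This assumes a configuration can be flattened into key/value settings, which is a simplification; the setting names below are hypothetical.

```python
def describe_differences(standard: dict, user_config: dict) -> dict:
    """Settings the user must document: those that are new or changed
    relative to the standard configuration. (Settings removed from the
    standard are not tracked in this simple sketch.)"""
    return {
        key: value
        for key, value in user_config.items()
        if key not in standard or standard[key] != value
    }


def full_description(standard: dict, differences: dict) -> dict:
    """Rebuild the complete description: the standard configuration
    overlaid with the user's documented differences."""
    return {**standard, **differences}


if __name__ == "__main__":
    # Hypothetical settings, documented once for the standard configuration.
    standard = {"resolution": "N48", "co2_ppm": 280, "orography": "present_day"}
    mine = {"resolution": "N48", "co2_ppm": 400, "orography": "present_day"}

    delta = describe_differences(standard, mine)
    print(delta)  # only the CO2 change needs describing
    assert full_description(standard, delta) == mine
```

The documentation burden scales with the size of `delta`, not with the size of the whole configuration.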

PIMMS metadata is independent of data and, as such, is ideally suited to documenting model development. PIMMS provides a framework for sharing information about failed model configurations for which data are not kept: the negative results that don't appear in the scientific literature.

PIMMS is a UK project funded by JISC, The University of Reading, The University of Bristol and STFC.

Metafor Publishes in Geoscientific Model Development

Our parent project, Metafor, has published a great GMD article describing the Metafor methodology on which PIMMS is based:
 Describing Earth System Simulations with the Metafor CIM.

Go Read!

PIMMS Workshop at the University of Bristol

Many thanks to the PEGBOARD team for inviting us to come and talk to the palaeoclimate modellers in the School of Geographical Sciences at the University of Bristol today. There were about 20 researchers at the workshop, a mixture of students, postdocs (including a brand-new postdoc on their very first day in the job) and permanent staff. Future PIMMS users of the world, unite!

Bristol are going to use the general CMIP5-like set of questions to describe their models, and will specialise PIMMS to their requirements via experiments. So I guided the workshop participants through the different elements of the PIMMS infrastructure and used our simplified UML to explain the distinction between "experiments" and "simulations", which we decided maps very well onto "experiments" and "jobs" in UM speak. I then did a live demo of Gerry's marvellous experiment generator, creating experiments and adding requirements on the fly.

http://proj.badc.rl.ac.uk/pimms/export/69/Docs/simpleCIM.png http://proj.badc.rl.ac.uk/pimms/export/70/Docs/expMetManScreenShot.jpg

The Bristol researchers are very keen on the prospect of being able to share data with the other institutions that adopt PIMMS. There was also much positive conversation about how PIMMS is going to help them get a better handle on how they describe their experiments and simulations. They are starting to plan how they can use the PIMMS infrastructure to describe their experiments in a uniform way. Result!

There was special interest in how they could use PIMMS to document information about the quality of simulated data. I have to admit that isn't something we planned for with PIMMS. It is true that PIMMS makes it possible to create metadata for simulations whose data are later thrown away, and I guess the absence of simulation data gives a good inkling that a particular configuration didn't work. Bristol fancy the idea of a check box that says "this simulation worked", with the assumption that the run failed if that box isn't checked, to encourage themselves to revisit their metadata.
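A minimal sketch of the proposed check box, assuming a simplified simulation record (the `SimulationRecord` fields and job names are hypothetical, not the CIM schema). The box defaults to unchecked, so any run nobody has confirmed shows up as an assumed failure to revisit, and its metadata survives whether or not the data were kept.

```python
from dataclasses import dataclass


@dataclass
class SimulationRecord:
    """Hypothetical simplified simulation record."""
    name: str             # e.g. a UM job identifier
    experiment: str       # the experiment this simulation conforms to
    data_kept: bool       # metadata persists even if the output is deleted
    worked: bool = False  # the check box: unchecked means "assume it failed"


def needs_review(records):
    """Names of simulations whose 'this simulation worked' box is unchecked."""
    return [r.name for r in records if not r.worked]


if __name__ == "__main__":
    records = [
        SimulationRecord("job_a", "expt_1", data_kept=True, worked=True),
        SimulationRecord("job_b", "expt_1", data_kept=False),
    ]
    print(needs_review(records))  # ['job_b']
```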

PIMMS abstract submitted to EGU

We submitted this abstract about PIMMS to the Metadata, Data Models and Semantics session at the EGU (European Geosciences Union) General Assembly in April.

Title: PIMMS tools for capturing metadata about simulations
Authors: C. Pascoe, G. Devine, G. Tourte, S. Pascoe, B. Lawrence, H. Barjat

PIMMS (Portable Infrastructure for the Metafor Metadata System) provides a method for consistent and comprehensive documentation of modelling activities that enables the sharing of simulation data and model configuration information. The aim of PIMMS is to package the metadata infrastructure developed by Metafor for CMIP5 so that it can be used by climate modelling groups in UK Universities.

PIMMS tools capture information about simulations from the design of experiments to the implementation of experiments via simulations that run models. PIMMS uses the Metafor methodology which consists of a Common Information Model (CIM), Controlled Vocabularies (CV) and software tools. PIMMS software tools provide for the creation and consumption of CIM content via a web services infrastructure and portal developed by the ES-DOC community. PIMMS metadata integrates with the ESGF data infrastructure via the mapping of vocabularies onto ESGF facets.
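Mapping vocabularies onto ESGF facets amounts to a lookup from CV terms to facet names. The table below is hypothetical: the CV term paths are illustrative, and `experiment`, `model` and `institute` are used as plausible CMIP5-style facet names; the actual PIMMS mapping is not given here.

```python
# Hypothetical mapping from CV term paths to ESGF search facet names.
CV_TO_FACET = {
    "experiment.short_name": "experiment",
    "model.short_name": "model",
    "simulation.institute": "institute",
}


def to_facets(cv_values: dict) -> dict:
    """Translate CV-keyed metadata into facet/value pairs for a data
    search index, skipping terms that have no facet mapping."""
    return {
        CV_TO_FACET[term]: value
        for term, value in cv_values.items()
        if term in CV_TO_FACET
    }


if __name__ == "__main__":
    metadata = {
        "experiment.short_name": "Africa_July2006",
        "model.short_name": "example_model",  # placeholder
        "simulation.notes": "free text, not a facet",
    }
    print(to_facets(metadata))
```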

There are three paradigms of PIMMS metadata collection:

1. Model Intercomparison Projects (MIPs), where a standard set of questions is asked of all models, which perform standard sets of experiments.
2. Disciplinary-level metadata collection, where a standard set of questions is asked of all models but experiments are specified by users.
3. Bespoke metadata creation, where users define questions about both models and experiments.

Examples will be shown of how PIMMS has been configured to suit each of these three paradigms. In each case PIMMS allows users to provide additional metadata beyond that which is asked for in an initial deployment.

The primary target for PIMMS is the UK climate modelling community, where it is common practice to reuse model configurations from other researchers. This culture of collaboration exists in part because climate models are very complex, with many variables that can be modified. Therefore it has become common practice to begin a series of experiments by using another climate model configuration as a starting point. Usually this other configuration is provided by a researcher in the same research group or by a previous collaborator with whom there is an existing scientific relationship. Some efforts have been made at the university department level to create documentation, but there is a wide diversity in the scope and purpose of this information. The consistent and comprehensive documentation enabled by PIMMS will facilitate the wider sharing of climate model data and configuration information.

The PIMMS methodology assumes an initial effort to document standard model configurations. Once these descriptions have been created users need only describe the specific way in which their model configuration is different from the standard. Thus the documentation burden on the user is specific to the experiment they are performing and fits easily into the workflow of doing their science.

PIMMS metadata is independent of data and, as such, is ideally suited to documenting model development. PIMMS provides a framework for sharing information about failed model configurations for which data are not kept: the negative results that don't appear in the scientific literature.

PIMMS is a UK project funded by JISC, The University of Reading, The University of Bristol and STFC.

PIMMS and ERMITAGE

How and why PIMMS and ERMITAGE have been collaborating

ERMITAGE stands for Enhancing Robustness and Model Integration for the Assessment of Global Environmental change (http://ermitage.cs.man.ac.uk/). It is an EU-FP7 project led by the Open University.
PIMMS has created a mind map of an Integrated Assessment Model (IAM) controlled vocabulary for use by ERMITAGE: http://proj.badc.rl.ac.uk/pimms/browser/ControlledVocabs/trunk/IAM/IntegratedAssessmentModel.mm
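FreeMind mind maps (.mm files) are XML in which each idea is a nested `<node>` element whose label is held in its `TEXT` attribute. As a minimal sketch, assuming that structure (and ignoring nodes whose text is stored as rich content), a controlled vocabulary can be flattened into term paths. The example terms are made up, not taken from the ERMITAGE mind map.

```python
import xml.etree.ElementTree as ET


def cv_term_paths(mm_xml: str) -> list:
    """Flatten a FreeMind mind map into '/'-separated term paths,
    one per node, in depth-first order."""
    root = ET.fromstring(mm_xml)  # the <map> element
    paths = []

    def walk(node, prefix):
        text = node.get("TEXT", "").strip()
        path = f"{prefix}/{text}" if prefix else text
        paths.append(path)
        for child in node.findall("node"):
            walk(child, path)

    for top in root.findall("node"):
        walk(top, "")
    return paths


EXAMPLE_MM = """<map version="1.0.1">
  <node TEXT="IntegratedAssessmentModel">
    <node TEXT="Economy">
      <node TEXT="Sectors"/>
    </node>
    <node TEXT="LandUse"/>
  </node>
</map>"""

if __name__ == "__main__":
    for path in cv_term_paths(EXAMPLE_MM):
        print(path)
```

The flattening is one-way: positions, colours, icons and folding state are discarded, so the term paths cannot be turned back into the original mind map.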

  • Coupling models together is really all about passing some of the data that is output from one model and using it as input to another
  • You need to know about the complexity of each model to make sensible decisions about what information it is possible to pass between them.
  • So you want to be able to ask interesting science questions such as
    • “What plant species are included in this land surface model?”
    • There is no point passing lots of plant info if the model you are coupling with only has one parameter for vegetation.
    • PIMMS controlled vocabularies allow these questions to be answered
  • Charlotte Pascoe, the PIMMS Project Manager, has been attending (gate crashing) ERMITAGE meetings and collecting information about how the ERMITAGE modellers are describing their models and using it to build controlled vocabulary (CV) mind maps
  • In September 2012 Charlotte Pascoe was invited to become an official ERMITAGE stakeholder
  • The CV mind maps were presented to ERMITAGE at their September annual meeting and were greeted with excitement, especially by stakeholders from DECC and the UK Climate Change Committee, who encouraged ERMITAGE to use PIMMS to document their models.
  • ERMITAGE have agreed to use PIMMS to document their models Feb-Sep 2013 (after the PIMMS project ends)
  • Support for ERMITAGE people using PIMMS software will be funded from the MIRP project based at CEDA
  • Plans are afoot for an ERMITAGE follow-on project that will continue the development of PIMMS CV for documenting Integrated Assessment Models
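The plant example can be turned into a simple decision rule, assuming each model's CV answer to "What plant types are included?" is available as a list. The rule, the plant type names and the `total_vegetation` aggregate label are all hypothetical; they only illustrate how CV answers can inform what is worth passing between coupled models.

```python
def vegetation_to_pass(source_types: list, target_types: list) -> list:
    """Decide which vegetation information is worth passing from a source
    model to a target model, given each model's CV answer listing the
    plant types it represents."""
    if len(target_types) <= 1:
        # The target has (at most) one vegetation parameter: detailed plant
        # information would be wasted, so pass a single aggregate instead.
        return ["total_vegetation"]
    # Otherwise pass only the plant types that both models represent,
    # in the source model's order.
    return [t for t in source_types if t in target_types]


if __name__ == "__main__":
    land_surface = ["broadleaf_tree", "needleleaf_tree", "c3_grass", "c4_grass", "shrub"]
    print(vegetation_to_pass(land_surface, ["vegetation"]))
    print(vegetation_to_pass(land_surface, ["c3_grass", "broadleaf_tree"]))
```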

Using subversion to capture modifications to controlled vocabularies

  • Subversion has trunks, branches, tags, and commit messages that record provenance.
  • Controlled vocabularies have supersets and branches; these need tagging, and provenance information about updates needs to be recorded.

The similarity between the mechanism of controlled vocabulary development and Subversion mechanics has not gone unnoticed by PIMMS, and we plan to use a Subversion repository to manage our controlled vocabularies. This is a temporary, pragmatic method for CV management that we can use until a CV server is available. We plan to hold the mind maps in Subversion, not the stripped-down XML (yes, we know!), because we can't recreate mind maps from the XML.

It should be noted that the Subversion solution only provides a method for keeping track of updates to controlled vocabularies. It could just as easily be implemented with another version control system such as Git.

We need to agree a naming convention for tagging updates to the controlled vocabulary parent superset. An appropriate naming convention would provide an opportunity for governance intervention to be included.

Only branches that add to the CV can be merged with the trunk.

Here's how the mechanics would work:

| Controlled Vocabulary speak | Subversion speak | English |
|---|---|---|
| Each discipline has its own CV parent superset | Each discipline has its own Subversion trunk | Each discipline has an overarching set of agreed terms |
| The CV parent superset for climate science is the CMIP5 CV developed by the METAFOR project | The climate science trunk is the CMIP5 CV developed by the METAFOR project | In climate science the set of agreed terms was developed for CMIP5 by the METAFOR project |
| Parent superset CVs are tagged before branches are created | The trunk is tagged before a Subversion branch is created | The set of agreed terms is given a label |
| New CV terms are added at branch level | New CV terms are developed as branches off the trunk | New terms are developed in a parallel workflow |
| Provenance information is recorded about additions to the CV | Commit messages record provenance information for branch updates | A note is made about who added any new terms and why, e.g. "New CV terms added by the CASCADE project to support more comprehensive descriptions of Convection and Turbulence" |
| The new terms added at branch level are duplicated in the superset | The branch is merged with the trunk | The new terms are spliced into the existing set of terms |
| The modified parent superset CV is tagged | The updated trunk is tagged | The newly extended set of overarching terms is labelled |
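The workflow in the table can be modelled in a few lines to make its rules concrete, in particular that only branches which add terms may be merged. This is a toy, in-memory sketch rather than a wrapper around Subversion; the class, the tag names (standing in for the naming convention still to be agreed) and the example terms are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    base: frozenset  # the tagged trunk terms the branch started from
    terms: set       # the branch's working copy of the terms


@dataclass
class CVRepository:
    """Toy model of a discipline's controlled vocabulary under version control."""
    trunk: set
    tags: dict = field(default_factory=dict)
    branches: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (branch, commit message) provenance

    def tag(self, name: str) -> None:
        self.tags[name] = frozenset(self.trunk)

    def branch(self, name: str, from_tag: str) -> None:
        base = self.tags[from_tag]
        self.branches[name] = Branch(base=base, terms=set(base))

    def add_terms(self, branch: str, new_terms: set, message: str) -> None:
        self.branches[branch].terms |= new_terms
        self.log.append((branch, message))

    def remove_terms(self, branch: str, terms: set, message: str) -> None:
        self.branches[branch].terms -= terms
        self.log.append((branch, message))

    def merge(self, branch: str) -> None:
        b = self.branches[branch]
        removed = b.base - b.terms
        if removed:
            # Only branches that add to the CV can be merged with the trunk.
            raise ValueError(f"{branch} removes terms: {sorted(removed)}")
        self.trunk |= b.terms - b.base


if __name__ == "__main__":
    repo = CVRepository(trunk={"Convection", "Turbulence"})
    repo.tag("climate-cv-v1")
    repo.branch("cascade", from_tag="climate-cv-v1")
    repo.add_terms("cascade", {"Convection/CloudMicrophysics"},
                   "New CV terms added by the CASCADE project")
    repo.merge("cascade")
    repo.tag("climate-cv-v2")
    print(sorted(repo.tags["climate-cv-v2"]))
```

Because each branch remembers the tag it started from, the additive-only check still works when other branches have been merged into the trunk in the meantime.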
  • Posted: 2012-12-18 11:47 (Updated: 2013-03-06 11:24)
  • Author: hearnsha
  • Categories: CV

#pimmsmrd Twitter Hash Tag for PIMMS

The PIMMS project now has its own Twitter hashtag, #pimmsmrd, which stands for PIMMS Managing Research Data.

We will use the #pimmsmrd hashtag for our dissemination work and also encourage users of the PIMMS metadata creation kit to use #pimmsmrd to announce when they've created metadata using PIMMS.

PIMMS Benefits

The JISC Managing Research Data Programme has asked all their projects to explain how they intend to gather evidence about the benefits their projects will bring to their communities. This is the PIMMS response.

| Benefit | Evidence | Presentation |
|---|---|---|
| Change to user practices | Users create documentation at an early point in the workflow | Interviews with data managers at partner institutions establish a baseline; the ratio of the number of simulations to the number of PIMMS records |
| Reduced loss of access to data as a result of postdoc turnover | Postdocs and PhD students are using the PIMMS system | User status collated from login credentials |
| Greater consistency and standards between projects to enable data re-use | Model data can be interpreted across partner institutions | Test of data node and PIMMS documentation to share data; scientifically meaningful questions can be asked (and answered) of data at partner institutions |
| Adoption beyond climate science | PIMMS metadata creation framework used by other disciplines | Number of distinct controlled vocabulary implementations |

The primary target for PIMMS in the UK is climate modelling. PIMMS will be of particular benefit to the UK climate modelling community because of the ubiquitous use of the Met Office Unified Model (nearly all UK climate modellers use versions of this model) and because it is common practice to reuse model configurations from other researchers. This culture of collaboration exists in part because climate models are very complex, with many variables that can be tweaked and modified; therefore it has become common practice to begin a series of experiments by using another climate model configuration as a starting point. Usually this other configuration is provided either by a researcher in the same research group or by a previous collaborator with whom there is an existing scientific relationship.

Beyond model configuration files there is no standard methodology for describing simulations currently in use by the climate modelling community in UK universities. Some efforts have been made at the university department level to create documentation but there is a wide diversity in the scope and purpose of this information. What PIMMS provides is a method for consistent and comprehensive documentation that is applicable to all climate modelling research, and so will enable the wider sharing of climate model data and configuration information.

The PIMMS methodology assumes an initial effort to document standard model configurations. Once these descriptions have been created users need only describe the specific way in which their model configuration is different from the standard. Thus the documentation burden on the user is specific to the experiment they are performing and fits easily into the workflow of doing their science.

Within this context PIMMS expects to bring specific benefits to the climate modelling communities in those UK universities that are part of the PIMMS consortium.

Talks given at JISC MRD, October 2012

Included here are the PIMMS talks given at the JISC MRD review meeting in Nottingham, 2012.

1) Gerard Devine: "Documenting Simulation Workflow in the 'Cascade' Cloud Modelling Project"

source:/Docs/JISC_Nottingham_Oct2012.pdf

2) Charlotte Pascoe: "Developing Controlled Vocabularies" source:/Docs/PIMMSJISCMRD2012_presentation.pdf

CIM experiments for Cascade

With the new CIM experiment generator in place (www.puma.nerc.ac.uk/cimexpgen) scientists from the Cascade project have been able to start producing Experiment descriptions for their project. In tandem with the generation of controlled vocabularies (upcoming blog post), the 'raw materials' are now in place for the creation of a bespoke CIM metadata questionnaire for Cascade.

Content of experiment descriptions

The first stage in generating the experiment descriptions was introducing the scientists to the 'idea' of a numerical experiment (in CIM terms). Using the example of what was achieved in CMIP5, the scientists were soon able to relate this to their own particular project. Unlike CMIP5, however, which contained detailed lists of numerical requirements spread across a large number of numerical experiments, it was decided that Cascade could describe its experiments in much simpler terms. Specifically, Cascade experiments are classified mainly by (1) location and (2) time period. Using these distinctions, four experiments were devised for Cascade:

  • Africa_July2006
  • Africa_Aug2006
  • Indonesia_July2006
  • Indonesia_Aug2006

denoting the two primary regions of study and the two periods of study that match well with observational data.
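The four names are simply every combination of the two classifications. A short sketch reproduces them; the post doesn't say the names were generated this way, this just shows the scheme is a cross product of locations and periods.

```python
from itertools import product

# Cascade experiments are classified by (1) location and (2) time period.
LOCATIONS = ["Africa", "Indonesia"]
PERIODS = ["July2006", "Aug2006"]

experiments = [f"{loc}_{period}" for loc, period in product(LOCATIONS, PERIODS)]
print(experiments)
# ['Africa_July2006', 'Africa_Aug2006', 'Indonesia_July2006', 'Indonesia_Aug2006']
```

Adding a further region or period would extend the list without any other changes.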

Experiment Generator

The experiment metadata manager has now been used to document these experiments and to produce experiment CIM documents that will be pushed to an Atom feed (work currently underway), and thus become discoverable within the community through CIM portals etc., as well as being used to produce the bespoke Cascade metadata questionnaire.

source:/Docs/Cascade_Exp.jpg
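For a sense of what pushing a document to an Atom feed involves, here is a minimal sketch of one feed entry announcing a CIM experiment document, built with Python's standard library. The entry layout is an assumption, since the actual feed isn't described here; per RFC 4287 an entry needs an `id`, a `title` and an `updated` timestamp, plus an `author` unless the enclosing feed supplies one. The URL and identifier are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def cim_entry(doc_id: str, title: str, doc_url: str, updated: datetime) -> ET.Element:
    """Build a minimal Atom entry pointing at one CIM document."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = doc_id
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = updated.isoformat()
    ET.SubElement(entry, atom("link"), rel="alternate",
                  type="application/xml", href=doc_url)
    return entry


if __name__ == "__main__":
    entry = cim_entry(
        "urn:uuid:00000000-0000-0000-0000-000000000000",  # placeholder id
        "Africa_July2006",
        "https://example.org/cim/Africa_July2006.xml",    # placeholder URL
        datetime(2012, 11, 12, 9, 0, tzinfo=timezone.utc),
    )
    print(ET.tostring(entry, encoding="unicode"))
```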

  • Posted: 2012-11-12 09:00 (Updated: 2012-11-12 10:08)
  • Author: gerarddevine
  • Categories: (none)

PIMMS Catch Up

Summer holidays and the mad month of meetings, AKA September, have put paid to our regular PIMMS blog posts. But we're back and firing on all cylinders, so here is a catch-up of how we're doing on the PIMMS work packages.

WP2 Refactoring the Metafor Questionnaire

Work has begun to bring together the elements of the PIMMS system into a single unified interface. This work is planned to take place in the final 6 months of the project and has begun with the creation of a comprehensive information flow diagram to capture when and how information flows through PIMMS.

WP3 User interface for experiment descriptions

Experiment documents explain why a simulation is performed. In the CMIP5 questionnaire created by the Metafor project these experiment documents were hard-coded. PIMMS has created an interactive Experiment Metadata Manager so that users can create their own experiments. The Experiment Metadata Manager (EMM) is live at http://puma.nerc.ac.uk/cimexpgen and has been tested by scientists from the CASCADE project at the University of Reading. Feedback from scientists about the EMM has prompted us to introduce a login feature so that the same service can provide different users with individual views of their experiments.

WP4 Installation and iteration

The CMIP5 questionnaire was installed for the University of Bristol Paleoclimate group very early on in this project. Current iterations of the questionnaire for Bristol add a section that collects data for managing the running of simulations and for the existing processing architecture used for visualisation. This additional content (beyond the CIM) will ensure that the new PIMMS metadata infrastructure fits into the existing metadata collection protocols, so that users only have to interface with a single metadata tool.

WP5 Controlled Vocabularies

University of Reading: CMIP5 controlled vocabularies have been extended to cover additional requirements of the CASCADE project. This includes additional vocabularies to describe cloud processes and turbulence, and also vocabularies for the description of non-global Limited Area Models.

University of Bristol: The Bristol paleoclimate group have opted not to extend the controlled vocabularies used for CMIP5. Paleoclimate models do not differ from present-day models except in terms of input files, such as changed orography and land-sea mask. Paleoclimate specifications will be handled by the Experiment Metadata Manager.

ERMITAGE: The FP7 ERMITAGE project integrates models representing climate, economy, land use, agriculture and water systems to build an Integrated Assessment Model. A new controlled vocabulary has been built for this community and will form the basis of the PIMMS tool that the ERMITAGE project will adopt to document their models. This is an important first step towards IAM documentation that will be considered for future implementation for the IPCC Data Distribution Centre.

Text Mining: The work to assess the potential of using the University of Cambridge Chemical Tagger software to parse journal articles for climate science text is complete. Chemical Tagger could be used not only to search for climate science phrases but also to create CIM XML documents.

WP6 Experiment Requirements

The creation of experiment documents is planned for October to December 2012. The University of Bristol will be making extensive use of the Experiment Metadata Manager to create experiment documents for their paleoclimate experiments in the coming months. Similarly, the CASCADE project will be using the Experiment Metadata Manager to document the simulations that are stored at the BADC.

WP7 Data and Metadata

Work has begun to prepare software to install a data node at the University of Bristol. A companion index node will be installed at the BADC to provide access to Cascade data holdings and to provide search functionality that will cover both CASCADE and Bristol Paleoclimate simulations.

WP8 Dissemination

PIMMS sent representatives to Open Repositories 2012, primarily to discuss text mining innovations within the project (http://proj.badc.rl.ac.uk/pimms/blog/2012/7). PIMMS also participated in the National Centre for Atmospheric Science (NCAS) conference, where we captured use cases and workflow priorities from the NCAS community (http://proj.badc.rl.ac.uk/pimms/blog/2012/6). Most recently PIMMS sent a representative to the ERMITAGE annual meeting, at which ERMITAGE agreed to use PIMMS as their primary model documentation tool. Regular dissemination activities have also been held within consortium member institutions. Final dissemination workshops are planned for February 2013.

Open Repositories 2012

PIMMS will be presenting our work blending the Metafor climate science controlled vocabularies with the University of Cambridge Chemical Tagger natural language processing software at an Open Repositories 2012 pre-conference workshop on Monday afternoon. The Working with Text - Tools, Techniques and Approaches for Data Mining workshop is being organised by UKOLN.

 Here is our presentation and the abstract is below.

The PIMMS project and Natural Language Processing for Climate Science

PIMMS (Portable Infrastructure for the Metafor Metadata System) provides institutions with tools to capture information about the workflow of running simulations from the design of experiments (why) to the implementation of experiments via simulations running models (how).

PIMMS uses the Metafor methodology for simulation documentation, which consists of a common information model (CIM), a set of controlled vocabularies (CV) and software tools. The initial deployment of PIMMS will support climate model documentation because it is based on the controlled vocabularies collected by the Metafor project in support of CMIP5 (the Coupled Model Intercomparison Project Phase 5). Controlled vocabularies drive a web interface for collecting metadata; however, PIMMS is also exploring how the CV used to configure the PIMMS web interface may be of further use to our stakeholders through the development of the text mining capabilities of the University of Cambridge Chemical Tagger tool.

PIMMS has extended and adapted the Chemical Tagger natural language processing software to be more relevant to climate science. The extension of Chemical Tagger to climate science involved writing further extraction tools (currently an XSLT approach is being used) to match natural language text from scientific papers to the controlled vocabularies developed in METAFOR and now PIMMS. The climate science adaptation of Chemical Tagger has been successful in processing a Geoscientific Model Development journal article to create a CIM document that can be viewed with the Metafor CIM viewer.
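To illustrate the basic idea of matching text against controlled vocabularies, here is a deliberately crude sketch using dictionary phrase matching with regular expressions. This is a swapped-in technique, not how Chemical Tagger works (it tags and parses whole sentences, and the PIMMS extension uses XSLT for extraction); the phrases and CV paths are hypothetical.

```python
import re


def match_cv_terms(text: str, vocabulary: dict) -> dict:
    """Find controlled-vocabulary phrases in free text.

    vocabulary maps a surface phrase to the CV term path it populates.
    Matching is case-insensitive and whole-word. Returns
    {cv_path: [phrases found]}.
    """
    found = {}
    for phrase, cv_path in vocabulary.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.setdefault(cv_path, []).append(phrase)
    return found


if __name__ == "__main__":
    vocab = {
        "mass flux": "Atmosphere/Convection/SchemeType/MassFlux",
        "semi-Lagrangian": "Atmosphere/DynamicalCore/Advection/SemiLagrangian",
        "spectral": "Atmosphere/DynamicalCore/Method/Spectral",
    }
    sentence = ("The model uses a semi-Lagrangian advection scheme "
                "and a mass flux convection scheme.")
    print(match_cv_terms(sentence, vocab))
```

Simple matching like this cannot tell "we did not use a mass flux scheme" from "we used one", which is one reason grammar-aware tools are preferred for real extraction.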

The success of Chemical Tagger has raised the question of how to interpret harvested CIM documents versus those created by modellers. The Metafor CIM was designed to be populated by climate modellers with the (probably over simplistic) assumption that if something isn't in the CIM document then it either isn't in the model or isn't relevant. However, CIM documents created by harvesting information from papers will naturally not cover everything about a model, so missing information doesn't mean that those things weren't included or aren't relevant. PIMMS will therefore need to describe different protocols for interpreting CIM documents depending on how they were created.
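The two interpretation protocols can be made explicit by carrying a document's origin alongside its content. A minimal sketch, with hypothetical names and return values:

```python
from enum import Enum


class Origin(Enum):
    MODELLER = "modeller"    # completed by the modeller via the questionnaire
    HARVESTED = "harvested"  # extracted from a journal article


def interpret(doc_terms: dict, term: str, origin: Origin) -> str:
    """Return the documented value of a CV term, or what its absence means."""
    if term in doc_terms:
        return doc_terms[term]
    if origin is Origin.MODELLER:
        # Metafor's assumption: undocumented means not in the model or not relevant.
        return "absent"
    # A paper only covers what its narrative needs, so absence proves nothing.
    return "unknown"


if __name__ == "__main__":
    print(interpret({}, "Convection", Origin.MODELLER))   # absent
    print(interpret({}, "Convection", Origin.HARVESTED))  # unknown
```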

In essence, the difference between journal article descriptions and metadata documentation is Narrative. Journal articles need to tell a story, so they include only the information that is relevant to the narrative, whereas metadata documentation is an attempt to include as much as possible across the board. The general nature of metadata documentation is probably why it has historically been perceived as such a boring task. PIMMS will make metadata documentation more fun by bringing back the Narrative: once PIMMS is established at an institution, users will be able to create generalised metadata having described only those things that are relevant to the story of their experiment.

Integrating PIMMS into the Simulation Workflow

Thanks to everyone at the NCAS staff meeting who came and talked to me about integrating PIMMS into their workflow. My particular thanks go to Steve Woolnough from the CASCADE project, who provided insightful information about just how many simulations never make it out of the "lab".

PIMMS took an interactive poster to the NCAS staff meeting on the 18th and 19th of June. We showed what we in PIMMS think of as the typical workflow for running climate simulations, and asked participants to draw in anything we might have missed and to place PIMMS logos where they would create metadata documentation, both now and in an ideal world. The workflow diagram shows the additions made by the NCAS scientists in red and green; there's a link to a photo of the actual poster at the end of this post.

http://proj.badc.rl.ac.uk/pimms/export/57/Docs/PIMMSWorkflowGraffittiStickers_NCAS2012_noDecoration.png

In general, people:

  • like the idea of incremental documentation
  • prefer the idea of automated documentation

The view from CASCADE

CASCADE is a NERC-funded consortium project studying organised convection in the tropical atmosphere using large-domain, cloud-system-resolving model simulations. CASCADE sits at the bleeding edge of climate model development, and as such many of its configurations of the Met Office Unified Model (UM) crash before the simulation completes. It is estimated that only 10% of CASCADE simulations make it to the dissemination stage of the workflow. Climate modellers have therefore been in the habit of waiting until the dissemination stage before documenting what they did.

It would be really useful to have a database of "I tried this configuration... it didn't work".

Negative results don't make it into the scientific literature! It could save a lot of effort if information about failed configurations were shared, even if the data are chucked out. This is a new use case for PIMMS.

PIMMS class documentation needs to be created by all developers of the Unified Model including those in the Met Office. When upstream developers at the Met Office are also using PIMMS then both Met Office and NCAS developers will only ever be required to provide small incremental descriptions. PIMMS will work best for climate modellers when everyone makes their contribution.
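One way to picture "small incremental descriptions" is layered documentation, where a downstream description records only what differs from the upstream one. The sketch below uses Python's `collections.ChainMap` for the layering; the property names and values are hypothetical, and this is not how PIMMS actually stores its documentation.

```python
from collections import ChainMap

# Hypothetical upstream description, written once by Met Office developers.
met_office_base = {
    "model": "Unified Model",
    "AtmosphereDynamicalCore": "semi-Lagrangian",
    "Convection": "parameterised",
}

# Hypothetical NCAS increment: only what differs in this configuration.
ncas_increment = {
    "Convection": "explicit (cloud system resolving)",
    "Domain": "large tropical domain",
}

# ChainMap looks keys up in the increment first, then falls back to the base,
# so the full description is assembled without copying the upstream text.
full_description = ChainMap(ncas_increment, met_office_base)

print(full_description["Convection"])
print(full_description["AtmosphereDynamicalCore"])
print(sorted(full_description))
```

If the upstream layer is missing, every downstream user has to write the whole description themselves, which is why the approach works best when everyone contributes their layer.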

Workflow Poster with NCAS Graffiti

"Your poster is about the future not the past"

Questioning CIM Metadata

The CIM portal development is now in the hands of the ES-Doc team, and they are having a coding sprint this week. So we need to offer them a list of typical questions that we expect users to ask of CIM metadata about climate models and simulations.

The purpose of this blog post is to be a forum for us to gather these questions from a community that goes beyond the small group of us that work on PIMMS. Please enter your questions as comments on this post ASAP. We'll review them at the PIMMS hangout tomorrow afternoon.

How do we link a metadata record to its data?

One would think that there would be a simple solution to the problem of linking data and metadata, but it turns out that Ben Goldacre's maxim "I think you'll find it's a bit more complicated than that" holds true here too.

the Problem…

PIMMS needs to build a tool to link the metadata records we collect to the data that they are describing. Sounds simple enough, and it would be if everyone used the same archive structure and naming conventions, but we don't have that kind of control over the organisation of the data in A N Other institution that has decided to install PIMMS. So we need a generic method of linking metadata and data.

the Solution…

We reckon the solution will be some method of tagging the metadata and data records but to do that we need:

  • tagging - some method or methods of tagging the data and metadata records
  • tag store - some place to store our list of related tags
  • tag search - some way of finding the tags

We also need to decide

  • what kind of tag - how do we tag the data?
  • where to tag - to what data granularity should we add tags?
  • when to tag - when in the workflow do we add tags and link tags together?
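To make the tagging / tag store / tag search split concrete, here is a minimal in-memory sketch in Python. It assumes random UUIDs as tags and one tag per simulation attached to individual files; `TagStore`, the record id and the archive paths are all hypothetical, and a real tag store would need to be persistent and shared between institutions.

```python
import uuid
from collections import defaultdict
from typing import Dict, Set


class TagStore:
    """In-memory sketch of a tag store linking metadata records to data files."""

    def __init__(self) -> None:
        self._metadata: Dict[str, Set[str]] = defaultdict(set)  # tag -> metadata record ids
        self._data: Dict[str, Set[str]] = defaultdict(set)      # tag -> data locations

    def new_tag(self) -> str:
        # A random UUID makes no assumption about the archive's naming conventions.
        return str(uuid.uuid4())

    def tag_metadata(self, tag: str, record_id: str) -> None:
        self._metadata[tag].add(record_id)

    def tag_data(self, tag: str, location: str) -> None:
        self._data[tag].add(location)

    def data_for(self, record_id: str) -> Set[str]:
        """Tag search: every data location sharing a tag with this metadata record."""
        result: Set[str] = set()
        for tag, records in self._metadata.items():
            if record_id in records:
                result |= self._data.get(tag, set())
        return result

    def metadata_for(self, location: str) -> Set[str]:
        """Reverse search: every metadata record sharing a tag with this data location."""
        result: Set[str] = set()
        for tag, locations in self._data.items():
            if location in locations:
                result |= self._metadata.get(tag, set())
        return result


store = TagStore()
tag = store.new_tag()  # e.g. created when the simulation is configured
store.tag_metadata(tag, "cim:simulation/abc123")
store.tag_data(tag, "/archive/run42/output_1990.nc")
store.tag_data(tag, "/archive/run42/output_1991.nc")
print(sorted(store.data_for("cim:simulation/abc123")))
```

Because the tags are opaque, the same approach works whatever archive structure another institution uses; the open questions of where and when to tag decide when `new_tag` is called and how many files each tag is attached to.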

the Big Question…

Has this problem already been solved? The JISC MRD community has lots of librarians; surely librarians have addressed the problem of how to match up books (data) with their library records (metadata). And there are such things as inter-library loans, so there must also be a solution out there for linking together different data archives.

what PIMMS is doing…

On our wiki, we're scoping out the requirements for a tool that we will build to link metadata and data. We've come up with ideas for tagging, ideas for storing tags, and requirements for interacting with the tool. We're soliciting advice from everyone we know, and it would be great to find out what the JISC MRD community has to say!

  • Posted: 2012-05-25 12:26 (Updated: 2012-05-26 09:35)
  • Author: hearnsha

Interpreting Harvested Metadata vs Documented Metadata

Hannah has managed to load the documents that she created by harvesting metadata from papers into the CIM document viewer (needs link). Hannah's documents used the ChemicalTagger software and the Metafor Controlled Vocabularies to parse text in journal articles. This has exposed a few errors that need fixing (e.g. the CIM viewer only lists the first variable in a list), but has also raised the more interesting question of how to interpret harvested CIM documents vs those created by modellers.

The thing is that the CIM was designed to be populated by modellers with the (probably over-simplistic) assumption that if something isn't in the CIM document then it either isn't in the model or isn't relevant. But CIM documents created by harvesting information from papers will naturally not cover everything about a model, so missing info doesn't mean that those things weren't included or aren't relevant.

PIMMS will need to describe different protocols for interpreting CIM documents depending on how they were created, but we will also want to ensure that the CIM accounts for missing data more intelligently in future releases.

In essence the difference between journal article descriptions and metadata documentation is Narrative. Journal articles need to tell a story, so the information they include is only that which is relevant to the narrative, whereas metadata documentation is an attempt to include as much as possible across the board. The general nature of metadata documentation is probably why it has historically been perceived as such a boring task. PIMMS will make metadata documentation more fun by bringing back the Narrative: once PIMMS is established at an institution, users will be able to create generalised metadata having described only those things that are relevant to the story of their experiment.