wiki:Notes100304

Bryan's Notes from the 4th of March MOLES workshop

Rather than present notes on all the points in the agenda, these notes have been "organised" into themes; action plans will be drawn up under the auspices of the NERC SIS ...

Documentation Needed

Still a disconnect between the aims of MOLES and the expectations of those around the table (physical and virtual):

  • Really need to work up the MOLES paper to include lots of use cases explained in words and UML - not XML.
  • Need a "This is MOLES" presentation, with a short summary set of slides and a longer, more detailed set of slides.
    • Can we think of automatically generating this from the XMI using ODF so it can be kept up to date?
  • Need an executive summary paper (no more than four pages) explaining "Why MOLES".

Still need to make sure that the relationship with O&M is fully characterised. Simon tells us that O&M has a relationship for sampling feature which can be used (with an association class) to establish feature relationships. These sorts of "dependencies" need to be clear in MOLES.

Key points discussed:

  • MOLES as a coathanger - holding "coats" conforming to domain specific application schema, and providing links not only at "the collar level" but all the way down ...
    • In particular, organising links into O&M instances in native schema and extending where necessary for interdisciplinarity
  • MOLES as an intermediary between discovery and usage ("B" metadata)
    • Noting ISO19115 heritage as maps and imagery (part 2)
  • MOLES as the orienteering part between teleporting and usage ...
  • MOLES is cross disciplinary:
    • Corollary of that is that MOLES may not be suitable for every discipline specific task, hence coathanger not coat.
    • e.g. needs to be good enough that a crop specialist can find and interpret the reliability of weather data, but not so good that an atmospheric scientist can necessarily infer all the provenance necessary to use the same data to get at the atmospheric science processes (without detailed atmospheric science attributes in the "coat").

Need good examples:

  • Text stories, and object diagrams, but avoid relying on object diagrams alone: Simon quoting ?: "Object diagrams can be like accidents in legoland"
  • Need to give a specific example where MOLES adds value to geosciml (not supplants!)

General

General points:

  • Platform is very much about location; need to make that clear (hmm says Bryan a posteriori, what was that about?)

State of ISO19156

  • At voting stage, final editorial modifications made.

Lessons from Metafor

State of the play in vocab service land

Discussed the necessity of gold standards in vocabularies. Allowing folks to edit copies of what purports to be a common vocabulary is a recipe for disaster ... in this instance centralised reference vocabs are crucial. Edits need to be marshalled centrally.

BODC Vocab Server

  • V1 operational
  • V1.5
    • includes write access via a server API
      • editors need to register with BODC, then they get access to "their" lists
      • BADC to test this for CF in the next few weeks (after a long hiatus)
    • also write access via a web form interface
  • V2.0
    • Restful API design partially completed.
      • Needs revision.
      • Interesting digression on restful apis and vocabularies in an international environment. Needs nailing down.
    • Needs funding for development, but this should happen under NETMAR (Open Service Network for Marine Environmental Data)
  • Need to work out how to get to common standards for interoperating between vocabulary servers
    • Collaboration with isocat encouraged.
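
The idea of centrally marshalled edits, with registered editors who only get access to "their" lists, can be sketched roughly as below. This is a minimal illustration only: the class, method and list names are hypothetical and are not the BODC Vocab Server API.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyServer:
    """Hypothetical central server holding the single reference copy of each list."""
    # list name -> {term: definition}
    lists: dict = field(default_factory=dict)
    # editor name -> set of list names that editor has registered for
    editors: dict = field(default_factory=dict)

    def register_editor(self, editor, list_names):
        """Register an editor and grant write access to the named lists only."""
        self.editors.setdefault(editor, set()).update(list_names)
        for name in list_names:
            self.lists.setdefault(name, {})

    def add_term(self, editor, list_name, term, definition):
        """Accept an edit only if it comes from an editor registered for that list."""
        if list_name not in self.editors.get(editor, set()):
            raise PermissionError(f"{editor} may not edit {list_name}")
        self.lists[list_name][term] = definition

    def lookup(self, list_name, term):
        """Everyone reads from the same central copy."""
        return self.lists[list_name][term]


# Example with made-up editor, list and term names.
server = VocabularyServer()
server.register_editor("cf_editor", ["example_list"])
server.add_term("cf_editor", "example_list", "example_term", "placeholder definition")
```

Because there is only one copy of each list and every write passes through add_term, nobody ends up editing a private copy of what purports to be a common vocabulary.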

Serialisation ETC

Future for FullMoon in question?

  • It is not a funded project, and it has no formal developer community.
    • Hard to make a high priority since in principle it's only used once every few months - but it's still crucial, especially in developing and evolving new models.
  • If we go down the UML-XMLS route, then we'd need to make sure we could support FullMoon ourselves (either directly or in partnership with CSIRO and others).
    • GPL codebase ...
  • BNL noted that now it's using latest eXist compatible xqueries we could remove the relatively unreliable (aka not easily deployable) java harness and replace with a cleaner scripting harness (e.g. based on python).
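
As a rough idea of what a Python scripting harness might look like, the sketch below assumes the xqueries are run against an eXist instance through its REST interface (the `_query` and `_howmany` request parameters). The base URL, collection path and example query are hypothetical, and this is not how FullMoon is currently driven.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, collection, xquery, how_many=10):
    """Build an eXist REST URL that evaluates `xquery` against `collection`."""
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{base_url}/rest{collection}?{params}"


def run_query(base_url, collection, xquery, how_many=10):
    """Send the query and return the response body as text.

    Needs a running eXist server at base_url, so it is not called here.
    """
    url = build_query_url(base_url, collection, xquery, how_many)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Hypothetical server and collection; the query simply counts elements.
url = build_query_url("http://localhost:8080/exist", "/db/models", "count(//*)", 5)
```

A harness of this kind has no dependencies beyond the Python standard library, which is most of the attraction compared with a harness that is hard to deploy.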

Note that the SolidGround plugin for FullMoon is supposed to generate database schemas. We need to learn about this.

  • But note that it doesn't exploit ORMs, so it may help generate storage, but not to build systems around it. We may want to go via an ORM ... e.g. Django or SQLAlchemy.

Note also new HollowWorld helper plugins which make sure that we end up with serialisable schemae.

We don't want to depend on proprietary commercial solutions for a fundamental part of our metadata systems.

Futures of WaterML

  • We understand that a (the next) major version of waterML will be O&M compliant.
    • Which means it will be a natural fit for the MOLES coathanger and the use of SOS.
    • establishing common approaches to timeseries data
    • Documenting procedures.
  • See nice analysis from Peter Taylor (OGC login needed)

Specimens, analysis and MOLES

While MOLES has been developed specifically with sampling in mind, we hadn't fully thought through the use cases associated with specimen sampling and remote analysis.

  • But O&M has done more of this than most of us (except Simon, obviously) had appreciated.

In this context:

  • The place where the remote analysis happens is important: it's a key part of the provenance, who does what and what their reputation is.
    • This would need an attribute on the processing side ..
  • We need to clearly distinguish between "ex-situ" and "in-situ" specimens.
  • One obvious use case to think through is that associated with trace gas sampling of ice cores. We need to think of the acquisition of the ice core, then the slicing, and removal of the sample to an analysis environment, and then the analysis.

Simon argued that most "laboratory sampling" considers the acquisition to be everything up to the first number. Bryan argued that acquisition is just that, and then we have processing/analysis and we need to get that right. The remote sensing analogy is that level 1 data are just voltages and that's the acquisition; level 2 (geophysical quantities) and level 3 (geolocated and gridded) appear as processing steps. In fact, of course, this is metadata about data, so we can have:

Level1 Data has only an Acquisition element included in the metadata
Level2 Data has both an Acquisition and a DataQuality (Processing) description

Then the equivalent for most ex-situ data would be identical: multiple elements are described.

  • But let's return to humans as the instrument, and "collection protocols", have to make sure the processing side isn't too computer focussed.
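
The Level1/Level2 distinction above, and its ex-situ equivalent, can be sketched as follows. The class and attribute names are illustrative assumptions only and are not taken from the MOLES or O&M models; the ice core entries follow the use case described earlier in these notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Acquisition:
    instrument: str   # may be a human following a collection protocol
    location: str


@dataclass
class ProcessingStep:
    description: str
    location: Optional[str] = None   # where the analysis happened (provenance)


@dataclass
class DatasetMetadata:
    title: str
    acquisition: Acquisition
    processing: List[ProcessingStep] = field(default_factory=list)

    def is_acquisition_only(self) -> bool:
        """True for 'Level1' style data: no processing description at all."""
        return not self.processing


# Remote sensing analogy: level 1 is acquisition only, level 2 adds processing.
sensor = Acquisition(instrument="radiometer", location="in orbit")
level1 = DatasetMetadata("voltages", sensor)
level2 = DatasetMetadata(
    "geophysical quantities",
    sensor,
    [ProcessingStep("conversion to geophysical quantities")],
)

# Ex-situ equivalent: the same structure, with a person as the "instrument".
ice_core = DatasetMetadata(
    "trace gas concentrations",
    Acquisition(instrument="field team following a coring protocol", location="drill site"),
    [
        ProcessingStep("slicing of the core"),
        ProcessingStep("trace gas analysis", location="remote laboratory"),
    ],
)
```

Nothing in the structure assumes that a processing step is a computation, so slicing a core and running software are described in the same way.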

The issue is then how we do the equivalent in moles for geochemistry etc.

It's true that Simon is currently uncomfortable with this, so we need to resolve it. Bryan is less uncomfortable, and most everyone else seemed agnostic. We need to discuss it more, and put up some real examples.

(This is a fair reflection of the discussion to this point, but doesn't reflect current thinking, I'll put a link to that here when we've documented it).

We discussed "conditioning", a new idea from Spiros, but general opinion was that perhaps it wasn't necessary as described, although if Spiros still thought it was a good idea after reflection, he should bring it to the list as a specific (documented) suggestion.

  • Conditioning was a process that involved manipulating the physical world (e.g. growing a culture), not something that involved numbers. (Certainly this concept needs to be included, but where?)

Simon noted that specimen handling can exploit the related sampling feature association in the sampling feature schema (it has an association class which can be used to explain the relationship), but it may need further specialisation beyond a generic name to get at processing and "conditioning".

Importance of detection limits noted - it is a cross disciplinary concept.

Agreed that given the timescales, the group working on geochemistry should be encouraged to work with Geosciml and/or extend vanilla O&M, and then for MOLES to learn from that exercise and decide what lessons need to be learned.

Futures

Clearly the future of funded work on MOLES in the NERC environment is as part of the Science Information Strategy (SIS) implementation plan, where MOLES is currently the only candidate for "B" type implementation. This will move us past the current hiatus in MOLES development as it hasn't been funded for the last six months of FY09/10.

Hopefully, if MOLES is useful, other groups may contribute to the information modelling (as indeed we're starting to see).

Developing the Information Model

..

Procedures for evolving the model

  • All modifications to be described and discussed separately (clearly we will have dependencies ... but we need to make each move clear enough to be understood).
  • Spiros to tabulate the differences between 3.2 and 3.3, ticket each one, and then bring each one to the list as a specific email. If there is no discussion within four weeks, a benevolent dictator (Bryan for now) will make a decision ...
    • Aim to get 3.3 finalised as soon as practicable (realistically that'll not be before the end of April, given Easter, and other existing commitments).
  • Need a common requirements register, so we know what isn't in MOLES, just as much as we know what's in it that we're discussing.
  • All major versions (X.Y) need to update instance examples on tagged release.

Plans for the Model

Metadata Creation

  • Clearly we need forms, but these can't be automatically generated from schema alone; domain knowledge is needed to build a sequence of metadata capture which reflects the way scientists think. Cf the metafor questionnaire for initial capture as opposed to the use of geonetwork for editing ...
  • Role of scripts and transforms to bring stuff in from other schema (and put links back out to the originals)
  • Can we learn lessons from mikado which seems to be quite sophisticated?

Round Table

  • BGS:
    • Exercise MOLES for the seismic data in the context of Geoseas (this data not handled by geosciml).
    • look at seismic surveys in general
    • Boreholes
    • Evaluate current state of internal metadata against MOLES requirements.
    • NB: most of BGS work about interpretation not about observations per se (and GeoSCIML biased that way too)
  • CEDA:
    • Complete work with MOLES2 (e.g. NEODC)
    • evolve moles2 creation tools to support moles 3.Y (need to decide on Y), probably using django in preference to pylons
    • improve the look-n-feel of our moles portals (currently the look-n-feel is six years old, and never got any loving then)
    • look at bringing moles and metafor into line - improving computation within MOLES and realigning Metafor to be consistent with the MOLES (ISO19115-2 heritage)
    • Start to look at MOLES and ESA HMA and SAFE.
  • CEH:
    • Look at O&M for water quality data
    • concentrating on discovery at the moment, but recognise project type information is needed. Worried about data rating and feedback ("c-type") - but BNL thinks this is bizarre to do before getting the scientific provenance.
    • sending someone to the waterML meeting, and looking at it in the context of VO.
      • interested in OGC/WMO domain working group and WaterML 2.0, along with surface and ground water interoperability experiments.
    • big worries about license management, hoping to get something from OGC in this space.
  • EOF: not represented due to illness
  • BAS:
    • to work with BADC on model data
    • Aiming to exercise MOLES with seismic data and swath bathymetry
    • Prototype development to be considered under the auspices of the SIS.
  • BODC:
    • Roy now working on vocabs primarily via EU projects
      • MOLES a secondary priority behind discovery as they can exploit their databases to generate MOLES (or a significant proportion) without as much drama as those of us who have to collect the info in the first place.
        • still working on mapping moles onto their schemae

Issues to pick up under the SIS:

  • Technical commitment (who, how much effort, when?)
  • Scientific commitment (as above)
  • Coordination and management, leadership, all under the auspices of the architecture working group of the SIS.
  • Succession planning within the centres for their MOLES leaders.

Immediate issues for the architecture working group:

  • Quick wins from MOLES
  • Need instances ...
  • Organise next MOLES meeting (and work out MOLES governance plans, internal to NERC, and more widely).