Version 9 (modified by mpritcha, 12 years ago)


NDG "Room 101" Meeting


Meeting to remind ourselves of what we have, why, & what we have learned along the way.
Held in CR03 on Friday 22 August 2008.


  • Sam Pepler
  • Dom Lowe
  • Phil Kershaw
  • Steve Donegan
  • Kevin Marsh
  • Bryan Lawrence
  • Stephen Pascoe
  • Matt Pritchard
  • Calum Byrom (by phone)

Intro from Matt

  • Good opportunity to see where we are & look at improving the way we do things.
  • Room 101 ...ok, not quite. We're not going to vote for our least favourite components & see them disappear. In reality, design & development decisions have already been made, often for good reason (but it's worth reminding ourselves what those were & seeing if we're happy with the decision-making process ...not just on this project).
  • NDG took some excursions down some blind alleys. Did we learn things along the way? Are there lessons to be learned from how we got down these blind alleys (without dwelling on what was down there)?

Review of Development Approach

How did we go about designing and developing the system?

  • Use of RM-ODP architecture
  • What approach did we use to the software life-cycle and did it work?
  • Development across multiple institutions

Review of current components

  • COWS
  • CSML
  • MOLES (v2+incomplete infrastructure, v3 on way)
  • Discovery (v2)
  • Security
  • Vocabserver

Suggested points for discussion for each component:

  • Overview (brief!)
    • Reminder of what it does, where it fits in
  • How did we get there?
    • Was it designed or did it grow organically?
    • (Would we do the same again, starting from scratch?)
  • SWOT
    • Strengths
      • Particular successes / things learned
      • Fitness for purpose
        • Does it do what it was specified to?
        • Does the spec meet current needs?
        • Have people been able to deploy / integrate / use it?
    • Weaknesses
      • What obstacles have been encountered?
      • Have these been overcome?
      • By satisfactory means?
    • Opportunities
      • What have we learned while building this?
      • What is the roadmap for this component?
    • Technology (...or Threats?)
      • Have we used appropriate technologies?
        • ElementTree for XML processing
        • eXist for XML databases
        • SOAP toolkits
        • WS-Security
        • Pylons
        • Postgres
      • Has development been guided too much / enough by available technology?
      • Have we reinvented wheels?

Going forward

What new requirements lie ahead?

  • ??

Sum up

Notes from Meeting

I have tried to attribute comments to people where my notes captured this. Please read and amend as necessary. Where my own interpretation of my notes has been added post-hoc, this is indicated in italics.

Review of Development Approach

(BL) NDG wasn't [necessarily intended to be] a software development project at the start. As such, it didn't have very clear requirements at the start. We had to "do" the buzzwords in order to "do" e-science as far as funding was concerned.

How to roll out deployments : we are still asking questions now as to how we integrate security into BADC. Could it be done in smaller chunks or do we have to wait until a whole chunk is ready before attempting deployment?

(Sam) Initially there was a "do everything" approach [i.e. see which things were successful, to aid in narrowing down candidate technologies for solving certain problems?]

(Stephen) Learned about modular software development, but (Bryan) at the time, there were no partners actually signed up to implementing [specific components]. It seemed that everyone was waiting for a complete system.

In order to make things more readily implementable [do we mean deployable?], they need to be in smaller chunks, and this needs to be done all along.

Why didn't this happen?

  • Lost people at certain institutions (esp. key ones with link to Data Centres)
  • Had "services" but no [p??? ...sorry can't read my own notes] to start with

Was there a timetable for implementing these things?

  • Comes down to tightly-defined requirements, or lack of.

Example : CSML

  • (Dom) Suffers from being at the end of the data delivery chain. [All other bits in the chain need to be deployed in order for this to be tried out properly / generate useful feedback]
  • (Matt) Should timetable for deployments be tailored to position in data chain?

(Calum) NDG development seemed to have a very adaptive approach, with a basic idea of what it wanted to do. This maps well onto the agile software development approach (cf. predictive where everything is very well planned out from the start).

  • Good because not too much time is spent down dead ends
  • But relies on very good communication between all involved

Maybe there was some attempt to be predictive in some parts of the project (bits of project were agile, bits predictive). But there was not enough attention to the relative pace of the different development streams.

RM-ODP model

Reference Model of Open Distributed Processing

Start off with what needs to be achieved (Enterprise Viewpoint), and develop the other viewpoints to build a specification of the whole system (the others are information, computational, engineering and technology).

(Calum) Comment : Lots of code in the NDG stack seems to be non-OO. (Phil) RM-ODP gave structure at the start, and hence a structured approach. But it felt like we fell off the end of something. This had benefits (adaptive style) but at the expense of some loss of structure.

Plus some things were never deployed.

[RM-ODP doesn't provide any help beyond the initial design phase]. Is there another [complementary?] model that is more applicable further down the development path? [Flag this, and RM-ODP in general, as something to discuss in more detail at some point.]

Development across institutions

EDP / NERC Portals experience : suggests benefits of ensuring that at least 1 person is actually using a particular component, in order to provide feedback on it.

MOLES : There seemed to be someone in each institution trying to use it. But many people were struggling to understand DIFs, let alone MOLES (some "fear" of it), and it was even worse with CSML. Perhaps we should have implemented it (& made it deployable) early on to get people on board [showing what benefits were obtained rather than what overhead it entailed?]

Review of components



COWS

Low-level toolkit aimed at building data services including visualisation.

With a fairly small amount of "glue" code, it can produce services.

COWS was an example of a change of tack. We realised that it was not going to be possible to integrate the DataExtractor (Dx) code (for various reasons: Ag's availability, difference of approach, etc).


Strengths / Successes

  • Standards compliant (built on OGC etc)
  • Can quickly implement any dimension we like
  • Presenting spatial data via JavaScript map interfaces is now very prevalent on the web [e.g. Google Maps]. This is a "big win", in that we now have lots of expertise in this.
  • ...whereas Dx was a component all to itself (& the code of 1 person who didn't have enough time)
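As a rough illustration of what "standards compliant" and "any dimension we like" can mean in practice, the sketch below builds an OGC WMS 1.3.0 GetMap request URL with extra dimensions. It is not COWS code: the function name `build_getmap_url`, the server URL and the layer name are all hypothetical, and it assumes the service in question is a WMS. The parameter names themselves (SERVICE, REQUEST, BBOX, TIME, ELEVATION and the DIM_ prefix for other dimensions) come from the WMS specification.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getmap_url(base_url, layer, bbox, width=512, height=256, extra_dims=None):
    """Return a WMS 1.3.0 GetMap URL (hypothetical helper, not part of COWS).

    bbox is (minx, miny, maxx, maxy) in CRS:84 (longitude, latitude order).
    extra_dims maps dimension names to values, e.g. {"time": "...", "pressure": "500"}.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    for name, value in (extra_dims or {}).items():
        key = name.upper()
        # TIME and ELEVATION are predefined in WMS; any other dimension
        # is requested with a DIM_ prefix.
        if key not in ("TIME", "ELEVATION"):
            key = "DIM_" + key
        params[key] = value
    return base_url + "?" + urlencode(params)


# Example with an assumed server and layer name.
url = build_getmap_url(
    "http://example.org/wms",
    "temperature",
    (-180, -90, 180, 90),
    extra_dims={"time": "2008-08-22T00:00:00Z", "pressure": "500"},
)
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

Because the request is just a set of standard query parameters, any WMS-aware client (including JavaScript map interfaces) can consume the same service without knowing how it was built.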


Weaknesses

  • Never quite got the visualisation tools actually joined up to data.
  • Not completely integrated with CSML


  • Code development
    • Got to stop situation where only 1 person writes code [Another item for further discussion]
    • Very useful to review code at the end of a project.
    • People have different levels of skill : code review is good way of demonstrating good code to team members
    • Need to take step back occasionally and ask questions
    • Unit testing : very useful but not often done
      • Requires certain mind set at start and end of project
      • Time constraints can be an obstacle to proper testing. Unit tests are fairly easy to do; system tests more tricky and require more discipline.
      • Misdirected thinking to say that unit tests "take up" time (often happens early on when developers want to "get in the thick of things" straight away & see/demonstrate some result).
    • Almost everything is not quite finished (Bryan's whiteboard sketch)
      • In order to move forward, we must actually finish & implement across all our grid series data in the Data Centres.
    • There always seemed to be 1 thing in the jigsaw that was in such a state of flux as to prevent the whole from working at any time.
    • We have to think about deploying earlier, accepting that some bugs need to be left, to be fixed in a later version.
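To illustrate the point above that unit tests are fairly easy to write, here is a minimal sketch using Python's standard `unittest` module. The function under test, `parse_bbox`, is a made-up example (it is not taken from the NDG code base); the point is only that a handful of short tests pins down both the normal behaviour and the error cases.

```python
import unittest


def parse_bbox(text):
    """Parse 'minx,miny,maxx,maxy' into a tuple of four floats (hypothetical helper)."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    minx, miny, maxx, maxy = parts
    if minx > maxx or miny > maxy:
        raise ValueError("minimum exceeds maximum")
    return (minx, miny, maxx, maxy)


class ParseBboxTest(unittest.TestCase):
    def test_valid_box(self):
        self.assertEqual(parse_bbox("-10,40,5,60"), (-10.0, 40.0, 5.0, 60.0))

    def test_wrong_number_of_values(self):
        with self.assertRaises(ValueError):
            parse_bbox("1,2,3")

    def test_reversed_limits(self):
        with self.assertRaises(ValueError):
            parse_bbox("5,40,-10,60")


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseBboxTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` after each change gives a quick pass/fail answer. System tests, which need deployed services and real data, do not reduce to a few lines like this, which is where the extra discipline comes in.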



CSML

GML application schema to describe content of files (concerned with data structure). XML schema, followed modelling frameworks. High-level API for making subsetting requests.

Strengths / Successes

  • Beyond NDG, interest in CSML from ocean community groups e.g. ECOOP, MarineXML.


Weaknesses

  • Hasn't really been implemented (deployed?) yet. Working in prototype but no effort on part of data scientists. All the bits are there but...

(Stephen) Not convinced that Feature Type approach is the way forward. See GML : simple features. How is this ever going to join up with Feature Types? This is what INSPIRE is mandating...

(Bryan) Problem with CSML historically : nothing changes except through Andrew Woolf

CSML was designed to have different I/O layers & be lossy. It would never be as good as reading netCDF directly, but could read netCDF & PNG files at the same time.

HDF Example : It would be great to plot Grape + ERA40 data on same plot : that would be something really new i.e. an easy win for demonstrating CSML's capability.

(Bryan) We have the middleware layer, but now need to get the benefit of it. We should make a big effort to get ~50 datasets working with CSML (even if all 150 is impossible for now).

Granulite concept should help in trying to deploy some benefits (e.g. getting parameters into MOLES records), rather than all aspects (e.g. visualisation).



MOLES

What do we mean by MOLES? Do we mean just the schema, or everything including Browse and Discovery? There is a difference in perception even amongst ourselves, so perhaps we have not been clear enough about this. XML Schema : a heavily flawed schema representing some quite good ideas, with a relational database schema that is flawed squared!!


Main problem is difficulty / inability to change. Even if the (complex enough) XML schema is changed, the corresponding changes to the relational DB schema (and subsequently to the editor interface) take too long to implement. The new schema should be much more lightweight. An XML database is to be used instead of an RDB, which should help make evolution easier now.

In designing MOLES, we should have concentrated on the aspects of the metadata model (and hence schema) that were unique to environmental science, rather than those that were already familiar (to the developer). There are lots of instances where existing models for generic things (like people, organisations etc) could have been imported into the model rather than re-engineered.

To be fair to Kevin (developer), he kept asking for feedback on the design of MOLES as it was progressing but got hardly any. We now know what we can do with a metadata model, so presumably would be better placed to provide constructive criticism.


(NDG team) No point producing schema unless accompanied by lightweight tool to help users populate example instance documents.

(Data centres) Need to define tools early on that satisfy the requirements of the data centre in terms of populating metadata records.