Ticket #592 (closed task: fixed)

Opened 13 years ago

Last modified 12 years ago

[M] DIF-to-minimum MOLES transformation

Reported by: mpritcha Owned by: ko23
Priority: blocker Milestone: ReFactored_Discovery_WebServices
Component: discovery Version:
Keywords: MDIP WS-Discovery2 Cc:

Description (last modified by selatham)

Needs the following as a minimum (if available in the source discovery format):

  • citation
  • description
  • summary/abstract
  • title
  • author (i.e. Data_Set_Citation/Dataset_Creator)
  • data centre
  • links (i.e. Related_URL)
  • date (something close to "last updated date of dataset"); raised with Bryan, and agreed that this will be the coverage start or end date, used for ordering.
  • spatio-temporal coverage fields (bounding box, start and end date/time)
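For illustration, here is roughly how the minimum field list above could be pulled out of a DIF record. This is a Python sketch, not the ticket's XQuery: the DIF element paths are assumptions based on DIF 9-style records (they may differ between DIF versions), only a subset of the list is covered (citation and description are left out), and a field is returned only when the source record populates it, matching "if available".

```python
import xml.etree.ElementTree as ET

# Assumed DIF element paths for part of the minimum field list (DIF 9-style names).
FIELD_PATHS = {
    "title": "Entry_Title",
    "summary": "Summary",
    "author": "Data_Set_Citation/Dataset_Creator",
    "data_centre": "Data_Center/Data_Center_Name/Short_Name",
    "links": "Related_URL/URL",
    "start_date": "Temporal_Coverage/Start_Date",
    "end_date": "Temporal_Coverage/Stop_Date",
    "south": "Spatial_Coverage/Southernmost_Latitude",
    "north": "Spatial_Coverage/Northernmost_Latitude",
    "west": "Spatial_Coverage/Westernmost_Longitude",
    "east": "Spatial_Coverage/Easternmost_Longitude",
}


def _local(tag):
    """Drop any '{namespace}' prefix so namespaced and plain DIF both match."""
    return tag.rsplit("}", 1)[-1]


def _texts(root, path):
    """Return the non-empty text of every element reached by a '/'-separated path."""
    nodes = [root]
    for name in path.split("/"):
        nodes = [child for node in nodes for child in node if _local(child.tag) == name]
    return [n.text.strip() for n in nodes if n.text and n.text.strip()]


def extract_minimum_fields(dif_xml):
    """Map a DIF XML string to {field: [values]}, keeping only populated fields."""
    root = ET.fromstring(dif_xml)
    fields = {}
    for field, path in FIELD_PATHS.items():
        values = _texts(root, path)
        if values:  # "if available in source": skip fields the record lacks
            fields[field] = values
    return fields
```

Values are kept as lists because DIF allows repeated elements (e.g. several Dataset_Creator or Related_URL entries).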

Change History

comment:1 Changed 13 years ago by selatham

  • Summary changed from [M] DIF-to-miniMOLES transformation to [M] DIF-to-minimum MOLES transformation

comment:2 Changed 13 years ago by ko23

As a minimum you can have:

  • summary/abstract
  • title
  • data centre, especially if there is an existing MOLES record to point to.

Even then, abstracts such as "Krishnamurthi sent his analyses to the DSS" might be seen as cryptic, so usage guidelines are required.

comment:3 Changed 13 years ago by ko23

Note also that the "first pass" will assume that lat/long values are in decimal format and that dates are "XSD valid". Any further formats will be dealt with later.
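A minimal Python sketch of what these first-pass checks might look like, assuming dates use the xs:date lexical form (YYYY-MM-DD with an optional timezone, four-digit years only) and coordinates arrive as plain decimal degrees. The function names are hypothetical; anything else (degrees/minutes, other date layouts) is simply rejected, to be handled in a later pass.

```python
import re
from datetime import date

# xs:date lexical form: YYYY-MM-DD plus an optional timezone (Z or +/-hh:mm).
# Simplifying assumption: four-digit, non-negative years only.
_XSD_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?$")


def is_xsd_date(text):
    """True if text looks like an xs:date and names a real calendar day."""
    m = _XSD_DATE.match(text.strip())
    if not m:
        return False
    try:
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))  # rejects 2001-02-30
    except ValueError:
        return False
    return True


def parse_decimal_degrees(text, limit):
    """Return a float if text is a decimal number within +/-limit, else None."""
    try:
        value = float(text)
    except ValueError:
        return None  # e.g. "51 30 N" (degrees/minutes) is left for a later pass
    return value if -limit <= value <= limit else None
```

Use `limit=90` for latitudes and `limit=180` for longitudes.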

comment:4 Changed 12 years ago by ko23

  • Status changed from new to closed
  • Resolution set to fixed

Adding notes for the release already provided.

I think most things are there IF populated. Please let me know if you think something is missing.

It will only work with valid DIF records, and a surprising number weren't.

Note that "author" also includes /DIF/Data_Set_Citation/Dataset_Creator, because "Originating_Center" could reasonably be read as the originating centre of the metadata record, whereas this tag is pretty explicit!

The XQuery does some date validation, but this type checking should be extended.

What's not there is the lookup, where possible, of proper term keys, so I've kept to non-NDG namespaces for these to indicate this. However, this can be added in due course. Similarly, DPTs and Activities aren't extracted, but could be if certain "default" MOLES records are available.

However, I think that this does what's required for the panic phase; if not, please re-open, stating why (field and DIF record details appreciated).

comment:5 Changed 12 years ago by selatham

  • Status changed from closed to reopened
  • Resolution fixed deleted

Still needs to handle DataProvider 'groups'. This is how we filter whether something is MDIP or not.

So every incoming record will need an associated MOLES record generated for the Discovery to work, including records from non-MDIP DataProviders. Each will need to be marked as MDIP or not, according to whether it has 'MDIP' in its Structured Keyword.

The term 'NERC-DDC' allows similar filtering for the NERC Designated Data Centres, i.e. those that were in the NERC Metadata gateway.

The term 'NERC' was thought to be useful for distinguishing NERC datasets from 'all the other stuff' (WDCC, NCAR, etc.).

comment:6 Changed 12 years ago by ko23

This assumes that the vocab will never be updated. If this is confirmed, then I can do the necessary hard-wiring; otherwise it is an unnecessary maintenance problem.

comment:7 Changed 12 years ago by selatham

  • Description modified

Groups, plus some other content parsing, will be handled outside this XQuery runner. See ticket #35.

comment:8 Changed 12 years ago by ko23

  • Status changed from reopened to closed
  • Resolution set to fixed

Enough appears to be there for MDIP purposes. Valid MOLES records are output, and there is a requirement for a proper MOLES organisation record for the data curator, but other entities are created per data entity, as doing this properly on the fly is impractical. For more detail, see the readme with the release.

I'll extend MOLES content over time as a background activity, but any specific requirements should be ticketed.

comment:9 Changed 12 years ago by selatham

  • Status changed from closed to reopened
  • Resolution fixed deleted

Identify 'groups' keyword (it will be 'MDIP', 'NERC-DDC', or 'NERC') from DIF keywords field and set up in the generated MOLES.
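Roughly, the group identification could look like this Python sketch. It assumes each group term appears as the complete text of a DIF `Keyword` element (exact match, so 'NERC' does not also match 'NERC-DDC'); the function name is hypothetical, and writing the result into the generated MOLES record is not shown.

```python
import xml.etree.ElementTree as ET

# The three group terms named in this ticket, checked in this order.
GROUP_TERMS = ("MDIP", "NERC-DDC", "NERC")


def find_groups(dif_xml):
    """Return the group terms found as whole Keyword values in a DIF record."""
    root = ET.fromstring(dif_xml)
    keywords = {
        elem.text.strip()
        for elem in root.iter()  # root and all descendants
        if elem.tag.rsplit("}", 1)[-1] == "Keyword" and elem.text  # ignore namespace
    }
    return [term for term in GROUP_TERMS if term in keywords]
```

All matching terms are returned rather than just the first, so a record tagged with more than one group is not silently narrowed to one.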

comment:10 Changed 12 years ago by selatham

Accept localID as a command-line argument, to ensure that the localID in the MOLES record is the same as the localID within the discovery record.
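A sketch of the command-line side, in Python with argparse. The wrapper, the `--local-id` option and the `moles_identifier` helper are all hypothetical; how the value is then passed into the XQuery (for example as an external variable) depends on the XQuery processor and is not shown.

```python
import argparse


def parse_args(argv=None):
    """Parse the (hypothetical) wrapper's command line."""
    parser = argparse.ArgumentParser(
        description="Run the DIF-to-MOLES transformation (hypothetical wrapper).")
    parser.add_argument("dif_file", help="path to the source DIF record")
    parser.add_argument("--local-id", required=True,
                        help="localID of the discovery record, reused verbatim in MOLES")
    return parser.parse_args(argv)


def moles_identifier(args):
    # Hypothetical: the MOLES record takes the caller-supplied localID instead of
    # deriving its own, so it always matches the discovery record.
    return {"localID": args.local_id, "source": args.dif_file}
```

Making the option required means a MOLES record cannot be generated without an explicit localID, so the two records cannot drift apart.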

comment:11 Changed 12 years ago by lawrence

  • Status changed from reopened to closed
  • Resolution set to fixed

I'm closing this. The basic functionality now exists, and we should ticket specific problems in enough detail that each can be identified and dealt with in turn.
