Ticket #683 (closed issue: duplicate)

Opened 12 years ago

Last modified 12 years ago

[M] Chances of machine assisted discovery in NDG2?

Reported by: lawrence
Owned by: rkl
Priority: required
Milestone: PROD Final
Component: vocab
Version:
Keywords:
Cc:

Description

It would be nice to deploy machine assisted discovery in NDG2!

Will we have any capability to do narrower/broader calls on a vocabserver web service by the end of June?

Change History

comment:1 Changed 12 years ago by rkl

  • Status changed from new to assigned

Version 2 of the Vocabulary Server API might just miss the end of June, but not by much. I will have the mappings encoded by the end of April (about 3,000-4,000 done so far) and the final version 2 API specification by the end of May. Coding should begin mid-June (should take about 2 weeks), but may slip slightly if another non-NDG project finishes late.

Any input on the method design would be appreciated. There is a draft of my ideas on the BSCW, but I'm not trying to write code against it!

comment:2 Changed 12 years ago by lawrence

The simplest thing we can do (and therefore the thing we will do first) is, after a discovery search, to offer a broader-than search (with number of hits, driven by AJAX so it doesn't slow things down) and a list of narrower-than search options (also with number of hits). See ticket:685.

comment:3 Changed 12 years ago by rkl

OK, remember the mappings are based on a SKOS subset (exactMatch, broadMatch, narrowMatch and minorMatch). The API needs a control on the relationships to be included (4-bit mask represented as a single hex digit?). Providing the functionality you describe would have to include narrowMatch (= broader than, W3C logic for you) and exactMatch. Bringing minorMatch in as well would be an interesting experiment.
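A minimal sketch of the 4-bit mask idea, for discussion only. The bit assignments and the names `Relation`, `encode_mask` and `decode_mask` are assumptions; nothing in this ticket fixes which bit stands for which relationship.

```python
from enum import IntFlag


class Relation(IntFlag):
    """Hypothetical bit assignments for the four mapping relationships."""
    EXACT_MATCH = 0x1
    BROAD_MATCH = 0x2
    NARROW_MATCH = 0x4
    MINOR_MATCH = 0x8


def encode_mask(relations):
    """Return the selected relationships as a single upper-case hex digit."""
    return format(int(relations), "X")


def decode_mask(digit):
    """Parse a single hex digit back into a set of relationship flags."""
    if len(digit) != 1:
        raise ValueError("mask must be exactly one hex digit")
    return Relation(int(digit, 16))


# The combination suggested above for a broader-than search:
# narrowMatch plus exactMatch.
broader_search = Relation.NARROW_MATCH | Relation.EXACT_MATCH
```

With these assumed bit positions, `encode_mask(broader_search)` gives `"5"`, and adding minorMatch for the experiment would only change the digit passed in.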

Also we need to think about how terms should be input to the method. Options are as a URI or as 3 parameters (term, vocabulary list name, version number) which are translated to a URI inside the method. Any preference? Also would you want to restrict the target vocabularies in some way (note this has no performance implication: the search is done on a single indexed table no matter how many lists are included)?

Detailed guidance on what you'd like to get back would also be helpful. My opening gambit would be a structure like:

<termlist>
  <term><uri> http://.....key1</uri><term_text>Blah1...</term_text></term>
  <term><uri> http://.....key2</uri><term_text>Blah2...</term_text></term>
</termlist>
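To show how a client might consume a structure like the one proposed above, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The element names come from the draft structure; the URIs in `SAMPLE` are placeholders on example.org, not real vocabulary keys, and `parse_termlist` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder response; the URIs are invented for illustration.
SAMPLE = """
<termlist>
  <term><uri> http://example.org/key1</uri><term_text>Blah1</term_text></term>
  <term><uri> http://example.org/key2</uri><term_text>Blah2</term_text></term>
</termlist>
"""


def parse_termlist(xml_text):
    """Return a list of (uri, term_text) pairs from a <termlist> document."""
    root = ET.fromstring(xml_text.strip())
    terms = []
    for term in root.findall("term"):
        # Strip whitespace, since the draft shows a space after <uri>.
        uri = (term.findtext("uri") or "").strip()
        text = (term.findtext("term_text") or "").strip()
        terms.append((uri, text))
    return terms
```

Returning plain pairs keeps the client independent of the XML layout, which may well change before the version 2 specification is final.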

comment:4 Changed 12 years ago by lawrence

... would be tempted to ignore minormatch for now ...

There are two places we would want to do narrower/broader

  • from the general discovery box as described in ticket:685, in which case we only have a word, no list, no context, nothing. Can we have an API that simply takes a word, internally looks it up, and then uses a more sophisticated API to do the narrower/broader match?
  • from within the parameter display, where we would have a list for context, so could use term,vocab on input ... (probably wouldn't want to worry about the version number at that level).
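The two entry points described in the list above could be layered roughly as follows. This is only a sketch: the lookup table, the list names `list_a` and `list_b`, and both function names are invented for illustration, and a real implementation would query the vocabulary server rather than a dictionary.

```python
# Toy data, invented for illustration: (term, vocabulary list) -> narrower terms.
NARROWER = {
    ("Precipitation", "list_a"): ["Rain", "Snow"],
    ("Precipitation", "list_b"): ["Hail"],
}


def narrower_in_context(term, vocab):
    """Context-aware call: the caller knows the term and its list."""
    return list(NARROWER.get((term, vocab), []))


def narrower_for_word(word):
    """Context-free call: find the word in any list, then delegate."""
    results = []
    for term, vocab in NARROWER:
        if term.lower() == word.lower():
            for hit in narrower_in_context(term, vocab):
                if hit not in results:
                    results.append(hit)
    return results
```

The context-free function does nothing clever itself; it resolves the bare word to every (term, list) pair it can find and reuses the richer call, which is the split asked for in the first bullet.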

comment:5 Changed 12 years ago by rkl

Lack of context in the first use case is something that I'd overlooked. Will think about ways around it over Easter. Vocabulary versioning can be circumvented by specifying 'current' as the version, which will never cause issues given the way the back end is managed, but might under some other management regimes I can think of....
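As a sketch of the three-parameter input discussed earlier, with 'current' as the default version: the server root `BASE_URI` and the path layout (list, version, term) are assumptions made up for this example, since the ticket does not say how term URIs are composed.

```python
from urllib.parse import quote

# Hypothetical server root and path layout; neither is specified in this ticket.
BASE_URI = "http://vocab.example.org"


def term_uri(term, list_name, version="current"):
    """Translate (term, vocabulary list name, version) into a term URI."""
    parts = (list_name, version, term)
    # Percent-encode each part so spaces and slashes in term text are safe.
    return BASE_URI + "/" + "/".join(quote(part, safe="") for part in parts)
```

A caller working from the parameter display could then pass only the term and the list name and leave the version out.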

comment:6 Changed 12 years ago by rkl

Right, I've been giving the 'discovery free search' use case some thought and have been doing a bit of playing about with a vocabulary map containing some 5000 relationships.

The free-text input isn't necessarily a word; it's a string that may be a word, a phrase or even part of a word, as the assumed search model is wild-card string matching. So, the input term has to be wild-card matched against something and the obvious target is the mapping predicate term text.

Taking this approach with 'precipitation' hits 24 predicates:

"Central Indian Precipitation Index"
"Drought/Precipitation Indices"
"Drought/Precipitation Reconstruction"
"EARTH SCIENCE > Atmosphere > Precipitation"
"EARTH SCIENCE > Atmosphere > Precipitation > Acid Rain"
"EARTH SCIENCE > Atmosphere > Precipitation > Droplet Size"
"EARTH SCIENCE > Atmosphere > Precipitation > Hail"
"EARTH SCIENCE > Atmosphere > Precipitation > Liquid Water Equivalent"
"EARTH SCIENCE > Atmosphere > Precipitation > Precipitation Amount"
"EARTH SCIENCE > Atmosphere > Precipitation > Precipitation Rate"
"EARTH SCIENCE > Atmosphere > Precipitation > Rain"
"EARTH SCIENCE > Atmosphere > Precipitation > Sleet"
"EARTH SCIENCE > Atmosphere > Precipitation > Snow"
"EARTH SCIENCE > Climate Indicators > Drought/Precipitation Indices > Crop Moisture Index"
"EARTH SCIENCE > Climate Indicators > Drought/Precipitation Indices > Palmer Drought Crop Moisture Index"
"EARTH SCIENCE > Climate Indicators > Drought/Precipitation Indices > Satellite Soil Moisture Index"
"EARTH SCIENCE > Climate Indicators > Drought/Precipitation Indices > Surface Moisture Index"
"Enso Precipitation Index"
"Precipitation"
"Precipitation Amount"
"Precipitation Anomalies"
"Precipitation Rate"
"Precipitation and evaporation"
"Standardized Precipitation Index"

Next stage is to add any synonyms of these predicates, which in this case found nothing.

Finally, we add terms that are mapped narrower than the initial search string, which adds:

"Acid Rain"
"Crop Moisture Index"
"Droplet Size"
"EARTH SCIENCE > Atmosphere > Atmospheric Water Vapor > Evaporation"
"EARTH SCIENCE > Atmosphere > Atmospheric Water Vapor > Evapotranspiration"
"Fire Weather Index"
"Forest Fire Danger Index"
"Freezing Rain"
"Hail"
"Hydrometeors"
"Liquid Water Equivalent"
"Palmer Drought Crop Moisture Index"
"Palmer Drought Severity Index"
"Rain"
"Satellite Soil Moisture Index"
"Sleet"
"Snow"
"Surface Moisture Index"

This works reasonably, except that we find evaporation terms as well as precipitation (because one of the predicates was 'evaporation plus transpiration'). The same problem hits 'salinity' searches, which introduce temperature because of the coupling of salinity/temperature through density.
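The three stages described above (wild-card match, add synonyms, add narrower terms) can be sketched as below. All data here is a toy stand-in invented for illustration, including the synonym entry; the real search runs against the mapping table, whose layout is not described in this ticket, and `free_text_search` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Toy stand-ins for the mapping table, invented for illustration.
TERMS = [
    "Precipitation",
    "Precipitation Rate",
    "Precipitation and evaporation",
    "Salinity",
]
SYNONYMS = {"Precipitation Rate": ["Rainfall Rate"]}
NARROWER = {
    "Precipitation": ["Rain", "Snow"],
    "Precipitation and evaporation": ["Evaporation"],
}


def free_text_search(text):
    """Expand a free-text string into matching, synonymous and narrower terms."""
    pattern = "*" + text.lower() + "*"
    # Stage 1: wild-card match the input against the term text.
    found = [t for t in TERMS if fnmatchcase(t.lower(), pattern)]
    # Stage 2: add any synonyms of the matched terms.
    for term in list(found):
        for synonym in SYNONYMS.get(term, []):
            if synonym not in found:
                found.append(synonym)
    # Stage 3: add terms mapped as narrower than anything found so far.
    for term in list(found):
        for narrower in NARROWER.get(term, []):
            if narrower not in found:
                found.append(narrower)
    return found
```

Even on this toy data, searching for "precip" pulls in "Evaporation" through the combined term, which is the same kind of surprise noted above. Restricting stage 1 to a single target vocabulary would be one way to limit it.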

I'll set up a method in the API to implement this type of search, although we may need to tweak the algorithm and possibly set up a 'magic bullet' target vocabulary to source the predicates rather than searching all vocabularies as I'm doing at the moment. That should reduce the number of 'surprises'.

Next stage is to extend the mapping population to an operational level, which means doing the mapping between CF Standard Names/BODC and CF/GCMD.

comment:7 Changed 12 years ago by selatham

  • Milestone changed from BETA to PROD Final

comment:8 Changed 12 years ago by rkl

  • Status changed from assigned to closed
  • Resolution set to duplicate

Will be resolved once external access to the V1.1 API is resolved, and this is covered by another ticket.
