Ticket #768 (closed defect: fixed)

Opened 12 years ago

Last modified 12 years ago

[M] Discovery search not searching with full search details?

Reported by: selatham
Owned by: selatham
Priority: critical
Milestone: PROD Step1
Component: discovery
Version:
Keywords:
Cc:

Description

Searched on text='ukho' and restricted to 'MDIP': http://glue.badc.rl.ac.uk/discovery?searchString=ukho&textTarget=All&startDateYear=&startDateMon=&startDateDay=&endDateYear=&endDateMon=&endDateDay=&bboxN=%2B90.0&bboxW=-180.0&bboxE=%2B180.0&bboxS=-90.0&source=MDIP&advanced=1

Returns no records, but it should find two for ukho. The results summary also doesn't show the search text as being part of the search: "No records found (Global; Restricted to MDIP)."

Change History

comment:1 Changed 12 years ago by fvenuti

About the advanced search: as I mentioned during the last AH meeting, it is tricky to search two different databases (postgres and eXist) and then find the intersection of the two result sets. Unless you retrieve the whole set of results from both DBs, you won't find the true intersection, and so you will miss records that ought to be there. But retrieving the whole set of results becomes infeasible once each DB holds a few thousand records. I don't know whether this is relevant to your search (I don't know whether the restriction uses postgres in this case), but the problem will be apparent when searching for free text and then restricting to a specific temporal or spatial coverage.
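To make the concern concrete, here is a minimal Python sketch of how intersecting truncated result sets can drop records that a full intersection would find. The record IDs, the `limit` parameter and the function name are all invented for illustration; nothing here reflects the actual discovery code.

```python
def intersect_ids(text_hits, spatial_hits, limit=None):
    """Intersect two ranked ID lists, optionally truncating each to `limit` first."""
    if limit is not None:
        text_hits = text_hits[:limit]
        spatial_hits = spatial_hits[:limit]
    spatial_set = set(spatial_hits)
    # Keep the ordering of the text hits in the combined result.
    return [rid for rid in text_hits if rid in spatial_set]


# Hypothetical IDs: free-text matches (e.g. from eXist) and
# spatial matches (e.g. from postgres), each in its own ranking order.
text_hits = ["rec1", "rec2", "rec3", "rec4", "rec5"]
spatial_hits = ["rec9", "rec8", "rec5", "rec4", "rec7"]

full = intersect_ids(text_hits, spatial_hits)                # uses every hit
truncated = intersect_ids(text_hits, spatial_hits, limit=3)  # first 3 of each only

print(full)       # ['rec4', 'rec5']
print(truncated)  # []
```

With only the first three hits from each side, the two records that match both criteria never meet, which is the failure mode described above.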

comment:2 Changed 12 years ago by mpritcha

Is the problem in this case that the two ukho moles documents (ukho.ac.ukMDIPRSDRA2006000377335.xml, ukho.ac.ukMDIPRSDRA2006000377384.xml) have the wrong ParentListID? I haven't seen "MDIP_TargetVocabulary" before... I thought it should be something like "http://vocab.ndg.nerc.ac.uk/term/N010/current"?

        <dgStructuredKeyword>
            <dgValidTerm>MDIP</dgValidTerm>
            <dgValidTermID>
                <ParentListID>MDIP_TargetVocabulary</ParentListID>
                <TermID>001</TermID>
            </dgValidTermID>
        </dgStructuredKeyword>

My code calls the spot-vocab function rather than looking for a specific string, so it depends on what Kev has told it to look for.

where document($name)/moles:dgMetadata/moles:dgMetadataRecord/moles:dgStructuredKeyword
[moles:dgValidTerm &= 'MDIP' and voclib:spot-vocab($voclib:ndg_data_provider_vocab, moles:dgValidTermID/moles:ParentListID)]

comment:3 follow-up: ↓ 4 Changed 12 years ago by selatham

  • Owner changed from lawrence to selatham

Looks like they've reverted to some old records with incorrect content for the vocab. I'm sure this used to be sorted. I'll look at backups and might re-harvest.

comment:4 in reply to: ↑ 3 Changed 12 years ago by mpritcha

Replying to selatham:

Looks like they've reverted to some old records with incorrect content for the vocab. I'm sure this used to be sorted. I'll look at backups and might re-harvest.

OK, however Fabio makes a good point and we should look at this. In the meantime I would argue that the bounding box should be left out if it's not required. My code will not perform the spatial search if no bounding box is specified, and it's doing unnecessary extra work if it always does a "global" search when what is meant is a search with no spatial restriction.

comment:5 follow-up: ↓ 6 Changed 12 years ago by lawrence

Well, I can fix my code so it doesn't call for global if it doesn't need to, but that's no guarantee that nobody else will. I suspect it would be better for your code to check the bounds, and if it's global not to do the rdb search. Is that possible? Easy? Feasible?

comment:6 in reply to: ↑ 5 Changed 12 years ago by mpritcha

Replying to lawrence:

Well, I can fix my code so it doesn't call for global if it doesn't need to, but that's no guarantee that nobody else will. I suspect it would be better for your code to check the bounds, and if it's global not to do the rdb search. Is that possible? Easy? Feasible?

OK, have now done this. Please give it a try. We might need to think about whether this will cause other problems, however.

On re-reading my code after Fabio's comment, I reminded myself that it does actually retrieve the whole result set (as IDs only) before doing the intersection, so it should be immune to that problem (though we will have to keep an eye on performance).
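For anyone following along, a rough Python sketch of the kind of bounds check discussed in this exchange might look like the following. The function names, the tolerance and the handling of missing values are assumptions made for illustration and are not taken from the real search code; only the parameter names (bboxN, bboxW, bboxE, bboxS) and the values come from the URL in the ticket description.

```python
# Whole-globe extent as (north, west, east, south).
GLOBAL_BBOX = (90.0, -180.0, 180.0, -90.0)


def is_global_bbox(north, west, east, south, tol=1e-9):
    """True if the box covers the whole globe (within a small tolerance)."""
    box = (north, west, east, south)
    return all(abs(a - b) <= tol for a, b in zip(box, GLOBAL_BBOX))


def needs_spatial_search(params):
    """Decide whether a spatial (rdb) search is worth running.

    `params` is a dict of already URL-decoded query-string values.
    Assumption: a missing or blank bound means "no spatial restriction".
    """
    keys = ("bboxN", "bboxW", "bboxE", "bboxS")
    values = [params.get(key, "").strip() for key in keys]
    if any(value == "" for value in values):
        return False
    north, west, east, south = (float(value) for value in values)
    return not is_global_bbox(north, west, east, south)


# Bounding-box values from the ticket URL, after decoding %2B to "+".
ticket_params = {"bboxN": "+90.0", "bboxW": "-180.0",
                 "bboxE": "+180.0", "bboxS": "-90.0"}
# A hypothetical regional box, for contrast.
regional_params = {"bboxN": "60.0", "bboxW": "-10.0",
                   "bboxE": "2.0", "bboxS": "50.0"}

print(needs_spatial_search(ticket_params))    # False
print(needs_spatial_search(regional_params))  # True
```

Doing the check on the search side means a client that always sends the global box gets the same results as one that sends no box at all.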

comment:7 Changed 12 years ago by selatham

  • Status changed from new to closed
  • Resolution set to fixed

Seems to be getting UKHO records OK. Also, as far as I can tell, global spatial coords are not affecting the results, i.e. datasets with no spatial coords appear if they contain the search text string. Therefore closing.
