Ticket #1029 (closed task: fixed)

Opened 11 years ago

Last modified 10 years ago

(DI-2-7) Better handle ranking within resultset

Reported by: sdonegan
Owned by: sdonegan
Priority: critical
Milestone: NDG3
Component: discovery
Version:
Keywords: discovery ranking
Cc:

Description

Need to augment the current ranking mechanisms, give the user greater control over them (at the initial search stage??), and return results from the discoveryBE ordered as requested in the resultset.

Change History

comment:1 Changed 11 years ago by sdonegan

  • Status changed from new to assigned

comment:2 Changed 10 years ago by sdonegan

Ordering and ranking within the resultset is now implemented on the backend. Search results can now be ordered by: proximity, dataset date, dataset last update date, dataset name, datacentre name, dataset resultset popularity and dataset linked popularity. Extra columns have been added to the database to record every time a dataset is present within a resultset (dataset resultset popularity) and every time a link associated with a dataset is followed (dataset linked popularity). Code is currently in place within the backend to update the table every time a particular dataset appears in a resultset, but development is still required for the URL tracker to update the linked popularity column. The front end also needs updating to include drop-down lists so that the user can actually order the results.
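
A minimal sketch of how the two popularity counters described above could be maintained. The ticket does not give the real schema, so the table name `dataset`, the column names `resultset_popularity` and `linked_popularity`, and both helper functions are hypothetical, and SQLite stands in for the actual database.

```python
import sqlite3

# Hypothetical schema: the ticket names neither the table nor the counter columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " resultset_popularity INTEGER NOT NULL DEFAULT 0,"
    " linked_popularity INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO dataset (id, name) VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)


def record_resultset(conn, dataset_ids):
    """Add one to resultset_popularity for each dataset returned by a search."""
    conn.executemany(
        "UPDATE dataset SET resultset_popularity = resultset_popularity + 1"
        " WHERE id = ?",
        [(dataset_id,) for dataset_id in dataset_ids],
    )


def record_link_followed(conn, dataset_id):
    """Add one to linked_popularity when a link for the dataset is followed."""
    conn.execute(
        "UPDATE dataset SET linked_popularity = linked_popularity + 1"
        " WHERE id = ?",
        (dataset_id,),
    )


# Two searches and one followed link.
record_resultset(conn, [1, 2])
record_resultset(conn, [2])
record_link_followed(conn, 2)

popularity = conn.execute(
    "SELECT id, resultset_popularity, linked_popularity FROM dataset ORDER BY id"
).fetchall()
```

Incrementing inside the UPDATE statement keeps each counter change a single statement, rather than reading the value and writing it back from application code.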

comment:3 Changed 10 years ago by sdonegan

  • Priority changed from required to critical

A problem has been spotted when ordering results: results with null in the relevant ranking metric are placed at the top of the resultset when sorted in descending order. This obscures the results and gives the impression that the ordering has not worked. It must be changed so that these *least* matching results are placed at the end of the ordering sequence, irrespective of sort direction.
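
To make the required behaviour concrete, here is a small sketch in plain Python, independent of any database. The function name `order_results`, the `popularity` key and the sample rows are made up for illustration; the only point is that rows with a missing metric stay at the end for both sort directions.

```python
def order_results(rows, metric, descending=False):
    """Sort dict rows by metric, keeping rows whose metric is None at the end."""
    present = [row for row in rows if row[metric] is not None]
    missing = [row for row in rows if row[metric] is None]
    # Only the rows that actually have a value take part in the sort.
    present.sort(key=lambda row: row[metric], reverse=descending)
    return present + missing


# Hypothetical sample data: dataset 2 has no value for the ranking metric.
rows = [
    {"id": 1, "popularity": 5},
    {"id": 2, "popularity": None},
    {"id": 3, "popularity": 9},
]

ascending_ids = [row["id"] for row in order_results(rows, "popularity")]
descending_ids = [row["id"] for row in order_results(rows, "popularity", descending=True)]
```

In both orderings dataset 2 comes last, which is the behaviour asked for here.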

comment:4 Changed 10 years ago by mpritcha

Order by problem: notes

-- Make a dummy table testorder:

-- Table: testorder

-- DROP TABLE testorder;

CREATE TABLE testorder
(
  id serial NOT NULL,
  date timestamp without time zone,
  thisint integer,
  CONSTRAINT testorder_pkey PRIMARY KEY (id)
)
WITHOUT OIDS;

Contents (i.e. some columns left null):

select * from testorder;

 id | date                | thisint
----+---------------------+---------
  1 | 2007-01-01 12:13:00 |       0
  2 | 2008-01-01 12:13:00 |       1
  3 | NULL                |       2
  4 | 2007-02-01 10:53:00 | NULL

select * from testorder order by date desc;

 id | date                | thisint
----+---------------------+---------
  3 | NULL                |       2
  2 | 2008-01-01 12:13:00 |       1
  4 | 2007-02-01 10:53:00 | NULL
  1 | 2007-01-01 12:13:00 |       0

Note: records with a null date appear at the top of the list.

Idea from  http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=52643

select * from testorder
order by (case WHEN thisint IS NULL THEN 1 ELSE 0 END), thisint desc

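This seems to work, by adding an extra sort expression as a proxy column. In this case 1 is the high value used when the column is null and 0 is the low value used when it is not null, so null rows sort after all the others. For the date column, date-like proxy values are used instead, although the proxy is only ever compared with itself, so any low/high pair of values would do.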

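 id | date                | thisint
----+---------------------+---------
  3 | NULL                |       2
  2 | 2008-01-01 12:13:00 |       1
  1 | 2007-01-01 12:13:00 |       0
  4 | 2007-02-01 10:53:00 | NULL

select * from testorder
order by (case WHEN date IS NULL THEN '2010-01-01 00:00:01' ELSE '1940-01-01 00:00:01' END), date desc

 id | date                | thisint
----+---------------------+---------
  2 | 2008-01-01 12:13:00 |       1
  4 | 2007-02-01 10:53:00 | NULL
  1 | 2007-01-01 12:13:00 |       0
  3 | NULL                |       2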
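
The proxy ordering above can be tried out without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module, stores the timestamps as ISO text, and uses a 0/1 proxy for both columns. SQLite's default null placement differs from the PostgreSQL behaviour noted above (SQLite treats nulls as smaller than any other value), but with the CASE proxy as the first sort key the nulls end up last in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as testorder; timestamps are stored as ISO text, which sorts chronologically.
conn.execute(
    "CREATE TABLE testorder (id INTEGER PRIMARY KEY, date TEXT, thisint INTEGER)"
)
conn.executemany(
    "INSERT INTO testorder (id, date, thisint) VALUES (?, ?, ?)",
    [
        (1, "2007-01-01 12:13:00", 0),
        (2, "2008-01-01 12:13:00", 1),
        (3, None, 2),
        (4, "2007-02-01 10:53:00", None),
    ],
)


def ids(sql):
    """Run a query and return just the id column, in result order."""
    return [row[0] for row in conn.execute(sql)]


# The proxy key (0 for a value, 1 for null) is sorted first, so nulls go last.
by_thisint_desc = ids(
    "SELECT id FROM testorder"
    " ORDER BY (CASE WHEN thisint IS NULL THEN 1 ELSE 0 END), thisint DESC"
)
by_date_desc = ids(
    "SELECT id FROM testorder"
    " ORDER BY (CASE WHEN date IS NULL THEN 1 ELSE 0 END), date DESC"
)
by_date_asc = ids(
    "SELECT id FROM testorder"
    " ORDER BY (CASE WHEN date IS NULL THEN 1 ELSE 0 END), date ASC"
)
```

The first two id sequences match the orderings listed in the notes above, and the ascending query shows that the null date still comes last when the direction is reversed.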

comment:5 Changed 10 years ago by sdonegan

  • Status changed from assigned to closed
  • Resolution set to fixed

Adjusted the ordering SQL to include the above on all date ranking metrics (now startdate, enddate, metadata update date and discovery ingest date). Null handling has also been updated for the other metrics where nulls are likely in the resultset.
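
One way the same null handling could be applied uniformly to several metrics is to generate the ORDER BY clause from a fixed mapping. This is only a sketch: the mapping `ORDER_COLUMNS`, its column names and the function `order_by_clause` are hypothetical and are not taken from the discovery backend code.

```python
# Hypothetical mapping from ranking metric to database column name.
ORDER_COLUMNS = {
    "startdate": "start_date",
    "enddate": "end_date",
    "metadata_update_date": "metadata_update_date",
    "discovery_ingest_date": "discovery_ingest_date",
}


def order_by_clause(metric, descending=True):
    """Build an ORDER BY clause that sorts null values last in either direction."""
    # Only names from the fixed mapping are interpolated; unknown metrics raise KeyError.
    column = ORDER_COLUMNS[metric]
    direction = "DESC" if descending else "ASC"
    return (
        f"ORDER BY (CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"{column} {direction}"
    )


newest_first = order_by_clause("startdate")
oldest_first = order_by_clause("enddate", descending=False)
```

Looking the column up in a fixed mapping means a user-supplied ordering choice never reaches the SQL text directly.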
