Ticket #438 (closed issue: fixed)

Opened 13 years ago

Last modified 12 years ago

[M] What do we do about duplicates in our OAI-harvested repository?

Reported by: selatham
Owned by: rkl
Priority: required
Milestone: System Integration
Component: discovery
Version:
Keywords: WS-Discovery2 MDIP ProjectBoard
Cc:

Description

This could be because the same datasets are described in both DIF and ISO19115. Or they could simply be duplicates coming from the OAI providers.
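One way to see how widespread the problem is would be to group harvested records by a normalized dataset identifier and report any identifier that occurs more than once. The sketch below assumes that each record can be reduced to a hypothetical `HarvestedRecord` with a `dataset_id` that is comparable across DIF and ISO19115. In practice, mapping between the two formats' identifiers is an open assumption, not something the ticket establishes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    # Hypothetical record shape: which provider, which OAI metadata format, which dataset.
    provider: str
    metadata_format: str
    dataset_id: str


def find_duplicates(records):
    """Group records by a normalized dataset id and keep only groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        # Assumed normalization: ignore surrounding whitespace and letter case.
        key = rec.dataset_id.strip().lower()
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


records = [
    HarvestedRecord("BODC", "dif", "DS-001"),
    HarvestedRecord("BODC", "iso19115", "ds-001 "),
    HarvestedRecord("BADC", "dif", "DS-002"),
]
duplicates = find_duplicates(records)
```

This only detects duplicates. It does not decide which copy to keep, and it cannot catch logical duplicates that carry different identifiers.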

Change History

comment:1 Changed 13 years ago by selatham

  • Status changed from new to assigned

In the new harvesting automation, I've assumed that we harvest only one format from each DataProvider. This is recorded in the DataProvider's config file. We could allow harvesting multiple formats as long as they do not describe the same datasets, but that would complicate harvesting, ingestion and possibly the Discovery WS, so I prefer to keep it simple.
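A minimal sketch of the "one format per DataProvider" rule, assuming a hypothetical `DataProviderConfig` that holds a single OAI-PMH `metadataPrefix`. The field names and `list_records_url` helper are illustrative and are not the project's actual config format. Only `verb=ListRecords`, `metadataPrefix` and `resumptionToken` come from the OAI-PMH protocol itself.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class DataProviderConfig:
    # Hypothetical config shape: one OAI-PMH base URL and exactly one metadata format.
    name: str
    base_url: str
    metadata_prefix: str  # e.g. "dif"; a single value enforces one format per provider


def list_records_url(config, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for the provider's single format."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument, so metadataPrefix is omitted.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": config.metadata_prefix}
    return f"{config.base_url}?{urlencode(params)}"


bodc = DataProviderConfig("BODC", "https://example.org/oai", "dif")
first_page = list_records_url(bodc)
```

Because the config holds a scalar prefix rather than a list, a provider cannot accidentally be harvested in both DIF and ISO19115. Supporting several formats would mean turning that field into a list and adding logic to keep the resulting record sets disjoint.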

comment:2 Changed 13 years ago by selatham

  • Status changed from assigned to new
  • Owner changed from selatham to lawrence
  • Keywords WS-Discovery2 MDIP ProjectBoard added

I don't think there is anything we can do about a DataProvider giving us logical duplicates, apart from advising them not to republish other people's records.

We need a ProjectBoard or PI decision on whether this, together with the previous comment, is sufficient to close this issue.

comment:3 Changed 12 years ago by selatham

  • Owner changed from lawrence to rkl

Leaving this open until the logical separation of the MDIP and BODC datasets (#612) is sorted out. Then I will be happy that we can avoid duplicates.

comment:4 Changed 12 years ago by selatham

  • Status changed from new to closed
  • Resolution set to fixed

BODC are supplying DIF records that carry both MDIP and NERC_DDC keywords. They could supply other format/keyword combinations in future, and we should be able to handle these. The advice, however, is to keep it simple.
