Ticket #438 (closed issue: fixed)

Opened 15 years ago

Last modified 15 years ago

[M] What do we do about duplicates in our OAI-harvested repository?

Reported by: selatham
Owned by: rkl
Priority: required
Milestone: System Integration
Component: discovery
Version:
Keywords: WS-Discovery2 MDIP ProjectBoard
Cc:


Could be due to the same datasets being described in both DIF and ISO19115. Could also just be plain duplicates from OAI providers.

Change History

comment:1 Changed 15 years ago by selatham

  • Status changed from new to assigned

I've made the assumption in the new harvesting automation that we only harvest one format from a DataProvider. This is noted in the DataProvider's config file. We could allow harvesting of multiple formats as long as they do not describe the same datasets. This would complicate harvesting, ingestion and possibly the Discovery WS, so I prefer to keep it simple.
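
For illustration only, the one-format-per-DataProvider assumption could be checked with something like the sketch below. The config layout (a mapping from provider name to a list of formats) and the function name `check_single_format` are hypothetical; the real DataProvider config file format is not shown in this ticket.

```python
def check_single_format(provider_formats):
    """Return the names of providers not configured with exactly one format.

    provider_formats: hypothetical mapping of provider name -> list of
    metadata formats to harvest (e.g. "DIF", "ISO19115").
    """
    return sorted(
        name
        for name, formats in provider_formats.items()
        if len(set(formats)) != 1
    )


# Hypothetical example config, not taken from the real harvester.
example = {
    "BODC": ["DIF"],
    "ExampleProvider": ["DIF", "ISO19115"],  # risks logical duplicates
}
offenders = check_single_format(example)
```

Any provider returned here would need either a config fix or explicit confirmation that its formats describe different datasets.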

comment:2 Changed 15 years ago by selatham

  • Keywords WS-Discovery2 MDIP ProjectBoard added
  • Owner changed from selatham to lawrence
  • Status changed from assigned to new

I don't think there is anything we can do about a DataProvider giving us logical duplicates, apart from advising them not to republish other people's records.

Need a ProjectBoard or PI decision on whether this, plus the last comment, is sufficient to close this issue.

comment:3 Changed 15 years ago by selatham

  • Owner changed from lawrence to rkl

Leaving this open until the logical separation of the MDIP/BODC datasets (#612) is sorted. Then I will be happy that we can avoid duplicates.

comment:4 Changed 15 years ago by selatham

  • Status changed from new to closed
  • Resolution set to fixed

BODC are supplying DIF records which have both MDIP and NERC_DDC keywords. They could supply other format/keyword combinations in future, and we should be able to handle these. The advice, however, is to keep it simple.
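
As a rough sketch of how one record carrying both keywords can belong to two logical collections without being stored twice, the example below indexes record ids by keyword. The record structure (`id`, `keywords`) and the function name `index_by_keyword` are assumptions made for illustration, not the actual ingestion code.

```python
from collections import defaultdict


def index_by_keyword(records, known_keywords=("MDIP", "NERC_DDC")):
    """Map each known keyword to the set of record ids that carry it.

    Each record is held once; a record with several keywords simply
    appears in several sets. The record layout here is hypothetical.
    """
    index = defaultdict(set)
    for rec in records:
        for kw in rec["keywords"]:
            if kw in known_keywords:
                index[kw].add(rec["id"])
    return dict(index)


# Hypothetical records for illustration.
records = [
    {"id": "r1", "keywords": ["MDIP", "NERC_DDC"]},
    {"id": "r2", "keywords": ["NERC_DDC"]},
]
collections = index_by_keyword(records)
```

With this shape, a new keyword combination only means adding a name to `known_keywords`, which fits the aim of keeping it simple.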
