Ticket #708 (closed task: fixed)

Opened 12 years ago

Last modified 12 years ago

[M] Java bug, MDIP content?

Reported by: lawrence Owned by: lawrence
Priority: critical Milestone: MDIP Portal
Component: discovery Version:
Keywords: Cc:

Description

hi Matt

This may well be one of those cases where I'm happy to get an error from you, and trap it, but can we somehow return in the error message WHICH of the files caused the trouble? That way I could repeat the request without the offending record.

Error [Error retrieving [[
u'dassh.ac.uk__MDIP__MRMLN00400000035.xml', u'dassh.ac.uk__MDIP__MRMLN0040000003E.xml',
u'dassh.ac.uk__MDIP__MRMLN00400000040.xml',
u'dassh.ac.uk__MDIP__MRMLN00400000041.xml',
u'dassh.ac.uk__MDIP__MRMLN00400000034.xml']] was 
[Error retrieving document : java.lang.NullPointerException]]

{'wsgi.multiprocess': False, 'SERVER_SOFTWARE': 'Apache/2.0.46 (Red Hat)', 'SCRIPT_NAME': '/discovery', 'SERVER_SIGNATURE': '<address>Apache/2.0.46 (Red Hat) Server at glue.badc.rl.ac.uk Port 80</address>\n', 'REQUEST_METHOD': 'GET', 'PATH_INFO': '', 'SERVER_PROTOCOL': 'HTTP/1.1', 'QUERY_STRING': 'textTarget=All&searchString=sewerage&advanced=0', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin', 'HTTP_ACCEPT_CHARSET': 'utf-8, utf-8;q=0.5, *;q=0.5', 'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.2 (like Gecko) Kubuntu 6.06 Dapper', 'HTTP_CONNECTION': 'Keep-Alive', 'SERVER_NAME': 'glue.badc.rl.ac.uk', 'REMOTE_ADDR': '130.246.120.163', 'paste.parsed_querystring': ([('textTarget', 'All'), ('searchString', 'sewerage'), ('advanced', '0')], 'textTarget=All&searchString=sewerage&advanced=0'), 'wsgi.url_scheme': 'http', 'PATH_TRANSLATED': '/var/www/fastcgi/ndg.fcgi/discovery', 'SERVER_PORT': '80', 'SERVER_ADDR': '130.246.191.172', 'DOCUMENT_ROOT': '/var/www/html', 'PYTHONPATH': '/var/www/cgi-bin', 'SCRIPT_FILENAME': '/var/www/fastcgi/ndg.fcgi', 'SERVER_ADMIN': 'badc@rl.ac.uk', 'wsgi.input': <flup.server.fcgi_base.InputStream object at 0xb6ee422c>, 'HTTP_HOST': 'glue.badc.rl.ac.uk', 'wsgi.multithread': True, 'REQUEST_URI': '/discovery?textTarget=All&searchString=sewerage&advanced=0', 'HTTP_ACCEPT': 'text/html, image/jpeg, image/png, text/*, image/*, */*', 'wsgi.version': (1, 0), 'GATEWAY_INTERFACE': 'CGI/1.1', 'wsgi.run_once': False, 'wsgi.errors': <flup.server.fcgi_base.TeeOutputStream object at 0xb6f5c14c>, 'REMOTE_PORT': '35326', 'HTTP_ACCEPT_LANGUAGE': 'en', 'HTTP_ACCEPT_ENCODING': 'x-gzip, x-deflate, gzip, deflate', 'UNIQUE_ID': 'MKo9fH8AAAEAACZ6cvwAAAAA'}
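
The request in the description (name the failing files so the search can be repeated without them) might look roughly like this on the portal side. This is only a sketch under assumptions: `PresentError`, its `failed` attribute, and `fetch_skipping_failures` are hypothetical names, not part of the actual portal or discovery service code, and it assumes the service can report the failing file names in a form the client can read.

```python
class PresentError(Exception):
    """Hypothetical error that names the documents that could not be retrieved."""

    def __init__(self, message, failed):
        super().__init__(message)
        self.failed = list(failed)


def fetch_skipping_failures(fetch, names, max_retries=5):
    """Call fetch(names); on PresentError, drop the named documents and retry.

    fetch is an assumed callable that returns a list of documents or raises
    PresentError. Returns (documents, skipped_names).
    """
    remaining = list(names)
    skipped = []
    for _ in range(max_retries):
        if not remaining:
            return [], skipped
        try:
            return fetch(remaining), skipped
        except PresentError as err:
            bad = [n for n in remaining if n in err.failed]
            if not bad:
                raise  # the error names nothing we asked for, so give up
            skipped.extend(bad)
            remaining = [n for n in remaining if n not in bad]
    raise RuntimeError("too many retries")
```

The caller gets back both the documents that could be retrieved and the list of skipped names, so the skipped records can still be reported to the user or the data provider.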

Change History

comment:1 Changed 12 years ago by lawrence

Oops, the offending command was this one

comment:2 Changed 12 years ago by selatham

Further to that, the first offending record, dassh.ac.uk__MDIP__MRMLN00400000035.xml, is a record which was 'mal-ingested' into the system. Some other dassh MDIP format records are OK, e.g. http://glue.badc.rl.ac.uk/discovery?textTarget=All&searchString=dard&advanced=0 .

So I suspect the mal-ingest, which is a strange one. Usually records either fail or ingest OK. I am looking into it. Does Matt need to do some error trapping anyway?

comment:3 Changed 12 years ago by selatham

The problem in d2boneoff has gone to ticket #709.

However, this flags up the problem whereby an original record may get into eXist but the d2b step fails. We will then have an original record but no corresponding moles record. Can we cope with this by returning only those records we can get at the initial discovery search?

The alternative is having a db with exactly matching discovery and moles records. This may be difficult. We would have to be able to 'roll back' the db to its previous state, and eXist does not offer db roll-back. The other alternative is manual checking of logs and a db dump for potential restore for each DP ingest.

comment:4 Changed 12 years ago by lawrence

I don't understand this. Surely the workflow should be something like

  1. oai record cleaned up (now called original record)
  2. original record processed to moles record
  3. both records put in the DB.

If you need to put the record in the DB to run d2boneoff, you at least have the option to delete it back out again afterwards. What does roll-back have to do with it?
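
A minimal sketch of the "delete it back out again" option, assuming a database wrapper with `put` and `delete` methods. `InMemoryDB`, `ingest_pair`, and the `to_moles` callable are hypothetical stand-ins for illustration, not the real eXist interface or the d2b code.

```python
class InMemoryDB:
    """Hypothetical stand-in for the real database, for illustration only."""

    def __init__(self):
        self.data = {}

    def put(self, key, record):
        self.data[key] = record

    def delete(self, key):
        self.data.pop(key, None)


def ingest_pair(db, record_id, original, to_moles):
    """Store the original and its moles record together, or neither.

    to_moles(db, record_id) is an assumed callable that may need the
    original record to be in the DB already and may raise on failure.
    """
    orig_key = ("original", record_id)
    db.put(orig_key, original)
    try:
        moles = to_moles(db, record_id)
        db.put(("moles", record_id), moles)
    except Exception:
        # Compensating delete: no DB roll-back support is needed.
        db.delete(orig_key)
        raise
    return moles
```

With this shape, a failed conversion leaves no orphaned original record behind, so the discovery and moles collections stay matched without a full roll-back.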

comment:5 Changed 12 years ago by mpritcha

  • Owner changed from mpritcha to lawrence

Updated PresentAgent so it now fails gracefully if individual documents fail. However, the portal still fails on the sewerage example above. Bryan: can your code handle an empty <document> element returned from present? This may be the problem now.

Example return:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
	<soapenv:Header/>
	<soapenv:Body>
		<doPresentReturn xmlns="urn:DiscoveryServiceAPI">
			<status>true</status>
			<statusMessage>Success but some failed documents : dassh.ac.uk__MDIP__MRMLN00400000041.xml</statusMessage>
			<documents>
				<document>&lt;DIF (snipped)/DIF></document>
				<document/>
			</documents>
		</doPresentReturn>
	</soapenv:Body>
</soapenv:Envelope>
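
One way a client could cope with a return like the one above, using Python's standard `xml.etree.ElementTree`. This is a sketch only: `parse_present_return` is a hypothetical helper, not the portal's code, and it assumes that multiple failed file names in `statusMessage` would be separated by whitespace or commas, which the example does not confirm.

```python
import xml.etree.ElementTree as ET

NS = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "d": "urn:DiscoveryServiceAPI",
}
FAILED_MARKER = "some failed documents :"


def parse_present_return(xml_text):
    """Return (status, documents, failed_names) from a doPresentReturn envelope."""
    root = ET.fromstring(xml_text)
    ret = root.find("soapenv:Body/d:doPresentReturn", NS)
    if ret is None:
        raise ValueError("no doPresentReturn element in response")

    status_text = ret.findtext("d:status", default="", namespaces=NS)
    status = status_text.strip() == "true"

    message = ret.findtext("d:statusMessage", default="", namespaces=NS)
    failed = []
    if FAILED_MARKER in message:
        tail = message.split(FAILED_MARKER, 1)[1]
        # Assumption: names are separated by whitespace and/or commas.
        failed = tail.replace(",", " ").split()

    # Skip empty <document/> elements instead of trying to parse them.
    documents = [
        doc.text
        for doc in ret.findall("d:documents/d:document", NS)
        if doc.text and doc.text.strip()
    ]
    return status, documents, failed
```

The key point is the filter on `doc.text`: an empty `<document/>` has `text` equal to `None`, so passing it straight to a DIF parser would fail.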

comment:6 Changed 12 years ago by lawrence

  • Status changed from new to closed
  • Resolution set to fixed

OK, I think we handle this error gracefully enough for now. Test it and see how you like it. The error now goes back to the data provider, I think.
