Ticket #722 (closed task: fixed)

Opened 12 years ago

Last modified 12 years ago

[WG] Unicode error (again) in ndg discovery

Reported by: lawrence
Owned by: lawrence
Priority: required
Milestone: Replace Metadata Gateway
Component: community
Version:
Keywords:
Cc:


Problem arises from

  • ndg.noc.soton.ac.uk__DIF__NOCSDAT162
  • ndg.noc.soton.ac.uk__DIF__NOCSDAT160

Nothing looks wrong with the metadata itself, unless the encoding is wrong (need to check that, but in any case we should handle it gracefully).

Error ['ascii' codec can't decode byte 0xef in position 401: ordinal not in range(128)]
'HTTP_REFERER': 'http://glue.badc.rl.ac.uk/discovery?textTarget=All&searchString=ndg.noc.soton.ac.uk__DIF__NOCSDAT162&advanced=0', 'SERVER_SOFTWARE': 'Apache/2.0.46(Red Hat)', 
'SCRIPT_NAME': '/retrieve', 
'SERVER_SIGNATURE': '<address>Apache/2.0.46 (Red Hat) Server at glue.badc.rl.ac.uk Port 80</address>\n', 
'QUERY_STRING': 'repository=ndg&uri=ndg.noc.soton.ac.uk__DIF__NOCSDAT162&format=DIF&type=html', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin', 'HTTP_ACCEPT_CHARSET': 'utf-8, utf-8;q=0.5, *;q=0.5', 
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.2 (like Gecko) Kubuntu 6.06 Dapper', 'HTTP_CONNECTION': 'Keep-Alive', 'SERVER_NAME': 'glue.badc.rl.ac.uk', 'REMOTE_ADDR': '', 'paste.parsed_querystring': ([('repository', 'ndg'), ('uri', 'ndg.noc.soton.ac.uk__DIF__NOCSDAT162'), ('format', 'DIF'), ('type', 'html')], 
'wsgi.url_scheme': 'http', 'PATH_TRANSLATED': '/var/www/fastcgi/ndg.fcgi/retrieve', 
'SERVER_PORT': '80', 'SERVER_ADDR': '', 
'DOCUMENT_ROOT': '/var/www/html', 'PYTHONPATH': '/var/www/cgi-bin', 
'SCRIPT_FILENAME': '/var/www/fastcgi/ndg.fcgi', 'SERVER_ADMIN': 'badc@rl.ac.uk', 
'wsgi.input': <flup.server.fcgi_base.InputStream object at 0xb6f4ec0c>, 
'HTTP_HOST': 'glue.badc.rl.ac.uk', 'wsgi.multithread': True, 
'REQUEST_URI': '/retrieve?repository=ndg&uri=ndg.noc.soton.ac.uk__DIF__NOCSDAT162&format=DIF&type=html', 'HTTP_ACCEPT': 'text/html, image/jpeg, image/png, text/*, image/*, */*', 
'wsgi.version': (1, 0), 'GATEWAY_INTERFACE': 'CGI/1.1', 'wsgi.run_once': False, 
'wsgi.errors': <flup.server.fcgi_base.TeeOutputStream object at 0xb6f4eb8c>, 
'x-gzip, x-deflate, gzip, deflate', 'UNIQUE_ID': 'BpLHfX8AAAEAAEyIcw8AAAAE'}
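
For reference, the message is what Python's ASCII codec reports when it meets a byte outside 0-127. A minimal reproduction, written for Python 3 where the decode has to be explicit; the payload is a hypothetical stand-in for the real abstract (401 ASCII bytes followed by a UTF-8 sequence starting with 0xef), not the actual record:

```python
# Hypothetical payload: 401 ASCII bytes, then U+FF0D encoded as UTF-8 (ef bc 8d).
raw_abstract = b"a" * 401 + b"\xef\xbc\x8d"


def ascii_decode_error(data):
    """Return the error message from decoding data as ASCII, or None if it decodes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw_abstract)
print(message)
```

This only shows where the message comes from; it says nothing about which encoding the record really uses.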

Two lines of enquiry: fix it in DIF.py (line 76, though nsdumb should get it right anyway), or fix it in render (line 61). But why then only the abstract? Is it the way I build this particular string?

Change History

comment:1 Changed 12 years ago by lawrence

ok, it's clear that in

		<DIV id="">
		<span class="title">%s</span>
		<DIV class="abstract">%s </DIV>

body is coming back as unicode and that's the problem, even though the nasty characters are in the abstract. Converting either the abstract or the body so that both are the same string type fixes the problem. Don't have time now, but this should now be straightforward.
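
A sketch of one way to do that, not necessarily what the eventual fix does: decode every byte string explicitly before it reaches the template, so nothing is left to an implicit ASCII decode. Written for Python 3; `to_text`, `render`, the UTF-8 default and the sample values are all assumptions for illustration, and the template is shortened from the one above.

```python
TEMPLATE = '<span class="title">%s</span>\n<DIV class="abstract">%s </DIV>'


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        # errors="replace" keeps the page rendering even if the encoding guess is wrong.
        return value.decode(encoding, errors="replace")
    return str(value)


def render(title, abstract):
    """Fill the template after normalising both fields to text."""
    return TEMPLATE % (to_text(title), to_text(abstract))


# Hypothetical values: a text title and a UTF-8 encoded abstract containing 0xef.
html = render("example title", b"monthly means \xef\xbc\x8d sea surface temperature")
print(html)
```

Decoding at the boundary means the template only ever sees text, whichever field carried the non-ASCII bytes.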

comment:2 Changed 12 years ago by lawrence

  • Status changed from new to closed
  • Resolution set to fixed

Fixed in changeset:2433.
