Ticket #482 (closed defect: fixed)

Opened 13 years ago

Last modified 12 years ago

[WG] Upgrade BADC production of DIFs

Reported by: selatham
Owned by: selatham
Priority: blocker
Milestone: PROD
Component: community
Version:
Keywords:
Cc:

Description (last modified by selatham)

Make sure we can run Kev's new MOLES-->DIF 'bulkdestubb.jar' producer. Do this after upgrading eXist (ticket #481).

Change History

comment:1 Changed 13 years ago by selatham

  • Status changed from new to assigned
  • Description modified

comment:2 Changed 13 years ago by selatham

  • Owner changed from selatham to ko23
  • Status changed from assigned to new

Ran the latest bulkdestubb.jar over the BADC MOLES records to produce DIFs.

There is a problem: Simplelinks come out with 'URI' or 'Logo' prepended to the URL.

<Related_URL>
	<URL>URIhttp://badc.nerc.ac.uk/data/chablis</URL>
	<Description> - </Description>
</Related_URL>
<Related_URL>
	<URL>Logohttp://badc.nerc.ac.uk/graphics/logos/nerc-2.gif</URL>
	<Description> - </Description>
</Related_URL>
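
The symptom above can be checked mechanically before ingest. The following is a minimal sketch, not part of bulkdestubb.jar: the helper names `split_label_prefix` and `find_prefixed_urls`, the list of accepted schemes, and the un-namespaced `<DIF>` wrapper around the sample are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed set of schemes that mark the start of the real URL.
_URL_START = re.compile(r"(https?|ftp)://")


def split_label_prefix(text):
    """Split a stray label such as 'URI' or 'Logo' from the URL after it.

    Returns (label, url); label is '' when the text already starts
    with a recognised scheme.
    """
    text = text or ""
    match = _URL_START.search(text)
    if match is None:
        return (text, "")
    return (text[:match.start()], text[match.start():])


def find_prefixed_urls(xml_text):
    """Return (label, url) for every URL element carrying a stray prefix."""
    root = ET.fromstring(xml_text)
    bad = []
    for url in root.iter("URL"):
        label, rest = split_label_prefix(url.text)
        if label:
            bad.append((label, rest))
    return bad


# Sample built from the two fragments quoted in this comment;
# the enclosing <DIF> element is added only so the text parses.
SAMPLE = """<DIF>
  <Related_URL>
    <URL>URIhttp://badc.nerc.ac.uk/data/chablis</URL>
    <Description> - </Description>
  </Related_URL>
  <Related_URL>
    <URL>Logohttp://badc.nerc.ac.uk/graphics/logos/nerc-2.gif</URL>
    <Description> - </Description>
  </Related_URL>
</DIF>"""

for label, url in find_prefixed_urls(SAMPLE):
    print("stray label %r before %s" % (label, url))
```

This only reports the problem; the real fix belongs in the MOLES-->DIF transform, which is not shown in this ticket.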

comment:3 Changed 13 years ago by selatham

  • Priority changed from required to blocker

This is now a blocker as all BADC DIFs are invalid.

comment:4 Changed 12 years ago by selatham

Also, parameters are not coming out in the DIF at all. I've changed the 'unknown' terms and vocabs to 'null' in MOLES, as per my conversation with Kev, but they are still not appearing.

					<dgStdParameterMeasured>
						<dgValidTerm>EARTH SCIENCE</dgValidTerm>
						<dgValidTermID>
							<ParentListID>http://vocab.ndg.nerc.ac.uk/term/P111</ParentListID>
							<TermID>GCAT0001</TermID>
						</dgValidTermID>
						<dgValidSubterm>
							<dgValidTerm>EARTHSCIENCE</dgValidTerm>
							<dgValidTermID>
								<ParentListID>http://vocab.ndg.nerc.ac.uk/term/121</ParentListID>
								<TermID>null</TermID>
							</dgValidTermID>
							<dgValidSubterm>
								<dgValidTerm>Atmosphere</dgValidTerm>
								<dgValidTermID>
									<ParentListID>http://vocab.ndg.nerc.ac.uk/term/P131</ParentListID>
									<TermID>null</TermID>
								</dgValidTermID>
								<dgValidSubterm>
									<dgValidTerm>AtmosphericChemistry</dgValidTerm>
									<dgValidTermID>
										<ParentListID>http://vocab.ndg.nerc.ac.uk/term/P141</ParentListID>
										<TermID>null</TermID>
									</dgValidTermID>
									<dgValidSubterm>
										<dgValidTerm>OxygenCompounds</dgValidTerm>
										<dgValidTermID>
											<ParentListID>null</ParentListID>
											<TermID>null</TermID>
										</dgValidTermID>
									</dgValidSubterm>
								</dgValidSubterm>
							</dgValidSubterm>
						</dgValidSubterm>
					</dgStdParameterMeasured>
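
To see at a glance which levels of a parameter hierarchy carry 'null' identifiers, the nested dgValidSubterm elements can be flattened. This is a rough sketch only: `flatten_terms` is a hypothetical helper, the sample is the first two levels of the record above, and the ticket does not establish that the null TermIDs are what stops the parameters appearing.

```python
import xml.etree.ElementTree as ET

# First two levels of the record quoted above (abridged for brevity).
SAMPLE = """<dgStdParameterMeasured>
  <dgValidTerm>EARTH SCIENCE</dgValidTerm>
  <dgValidTermID>
    <ParentListID>http://vocab.ndg.nerc.ac.uk/term/P111</ParentListID>
    <TermID>GCAT0001</TermID>
  </dgValidTermID>
  <dgValidSubterm>
    <dgValidTerm>EARTHSCIENCE</dgValidTerm>
    <dgValidTermID>
      <ParentListID>http://vocab.ndg.nerc.ac.uk/term/121</ParentListID>
      <TermID>null</TermID>
    </dgValidTermID>
  </dgValidSubterm>
</dgStdParameterMeasured>"""


def flatten_terms(node, depth=0):
    """Yield (depth, term, parent_list_id, term_id) for node and each
    nested dgValidSubterm, walking the hierarchy depth-first."""
    yield (
        depth,
        node.findtext("dgValidTerm"),
        node.findtext("dgValidTermID/ParentListID"),
        node.findtext("dgValidTermID/TermID"),
    )
    for sub in node.findall("dgValidSubterm"):
        for row in flatten_terms(sub, depth + 1):
            yield row


rows = list(flatten_terms(ET.fromstring(SAMPLE)))
# Terms whose TermID is the literal string 'null'.
null_ids = [term for _, term, _, term_id in rows if term_id == "null"]

for depth, term, list_id, term_id in rows:
    print("%s%s  list=%s  id=%s" % ("  " * depth, term, list_id, term_id))
```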

comment:5 Changed 12 years ago by selatham

By the way, the DIFs produced cannot be parsed by eXist or ElementTree:

storing document badc.nerc.ac.uk__DIF__dataent_chablis.xml (0 of 1) ...could not parse file /usr/local/WSClients/OAIBatch/data/badc/discovery_corrected/badc.nerc.ac.uk__DIF__dataent_chablis.xml: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
<p><DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><br/>LINE: </p>
Traceback (most recent call last):
  File "/usr/local/WSClients/OAIBatch/oai_ingest.py", line 238, in ?
    ident=getID(original_filename)
  File "/usr/local/WSClients/OAIBatch/oai_ingest.py", line 40, in getID
    d=DIF(xml)
  File "/usr/local/WSClients/OAIBatch/DIF.py", line 51, in __init__
    raise ValueError,'DIF input cannot be parsed into an ElementTree instance:\n%s'%xml
ValueError: DIF input cannot be parsed into an ElementTree instance:
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
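
A quick well-formedness check on each generated file, run before the ingest, would locate this kind of failure without going through oai_ingest.py. The sketch below is an assumption about how such a check might look: `check_well_formed` is a hypothetical helper, the two sample strings are invented (only the namespace comes from the output above), and it uses the modern `ET.ParseError` rather than the Python 2 style seen in the traceback.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return (line, column, str(err))
    return None


# Invented samples: one complete document, one cut off before its
# closing tag, which is one way to get a "must start and end within
# the same entity" style of error.
GOOD = ('<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">'
        '<Entry_ID>x</Entry_ID></DIF>')
TRUNCATED = ('<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">'
             '<Entry_ID>x</Entry_ID>')

for name, text in (("good", GOOD), ("truncated", TRUNCATED)):
    problem = check_well_formed(text)
    if problem is None:
        print("%s: well-formed" % name)
    else:
        print("%s: line %d, column %d: %s" % ((name,) + problem))
```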

comment:6 Changed 12 years ago by selatham

  • Cc ko23 added

The parameters are now coming out since I started putting 'list level' in, although the 'detailed variable' is not appearing.

comment:7 Changed 12 years ago by selatham

URLs are still incorrect. They now come out empty:

<Related_URL>
	<URL/>
	<Description>URL to aid in delivering data. Note that this may point directly to the data or, more likely, point to the web site of the curator.</Description>
</Related_URL>
<Related_URL>
	<URL/>
	<Description> - </Description>
</Related_URL>
<Related_URL>
	<URL/>
	<Description> - </Description>
</Related_URL>
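
Empty URL elements like these are easy to count across a batch of DIFs. A minimal sketch, assuming the records use the DIF namespace shown on the root element in comment:5; `count_empty_urls` is a hypothetical helper and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears on the DIF root element quoted in comment:5.
DIF_NS = "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"

# Made-up sample: one empty URL and one populated URL.
SAMPLE = """<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
  <Related_URL>
    <URL/>
    <Description> - </Description>
  </Related_URL>
  <Related_URL>
    <URL>http://badc.nerc.ac.uk/data/chablis</URL>
    <Description> - </Description>
  </Related_URL>
</DIF>"""


def count_empty_urls(xml_text):
    """Return (empty, total) counts of Related_URL entries in one DIF.

    An entry counts as empty when its URL child is missing or holds
    only whitespace.
    """
    root = ET.fromstring(xml_text)
    related = root.findall("{%s}Related_URL" % DIF_NS)
    empty = 0
    for rel in related:
        url = rel.findtext("{%s}URL" % DIF_NS)
        if url is None or not url.strip():
            empty += 1
    return empty, len(related)


empty, total = count_empty_urls(SAMPLE)
print("%d of %d Related_URL entries have no URL" % (empty, total))
```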

comment:8 Changed 12 years ago by selatham

DIFs can now be parsed by eXist and ElementTree, so they will ingest into NDG Discovery.

But the URLs are still wrong; see the previous comment (comment:7).

comment:9 Changed 12 years ago by selatham

  • Type changed from task to defect

comment:10 Changed 12 years ago by selatham

  • Owner changed from ko23 to selatham
  • Cc ko23 removed
  • Status changed from new to assigned
  • Milestone changed from ReFactored_Discovery_WebServices to PROD

The URL problem was actually raised in ticket #356, which I'm re-assigning to Kev.

However, XQueries are now timing out with the current eXist config.

comment:11 Changed 12 years ago by selatham

  • Status changed from assigned to closed
  • Resolution set to fixed

The timeout was deliberate, to address front-end issues. We have now agreed that the XQuery should time out rather than eXist itself. I have reset the config and the bulk generator now runs. (The URL problem has gone to Kev in #356.)
