Ticket #801 (closed defect: invalid)

Opened 12 years ago

Last modified 12 years ago

[WG] There is a problem handling html markup in dgMetadatadescription

Reported by: lawrence
Owned by: lawrence
Priority: required
Milestone: PROD Final
Component: community
Version:
Keywords:
Cc: rkl, ko23

Description

Not sure whether this is me or the fact that the block should have been marked up as text/html. Anyway, see

 http://localhost:8080/view/badc.nerc.ac.uk__NDG-B1__dataent_bolton?format=xml

Change History

comment:1 Changed 12 years ago by lawrence

  • Status changed from new to assigned
  • Cc rkl, ko23 added

Well, there are two problems here. The first is that the document is not valid xml with xhtml in this element under the current schema, regardless of the moles content type. The second is that the content isn't xhtml (it's html). The latter we can fix at the BADC; the former requires a moles change.

I propose that we add a new type:

<xs:complexType name="dgTextAndHTML" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml" minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
</xs:complexType>

and modify the dgDescriptionText from

<xs:element name="dgDescriptionText"/>

to

<xs:element name="dgDescriptionText" type="moles:dgTextAndHTML"/>

This will be backwards compatible with any existing 1.3 instances, and will support what our metadata editors want ...

I propose to add these changes in a week unless there are any objections or alternative suggestions.
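
To make the proposal concrete, here is a rough Python sketch of the kind of instance the new type is meant to admit: free text mixed with child elements from the xhtml namespace. It is not schema validation; it only mirrors the namespace constraint of the xs:any wildcard on the direct children. The sample text, the `h` prefix, the absence of a namespace on dgDescriptionText and the helper name foreign_children are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical instance fragment: the wording and the missing namespace on the
# outer element are illustrative assumptions, not taken from a real record.
SAMPLE = (
    '<dgDescriptionText xmlns:h="http://www.w3.org/1999/xhtml">'
    "Plain text, then <h:p>a paragraph with <h:em>emphasis</h:em></h:p> and a tail."
    "</dgDescriptionText>"
)


def foreign_children(element):
    """Return the tags of direct children that are NOT in the xhtml namespace.

    This only approximates the wildcard's namespace restriction; a real
    check would run the document through a schema validator.
    """
    prefix = "{" + XHTML_NS + "}"
    return [child.tag for child in element if not child.tag.startswith(prefix)]


root = ET.fromstring(SAMPLE)
print(foreign_children(root))    # direct children outside the xhtml namespace
print("".join(root.itertext()))  # the text content with the markup ignored
```

Note that un-namespaced html such as a bare `<b>` would be reported by this check, which is the second problem described above.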


comment:2 Changed 12 years ago by rkl

Hi Bryan,

I seem to remember we had a lot of debate about this a while back and the conclusion was to prohibit XHTML markup within MOLES plaintext and provide access to marked up text through an associated URL to the document in a document server. Our systems are built on this assumption and actually strip the markup out during MOLES creation.

I think the reason was that there was concern that although the MOLES schema could be adapted to allow markup in fields such as the abstract, these might cause problems down the line when the abstract was transferred from the intermediate metadata into a document conforming to another schema such as DIF or ISO19139. Has anything happened to invalidate this argument?

comment:3 Changed 12 years ago by lawrence

Ah yes, now I remember, but meanwhile:

1) Our staff have been merrily adding html into the descriptions anyway ...

2) The schema allows alternative content types, which seems to imply that the element content can be something other than plain text, but the schema doesn't actually allow that.

3) Hauling in content from a document server isn't obviously supported in what's there either: although there is the option to put an online reference in here, it's not obvious to me (or to the browse code) which choices would imply that the referenced content should be embedded.

The abstract is a special case which we could handle differently; the other text bits don't get exported anyway, and we could worry about the ISO19139 connotations when we look at the profiles (I think they'll have to handle xhtml in blocks of text).

comment:4 Changed 12 years ago by rkl

I still have a very uneasy feeling that this is a kludge that might come back to haunt us at some stage. To me the whole idea of an intermediate metadata schema is that its bits should get munged and exported in many different ways and therefore internal markup can be viewed as an accident waiting to happen.

However, providing there is no reason why it should be impossible to strip out mark-up dynamically on export should a scenario require it then I guess I could live with it.
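
As a rough indication that stripping on export is feasible, here is a minimal Python sketch using only the standard library's html.parser, which tolerates plain html as well as xhtml. The function name strip_markup and the whitespace handling are assumptions for illustration, not a description of any existing export code.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data and ignore every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join("".join(self._chunks).split())


def strip_markup(description):
    """Return the description with all html/xhtml tags removed."""
    parser = _TextOnly()
    parser.feed(description)
    parser.close()
    return parser.text()


print(strip_markup("<p>Bolton &amp; <b>co</b></p>\n<p>data</p>"))
```

One limitation of this sketch: adjacent block elements with no whitespace between them (for example `<p>a</p><p>b</p>`) would be run together, so a real export step might want to insert a separator at block boundaries.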

comment:5 Changed 12 years ago by lawrence

  • Status changed from assigned to closed
  • Resolution set to invalid

Ok, I think this is a done deal!
