[Bioperl-l] xml standard for sequences

Matthew Pocock matthew_pocock@yahoo.co.uk
Tue, 16 Apr 2002 19:20:35 +0100


Paul Gordon wrote:

> it could be a
> discussion and DTD (much needed!) repository for the community, instead of
> trying to come up with a common syntax.

Couldn't agree more. By far the most useful thing we could do is 
generate xml-schema, rdf/daml or (heaven forbid) DTD fragments for 
common, non-contentious biological concepts (like standardising strand or 
frame) and let people compose their documents from these common 
elements/attributes plus their own glue. The next generation of xml 
parsers and DOMs will be able to expose the grammar that validates elements 
and attributes, so you will be able to do some really funky binding of 
xml data to factories, objects and code. Small, non-contentious 
definitions would be a good place to start.

Schema has the benefit that we can define data-types and validation 
rules without naming elements or attributes, and in some cases without 
distinguishing between rules for validating attributes and text content 
of elements. This allows a concept like strand to be used in the way an 
author feels most comfortable with. They could say <strand>+</strand> or 
<feature str="+"/> or <hit strand1="+" strand2="+"/> and have all the 
appropriate text validated against the same schema definition. This sort 
of thing can't be done with DTDs. This isn't rocket science, and it 
doesn't force everyone into adopting a single world view. It just might 
work. I'm all for letting everyone agree to disagree, and if a computer 
can translate between these different formalisms, then great.
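A minimal xml-schema sketch of the strand idea, assuming a hypothetical 
shared type called strandType that allows only "+" and "-". The type name, 
the allowed values and the element/attribute names are illustrative 
assumptions, not an agreed community definition. The point is that one 
simpleType is written once and then reused as element text and as three 
different attributes:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical shared definition: a strand is "+" or "-".
       It names no element or attribute, only a datatype. -->
  <xs:simpleType name="strandType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="+"/>
      <xs:enumeration value="-"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Reused as element text content: <strand>+</strand> -->
  <xs:element name="strand" type="strandType"/>

  <!-- Reused as an attribute: <feature str="+"/> -->
  <xs:element name="feature">
    <xs:complexType>
      <xs:attribute name="str" type="strandType"/>
    </xs:complexType>
  </xs:element>

  <!-- Reused twice on one element: <hit strand1="+" strand2="+"/> -->
  <xs:element name="hit">
    <xs:complexType>
      <xs:attribute name="strand1" type="strandType"/>
      <xs:attribute name="strand2" type="strandType"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With this, a schema-aware validator (for example xmllint --schema, with 
hypothetical file names) should reject <strand>x</strand> and 
<feature str="x"/> by the same rule, since both are checked against 
strandType.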

Matthew