[Bioperl-l] Re: [SO-devel] GFF3 preliminary

Ewan Birney birney at ebi.ac.uk
Wed Feb 19 06:42:18 EST 2003



On Tue, 18 Feb 2003, Mark Yandell wrote:

> Hi All,
>
>
> ".  When asked why they
> > have modified the published Sanger specification, bioinformaticists
> > frequently answer that the format was insufficient for their needs...",
>
>
> So why not just use XML? You know, with like a real DTD, like the rest of the
> world, and be done with it?
>

That's what NCBI Seq XML, GAME XML, or (new and shiny... talk to Michele)
Otter XML are for, and they each solve specific problems.


With XML you can't usefully:

  use grep
  use sort and sort -k and other twisted options of sort
  use comm
  use awk
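To make the point about line-oriented tools concrete, here is a minimal Python sketch of what a pipeline such as `awk -F'\t' '$3 == "exon"' file.gff | sort -t$'\t' -k1,1 -k4,4n` does to GFF-style records. It assumes nine tab-separated columns; the sample records and the function names `grep_type` and `sort_by_location` are hypothetical, made up for illustration.

```python
"""Sketch: awk-style filtering and sort -k-style ordering of GFF lines.

Assumption: records are GFF-style, nine tab-separated columns
(seqid, source, type, start, end, score, strand, phase, attributes).
The sample records below are hypothetical, made up for illustration.
"""

SAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "chr2\tensembl\texon\t500\t650\t.\t+\t.\tID=e3",
    "chr1\tensembl\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tensembl\texon\t300\t400\t.\t+\t.\tID=e2",
    "chr1\tensembl\texon\t100\t200\t.\t+\t.\tID=e1",
])


def grep_type(lines, feature_type):
    """Yield records whose third column matches, like awk -F'\\t' '$3 == ...'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/pragma and blank lines
        fields = line.split("\t")  # the Python analogue of Perl's split
        if fields[2] == feature_type:
            yield fields


def sort_by_location(records):
    """Sort by seqid, then numeric start, like sort -k1,1 -k4,4n."""
    return sorted(records, key=lambda f: (f[0], int(f[3])))


if __name__ == "__main__":
    for fields in sort_by_location(grep_type(SAMPLE_GFF.splitlines(), "exon")):
        print("\t".join(fields))
```

The `int()` conversion plays the role of the `n` in `-k4,4n`: without it, a start of `1000` would sort before `200`.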

With XML you need:

  a decent XML SAX parser in your language of choice to read it reliably;
one now exists for pretty much every major language


  enough coding time to write a handler that turns SAX events into your
internal data structures in a tag-tolerant way (after all, if you are going
to be strict about the tags and not tolerate additional ones... then why use
XML?). Nowhere near impossible, but nowhere near as simple as @fields = split;


  endless discussions with people who are trying to solve related but
distinct problems, only to discover that you need separate XML formats anyway.
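For comparison, here is a minimal sketch of the "SAX event to internal data structure" step using Python's standard `xml.sax` module. The `<features>`/`<feature>` element names, the `FeatureHandler` class, and the sample document are hypothetical and not taken from NCBI Seq XML, GAME XML, or Otter XML; the point is only the bookkeeping needed to tolerate unknown tags, set against the single `split` for the tab-delimited line.

```python
import xml.sax


class FeatureHandler(xml.sax.ContentHandler):
    """Collect hypothetical <feature> records into dicts, ignoring unknown tags."""

    KNOWN = {"seqid", "type", "start", "end"}  # assumed child element names

    def __init__(self):
        super().__init__()
        self.features = []
        self._current = None  # dict for the <feature> being built, if any
        self._field = None    # known child element we are inside, if any
        self._text = []       # buffered character data for that element

    def startElement(self, name, attrs):
        if name == "feature":
            self._current = {}
        elif self._current is not None and name in self.KNOWN:
            self._field = name
            self._text = []
        # Any other element is tolerated and simply ignored.

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)  # text can arrive in several chunks

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None
        elif name == "feature" and self._current is not None:
            self.features.append(self._current)
            self._current = None


def parse_features(xml_bytes):
    handler = FeatureHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.features


SAMPLE_XML = b"""<features>
  <feature>
    <seqid>chr1</seqid><type>exon</type>
    <start>100</start><end>200</end>
    <note>an extra tag this handler does not know about</note>
  </feature>
</features>"""

# The tab-delimited equivalent of the same record is a single split.
GFF_LINE = "chr1\tensembl\texon\t100\t200\t.\t+\t.\tID=e1"

if __name__ == "__main__":
    print(parse_features(SAMPLE_XML))
    print(GFF_LINE.split("\t"))
```

Even this toy handler has to track which element it is inside and buffer text that may arrive across several `characters()` calls; the GFF line needs none of that.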



XML is a bad format, but undoubtedly the best format out there for complex
data.


XML simply doesn't replace tab-delimited formats, and we shouldn't mandate
the death of GFF and friends (e.g., GTF) just because XML formats are used for
complex data transfer.