[Bioperl-l] TIGR xml parser(s)

Thu Jan 8 14:47:25 EST 2004

Josh -

I have a Bio::SeqIO parser for what I think is the newer TIGR XML.

You can see examples of it here (no DTD, last time I checked):
ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/p_yoelii/annotation_dbs/PYA1.coordset
ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/p_falciparum/annotation_dbs/PFA1_chromo_1.coordset

Which is different from
ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_1.0/chr01.dir/chr01.xml
which is what your parser reads.

I don't have any insight into whether this is an in-progress format,
something to do with Manatee, or what; I'll have to try to get
clarification from someone at TIGR.

At any rate, I'm not sure what to call it to distinguish it from your
parser (Bio::SeqIO::tigr); for now it is called SeqIO::tigrxml.

I've been using it for a little tigrxml2gff script for loading data into
Gbrowse.  It also calculates and adds 5' and 3' UTR features based on the
CDS and mRNA annotations.
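
To make the UTR step concrete, here is a minimal sketch in Python of how 5' and 3' UTR intervals could be derived from mRNA exon and CDS coordinates. This is not the code from the tigrxml2gff script; the function name `utr_intervals`, the 1-based inclusive coordinates, and the `strand` convention (+1 or -1) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) with start <= end.
Interval = Tuple[int, int]


def utr_intervals(
    exons: List[Interval], cds: List[Interval], strand: int
) -> Tuple[List[Interval], List[Interval]]:
    """Return (five_prime_utrs, three_prime_utrs) for one transcript.

    Hypothetical helper: any part of an mRNA exon that lies outside the
    overall CDS span is treated as UTR.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)

    low_side: List[Interval] = []   # exon pieces left of the CDS span
    high_side: List[Interval] = []  # exon pieces right of the CDS span
    for start, end in sorted(exons):
        if start < cds_start:
            low_side.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            high_side.append((max(start, cds_end + 1), end))

    # On the minus strand the 5' end is at the higher coordinates.
    if strand >= 0:
        return low_side, high_side
    return high_side, low_side
```

For example, with exons at 100-200, 300-400 and 500-600 and a CDS running from 150 to 550, a plus-strand transcript would get a 5' UTR of 100-149 and a 3' UTR of 551-600; on the minus strand the two are swapped.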

I wrote it with XML::SAX, which I found quite easy to use. It has a nice
way of using different parsers underneath: a slow pure-Perl implementation
by default, but XML::LibXML and others can be plugged in at the back
automagically for speed.
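
For readers unfamiliar with the SAX style, here is a rough sketch of the event-driven approach using Python's standard `xml.sax` module as a stand-in for Perl's XML::SAX. The `feature` element and its `type`, `start`, and `end` attributes are hypothetical and are not taken from the TIGR format; the handler class and `parse_features` function are likewise made up for illustration.

```python
import xml.sax
from typing import List, Tuple


class FeatureHandler(xml.sax.ContentHandler):
    """Collect (type, start, end) tuples from hypothetical <feature> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.features: List[Tuple[str, int, int]] = []

    def startElement(self, name, attrs):
        # The parser calls this once for every opening tag it encounters.
        if name == "feature":
            self.features.append(
                (attrs["type"], int(attrs["start"]), int(attrs["end"]))
            )


def parse_features(xml_text: str) -> List[Tuple[str, int, int]]:
    """Parse an XML string and return the features seen, in document order."""
    handler = FeatureHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.features
```

The handler only reacts to events and never touches the underlying parser directly, which is what lets a SAX framework swap in a faster back end without changes to the handler code.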

Anyways, I'll commit it soon. If you have suggestions about naming, let me
know. Or, if we want to find a way to combine the two modules into one
where you can just switch by a 'version number', we could try to work that
out as well.

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu