[Bioperl-l] TIGR xml parser(s)

Thu Jan 8 15:37:01 EST 2004

I asked TIGR about that a while ago: those are in-progress files, whereas
tigr.pm handles the format they will become once they are released.
O. sativa is a good example of this. It is available both in [1], which
tigr.pm can parse, and in [2], which has the coordset files.
How is the memory usage of XML::SAX? If it's not bad, maybe I'll redo
tigr.pm with it. I'd love not to be using regexes to parse the thing.

[1] ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/
[2] ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/BAC_PAC_clnes/
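On the memory question: a SAX parser reports each element as an event while it reads, so memory use should stay roughly flat and depend mainly on what the handler chooses to keep, rather than on the size of a full document tree. A minimal sketch, assuming XML::SAX is installed; the ElementCounter class, the count_elements function and the toy input are hypothetical illustrations, not Bioperl code:

```perl
use strict;
use warnings;

# Hypothetical handler class (not part of Bioperl): it tallies element
# names as SAX events stream past, so no document tree is ever built.
package ElementCounter;
use base qw(XML::SAX::Base);

# Called once per opening tag; $el->{Name} holds the element name.
sub start_element {
    my ($self, $el) = @_;
    $self->{counts}{ $el->{Name} }++;
    return;
}

package main;
use XML::SAX::ParserFactory;

# count_elements($xml): returns a hashref of element name => count.
sub count_elements {
    my ($xml) = @_;
    my $handler = ElementCounter->new;
    # ParserFactory hands back whichever SAX parser XML::SAX is configured
    # to prefer (e.g. XML::LibXML's SAX driver if registered), falling back
    # to the slow pure-Perl XML::SAX::PurePerl.
    my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string($xml);
    return $handler->{counts} || {};
}

# Toy input with made-up element names, just to exercise the handler.
my $counts = count_elements('<ROOT><GENE/><GENE/></ROOT>');
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

For a real coordset or chr01.xml file, calling $parser->parse_uri($path) instead of parse_string reads straight from the file.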

On Thu 01/08/04 14:47, Jason Stajich wrote:
> Josh -
> 
> I have a Bio::SeqIO parser for what I think is the newer TIGR XML format.
> 
> It is the format used in these files (which had no DTD the last time I checked):
> ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/p_yoelii/annotation_dbs/PYA1.coordset
> ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/p_falciparum/annotation_dbs/PFA1_chromo_1.coordset
> 
> That format differs from
> ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_1.0/chr01.dir/chr01.xml
> which is what your parser reads.
> 
> I don't know whether this is an in-progress format, something to do with
> Manatee, or something else; I will have to try to get clarification from
> someone at TIGR.
> 
> At any rate, I'm not sure what to call it to distinguish it from your
> parser (Bio::SeqIO::tigr); for now it is called SeqIO::tigrxml.
> 
> I've been using it in a small tigrxml2gff script for loading data into
> Gbrowse. It also calculates and adds 5' and 3' UTR features based on the
> CDS and mRNA annotations.
> 
> I wrote it with XML::SAX, which I found quite easy to use. It also has a
> nice way of using different parsers underneath: a slow pure-Perl
> implementation by default, but XML::LibXML and others can be plugged in
> at the back automagically for speed.
> 
> Anyway, I'll commit it soon. If you have suggestions about naming, or if
> we want to find a way to combine the two modules into one where you can
> just switch formats by a 'version number', I could try to work that in as well.
> 
> 
> -jason
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
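The 5' and 3' UTR calculation mentioned in the quoted message can be sketched roughly like this, assuming an mRNA given as a list of exons plus a single CDS start/end, all in 1-based inclusive genomic coordinates. The derive_utrs function and its argument layout are hypothetical, not the actual tigrxml code. The idea is simply that any exon sequence lying outside the CDS span is UTR, and the strand decides which side is 5'.

```perl
use strict;
use warnings;

# derive_utrs(\@exons, $cds_start, $cds_end, $strand)
# Hypothetical helper, not the tigrxml implementation. @exons holds the
# mRNA's [start, end] pairs (1-based, inclusive); $strand is +1 or -1.
# Returns a hashref with 'five_prime_UTR' and 'three_prime_UTR' lists
# of [start, end] pieces.
sub derive_utrs {
    my ($exons, $cds_start, $cds_end, $strand) = @_;
    my (@low, @high);    # UTR pieces below and above the CDS span
    for my $exon (sort { $a->[0] <=> $b->[0] } @$exons) {
        my ($s, $e) = @$exon;
        # Part of this exon lying before the CDS start (genomic order)
        if ($s < $cds_start) {
            my $end = $e < $cds_start - 1 ? $e : $cds_start - 1;
            push @low, [ $s, $end ];
        }
        # Part of this exon lying after the CDS end
        if ($e > $cds_end) {
            my $start = $s > $cds_end + 1 ? $s : $cds_end + 1;
            push @high, [ $start, $e ];
        }
    }
    # On the plus strand the low side is 5'; on the minus strand it is 3'.
    return $strand >= 0
        ? { five_prime_UTR => \@low,  three_prime_UTR => \@high }
        : { five_prime_UTR => \@high, three_prime_UTR => \@low };
}

my $utrs = derive_utrs([ [100, 200], [300, 400] ], 150, 350, +1);
for my $type (qw(five_prime_UTR three_prime_UTR)) {
    print "$type: $_->[0]..$_->[1]\n" for @{ $utrs->{$type} };
}
```

With exons 100..200 and 300..400 and a CDS from 150 to 350 on the plus strand, this yields a 5' UTR of 100..149 and a 3' UTR of 351..400; on the minus strand the two labels swap.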

-- 

----------------------------
| Josh Lauricha            |
| laurichj at bioinfo.ucr.edu |
| Bioinformatics, UCR      |
|--------------------------|