[Bioperl-l] Error parsing TIGR xml

Fernan Aguero fernan at iib.unsam.edu.ar
Tue Jun 22 13:21:22 EDT 2004


Hi!

I'm seeing an error while trying to parse a .coordset file
from TIGR. This is my first attempt at using this kind of
file, so perhaps I'm doing something wrong.

Here's my brief script:

#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new( -file => $ARGV[0], -format => 'tigr');

Just trying to create a SeqIO object produces the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: [2]Required <ASMBL_ID> missing
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/Root/Root.pm:328
STACK: Bio::SeqIO::tigr::throw /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:1338
STACK: Bio::SeqIO::tigr::_process_assembly /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:522
STACK: Bio::SeqIO::tigr::_process /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:423
STACK: Bio::SeqIO::tigr::_initialize /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/tigr.pm:90
STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:358
STACK: Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO.pm:378
STACK: ./tigrxml2features.pl:6
-----------------------------------------------------------


The file does contain ASMBL_IDs, or at least I believe it
does. These are the first lines of the file:

<ASSEMBLY ASMBL_ID = "56" COORDS = "1-2149">
        <HEADER>
                <CLONE_NAME>1047053397923</CLONE_NAME>
                <ORGANISM>Trypanosoma cruzi</ORGANISM>
                <AUTHOR_LIST CONTACT = "">
                </AUTHOR_LIST>
        </HEADER>
        <TU FEAT_NAME = "56.t00001" LOCUS = "Tc00.1047053397923.10" PUB_LOCUS = "" ALT_LOCUS = "" COM_NAME = "hypothetical protein" PUB_COMMENT = "" COORDS = "167-586">
                <MODEL FEAT_NAME = "56.m00001" COMMENT = "" COORDS = "167-586">
                        <PROTEIN_SEQ>MKQSSTDGGGKQKGKDSVSSDSMKDAVTDNPGKPTATTIPTSRSGDAQEKEGKDDGTDERPTSKKHNSSPETGNTNDALTASENTPQTAETTATTVAKKNDTTIGDSDGSTAVSDTASPLLLLFLVVVACAAAAAVVAA*</PROTEIN_SEQ>
                        <EXON FEAT_NAME = "56.e00001" COORDS = "167-586">
                                <CDS FEAT_NAME = "56.c00001" COORDS = "167-586"/>
                        </EXON>
                </MODEL>
        </TU>
</ASSEMBLY>
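
In this excerpt ASMBL_ID appears only as an attribute of <ASSEMBLY>, while
the error message names <ASMBL_ID> in angle brackets, which may mean the
parser expects a child element instead. That reading is an assumption and
has not been checked against tigr.pm. A minimal sketch for seeing which form
a given file uses, in core Perl only; the helper name asmbl_id_forms is
hypothetical and not part of BioPerl:

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical helper (not part of BioPerl): report whether ASMBL_ID
# occurs as a child element, as an attribute of <ASSEMBLY>, or both.
sub asmbl_id_forms {
    my ($xml) = @_;
    my %found = (
        # <ASMBL_ID ...> used as an element
        element   => ($xml =~ /<ASMBL_ID\b/) ? 1 : 0,
        # ASMBL_ID = "..." inside the <ASSEMBLY ...> start tag
        attribute => ($xml =~ /<ASSEMBLY\b[^>]*\bASMBL_ID\s*=/) ? 1 : 0,
    );
    return %found;
}

if (@ARGV) {
    local $/;    # read the whole file in one go
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $xml = <$fh>;
    close $fh;

    my %found = asmbl_id_forms($xml);
    print "ASMBL_ID as element:   ", ($found{element}   ? "yes" : "no"), "\n";
    print "ASMBL_ID as attribute: ", ($found{attribute} ? "yes" : "no"), "\n";
}
```

Run it with the .coordset file as its only argument. This is a plain regex
scan, not an XML parse, so it only indicates which form is present.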

I've found a mention of a tigrxml module by Jason Stajich that
was supposed to be different from the SeqIO::tigr module by Josh
Lauricha, but I don't seem to have it on my system
(bioperl-1.4):
<http://bioperl.org/pipermail/bioperl-l/2004-January/014491.html>

Thanks in advance,

Fernan 

PS: I'm CCing the author of the tigr.pm module, just in
case. 

-- 
F e r n a n   A g u e r o
http://genoma.unsam.edu.ar/~fernan

