[Bioperl-l] SeqIO out of memory

Brian Osborne brian_osborne at cognia.com
Mon Mar 3 10:49:16 EST 2003


Ewan and Hilmar,
>> Aha. This a better explanation. ;)

Not so fast there my friends, I now have a _real_ bug!    ;-)
Check this one out:
~/data/refseq>perl -e 'use Bio::SeqIO; $in =
Bio::SeqIO->new(-file=>"aap-1.gb",-
format => "genbank" ); open MYOUT,">aap-1.fa"; while ( $seq =
$in->next_seq ){ p
rint MYOUT ">",$seq->accession_number,"\n",$seq->seq,"\n"; }'

-------------------- WARNING ---------------------
MSG: exception while parsing location line [bond(174,175)] in reading
EMBL/GenBa
nk/SwissProt, ignoring feature misc_feature (seqid=aap-1):
------------- EXCEPTION  -------------
MSG: operator "bond" unrecognized by parser
STACK Bio::Factory::FTLocationFactory::from_string
/usr/lib/perl5/site_perl/5.8.
0/Bio/Factory/FTLocationFactory.pm:160
STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
STACK Bio::SeqIO::FTHelper::_generic_seqfeature
/usr/lib/perl5/site_perl/5.8.0/B
io/SeqIO/FTHelper.pm:123
STACK Bio::SeqIO::genbank::next_seq
/usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/gen
bank.pm:394
STACK toplevel -e:1

So Factory/FTLocationFactory only wants to see the operators "complement",
"join", or "order" in Genbank files, but I have a Genbank file with "bond"
in that same position (NM_059121, C. elegans aap-1):

~/data/refseq>grep bond aap-1.gb
     misc_feature    bond(174,175)
     misc_feature    bond(434,435)
     misc_feature    bond(700,701)
     misc_feature    bond(1132,1133)

It's easy to change line 160 in FTLocationFactory from
"} elsif(($op eq "join") || ($op eq "order") ) {"
to
"} elsif(($op eq "join") || ($op eq "order") ) || ($op eq "bond")) {
and now the file is parsed without complaint, but does SplitLocation
correctly handle "(<num>,<num>)" in addition to
"(<num>..<num>,<num>..<num>)"?

Brian O.



>       -hilmar
>
> On Friday, February 28, 2003, at 01:06  PM, Brian Osborne wrote:
>
> > Bioperl-l,
> > Check out this one-liner, where the input file is rscu.gbff, a
> > Genbank-formatted file with 111,220 entries. The fasta file that's made
> > contains only 42,451 entries. Is "Out of memory" the expected result
> > for an
> > input file this size?
> >
> > ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> > Bio::SeqIO->new(-file=>"rscu.gbff",
> > -format=>"genbank"); open MYOUT,">rscu.fa"; while ( $seq =
> > in->next_seq ){ print
> >  MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
> >
> > Out of memory during "large" request for 33558528 bytes, total sbrk()
> > is
> > 3822837
> > 76 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line 114,
> > <GEN0> l
> > ine 6433958.
> >
> > Brian O.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at bioperl.org
http://bioperl.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list