[Bioperl-l] SeqIO out of memory

Ewan Birney birney at ebi.ac.uk
Mon Mar 3 16:20:00 EST 2003


On Mon, 3 Mar 2003, Brian Osborne wrote:

> Ewan and Hilmar,
> >> Aha. This a better explanation. ;)
> 
> Not so fast there my friends, I now have a _real_ bug!    ;-)
> Check this one out:
> ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> Bio::SeqIO->new(-file=>"aap-1.gb",-
> format => "genbank" ); open MYOUT,">aap-1.fa"; while ( $seq =
> $in->next_seq ){ p
> rint MYOUT ">",$seq->accession_number,"\n",$seq->seq,"\n"; }'
> 
> -------------------- WARNING ---------------------
> MSG: exception while parsing location line [bond(174,175)] in reading
> EMBL/GenBa
> nk/SwissProt, ignoring feature misc_feature (seqid=aap-1):
> ------------- EXCEPTION  -------------
> MSG: operator "bond" unrecognized by parser


Well... is this a bug?


http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#operators

bond is not in there. Can you send the entire line?





> STACK Bio::Factory::FTLocationFactory::from_string
> /usr/lib/perl5/site_perl/5.8.
> 0/Bio/Factory/FTLocationFactory.pm:160
> STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
> STACK Bio::SeqIO::FTHelper::_generic_seqfeature
> /usr/lib/perl5/site_perl/5.8.0/B
> io/SeqIO/FTHelper.pm:123
> STACK Bio::SeqIO::genbank::next_seq
> /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/gen
> bank.pm:394
> STACK toplevel -e:1
> 
> So Factory/FTLocationFactory only wants to see the operators "complement",
> "join", or "order" in Genbank files, but I have a Genbank file with "bond"
> in that same position (NM_059121, C. elegans aap-1):
> 
> ~/data/refseq>grep bond aap-1.gb
>      misc_feature    bond(174,175)
>      misc_feature    bond(434,435)
>      misc_feature    bond(700,701)
>      misc_feature    bond(1132,1133)
> 
> It's easy to change line 160 in FTLocationFactory from
> "} elsif(($op eq "join") || ($op eq "order") ) {"
> to
> "} elsif(($op eq "join") || ($op eq "order") ) || ($op eq "bond")) {
> and now the file is parsed without complaint, but does SplitLocation
> correctly handle "(<num>,<num>)" in addition to
> "(<num>..<num>,<num>..<num>)"?
> 
> Brian O.
> 
> 
> 
> >       -hilmar
> >
> > On Friday, February 28, 2003, at 01:06  PM, Brian Osborne wrote:
> >
> > > Bioperl-l,
> > > Check out this one-liner, where the input file is rscu.gbff, a
> > > Genbank-formatted file with 111,220 entries. The fasta file that's made
> > > contains only 42,451 entries. Is "Out of memory" the expected result
> > > for an
> > > input file this size?
> > >
> > > ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> > > Bio::SeqIO->new(-file=>"rscu.gbff",
> > > -format=>"genbank"); open MYOUT,">rscu.fa"; while ( $seq =
> > > in->next_seq ){ print
> > >  MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
> > >
> > > Out of memory during "large" request for 33558528 bytes, total sbrk()
> > > is
> > > 3822837
> > > 76 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line 114,
> > > <GEN0> l
> > > ine 6433958.
> > >
> > > Brian O.
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
> 
> 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney at ebi.ac.uk>. 
-----------------------------------------------------------------



More information about the Bioperl-l mailing list