[Bioperl-l] SeqIO out of memory

Jason Stajich jason at cgt.mc.duke.edu
Mon Mar 3 11:08:14 EST 2003


This is bug #1371

-j
On Mon, 3 Mar 2003, Brian Osborne wrote:

> Ewan and Hilmar,
> >> Aha. This is a better explanation. ;)
>
> Not so fast there, my friends: I now have a _real_ bug! ;-)
> Check this one out:
> ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> Bio::SeqIO->new(-file=>"aap-1.gb", -format => "genbank" );
> open MYOUT,">aap-1.fa"; while ( $seq = $in->next_seq ){
> print MYOUT ">",$seq->accession_number,"\n",$seq->seq,"\n"; }'
>
> -------------------- WARNING ---------------------
> MSG: exception while parsing location line [bond(174,175)] in reading
> EMBL/GenBank/SwissProt, ignoring feature misc_feature (seqid=aap-1):
> ------------- EXCEPTION  -------------
> MSG: operator "bond" unrecognized by parser
> STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:160
> STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
> STACK Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:123
> STACK Bio::SeqIO::genbank::next_seq /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/genbank.pm:394
> STACK toplevel -e:1
>
> So Factory/FTLocationFactory only wants to see the operators "complement",
> "join", or "order" in Genbank files, but I have a Genbank file with "bond"
> in that same position (NM_059121, C. elegans aap-1):
>
> ~/data/refseq>grep bond aap-1.gb
>      misc_feature    bond(174,175)
>      misc_feature    bond(434,435)
>      misc_feature    bond(700,701)
>      misc_feature    bond(1132,1133)
>
> It's easy to change line 160 in FTLocationFactory from
> "} elsif(($op eq "join") || ($op eq "order") ) {"
> to
> "} elsif(($op eq "join") || ($op eq "order") || ($op eq "bond") ) {"
> and now the file is parsed without complaint, but does SplitLocation
> correctly handle "(<num>,<num>)" in addition to
> "(<num>..<num>,<num>..<num>)"?
>
> Brian O.
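Whether a bare `<num>` works depends on how the sub-locations are split. As a rough illustration only (this is not BioPerl's FTLocationFactory or SplitLocation code, and `parse_location` is a hypothetical helper), a splitter for a flat `op(...)` expression has to accept both `start..end` ranges and single positions, mapping a bare position to a one-base location with start equal to end. Treating `bond` the same way as `order` is itself an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT Bio::Factory::FTLocationFactory: split a flat
# "op(loc,loc,...)" string into an operator name and a list of
# [start, end] pairs. Nested operators and fuzzy positions (<, >, ^)
# are deliberately not handled.
sub parse_location {
    my ($str) = @_;
    my ($op, $inner) = $str =~ /^(\w+)\((.*)\)$/
        or die "not an operator expression: $str\n";
    my @parts;
    for my $piece (split /,/, $inner) {
        if ($piece =~ /^(\d+)\.\.(\d+)$/) {
            push @parts, [$1, $2];     # range "start..end"
        } elsif ($piece =~ /^(\d+)$/) {
            push @parts, [$1, $1];     # bare position: start == end
        } else {
            die "unsupported sub-location '$piece' in $str\n";
        }
    }
    return ($op, \@parts);
}

# Compare the "bond" form from aap-1.gb with an ordinary join.
for my $loc ('bond(174,175)', 'join(1..10,20..30)') {
    my ($op, $parts) = parse_location($loc);
    print "$op: ", join(' ', map { "$_->[0]-$_->[1]" } @$parts), "\n";
}
```

Run as-is it prints `bond: 174-174 175-175` and `join: 1-10 20-30`; a parser that only accepted the `start..end` form would reject the first string instead.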
>
>
>
> >       -hilmar
> >
> > On Friday, February 28, 2003, at 01:06  PM, Brian Osborne wrote:
> >
> > > Bioperl-l,
> > > Check out this one-liner, where the input file is rscu.gbff, a
> > > Genbank-formatted file with 111,220 entries. The fasta file that's made
> > > contains only 42,451 entries. Is "Out of memory" the expected result
> > > for an input file this size?
> > >
> > > ~/data/refseq>perl -e 'use Bio::SeqIO; $in =
> > > Bio::SeqIO->new(-file=>"rscu.gbff", -format=>"genbank");
> > > open MYOUT,">rscu.fa"; while ( $seq = $in->next_seq ){
> > > print MYOUT ">" . $seq->accession_number . "\n" . $seq->seq . "\n"; }'
> > >
> > > Out of memory during "large" request for 33558528 bytes, total sbrk() is
> > > 382283776 bytes at /usr/lib/perl5/site_perl/5.8.0/Bio/Seq/RichSeq.pm line 114,
> > > <GEN0> line 6433958.
> > >
> > > Brian O.
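For comparison, the same GenBank-to-FASTA conversion can be done record by record without building any sequence objects, so memory use is bounded by the largest single entry rather than by anything that accumulates across the file. A minimal sketch, assuming every record ends with a `//` line and carries an ACCESSION line and an ORIGIN block; `genbank_to_fasta` is a hypothetical name, not a BioPerl method, and it takes the first accession on the line, which may not match `accession_number` in every case:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical workaround, NOT part of BioPerl: stream GenBank records
# (each terminated by a "//" line) and write accession + sequence as
# FASTA. Only one record is held in memory at a time.
sub genbank_to_fasta {
    my ($in, $out) = @_;
    local $/ = "\n//\n";               # read one whole record per iteration
    my $n = 0;
    while (my $rec = <$in>) {
        my ($acc) = $rec =~ /^ACCESSION\s+(\S+)/m or next;      # first accession only
        my ($origin) = $rec =~ /^ORIGIN[^\n]*\n(.*)/ms or next; # sequence block
        (my $seq = $origin) =~ s{[\d\s/]}{}g;  # drop coordinates, spaces, "//"
        print {$out} ">$acc\n$seq\n";
        $n++;
    }
    return $n;                          # number of records written
}

# Usage: perl gb2fa.pl INPUT.gbff OUTPUT.fa
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "cannot read $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "cannot write $ARGV[1]: $!";
    my $n = genbank_to_fasta($in, $out);
    close $out or die "cannot close $ARGV[1]: $!";
    print STDERR "wrote $n records\n";
}
```

For example, `perl gb2fa.pl rscu.gbff rscu.fa` (the script name is arbitrary). If this finishes where the SeqIO loop runs out of memory, that would point at memory retained on the parsing side rather than at the size of the input.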
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

