[Bioperl-l] Parse problem of a big EMBL entry

Jason Stajich jason at bioperl.org
Wed Apr 29 05:10:27 UTC 2009


Brian -

Without memory leaks, it should only take up as much memory as the  
current sequence you have parsed.  If you mean you have a single  
sequence record with > 1M lines, I'm not sure how much memory that  
would take; it depends on whether the record carries lots of features.  
There are ways to tell BioPerl to throw away things you don't want to  
parse out of the record. See  
http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
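
For example, something along these lines (a rough sketch, untested,  
and the file name is just a placeholder) tells the parser to skip  
everything except the ID and description of each record:

use strict;
use warnings;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-format => 'embl',
                            -file   => 'big_records.embl');

# Build nothing by default, then opt back in to the two slots we want.
my $builder = $seqio->sequence_builder;
$builder->want_none();
$builder->add_wanted_slot('display_id', 'desc');

while (my $seq = $seqio->next_seq) {
    print $seq->display_id, "\t", $seq->desc, "\n";
}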

Perl will use as much memory as is available on your machine. Have you  
monitored the memory use of the running perl process to ensure it is  
actually reaching the 32 GB limit, and that this is in fact what is  
killing the program?
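
On Linux you can check from inside the script itself; a quick sketch  
(assumes /proc is available, so Linux-only):

# Return the script's current resident set size in kB, or undef.
sub current_rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (<$fh>) {
        return $1 if /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $rss = current_rss_kb();
warn 'RSS now: ', (defined $rss ? $rss : 'unknown'), " kB\n";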

-jason
On Apr 28, 2009, at 8:14 PM, brian li wrote:

> Hi everyone,
>
>     Greetings from Brian.
>
>     I have just begun to use BioPerl 1.6.0 to collect certain data
> lines from EMBL files.
>
>     There's a problem when I try to read an entry that includes over 1
> million lines. A call to Bio::SeqIO::embl->next_seq just causes the
> parser script to exit. I have read Bio/SeqIO/embl.pm and I think
> one possible way to solve the problem may be to give my script more
> memory to store the entry data. The machine I am using has 32 GB of
> memory, which should be enough for any entry.
>
>     So I am wondering whether there is any way to set the size of the
> memory available to a perl script. Other ways to deal with the
> problem are also welcome.
>
>     Appreciate your help.
>
> Brian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org
