[Bioperl-l] Parse problem of a big EMBL entry
Jason Stajich
jason at bioperl.org
Wed Apr 29 05:10:27 UTC 2009
Brian -
Without memory leaks, it should only take up as much memory as the
current sequence record you have parsed. If you mean you have a
sequence record with more than 1M lines, I'm not sure how much memory
that would take; it depends on whether the record contains lots of
features. There are ways to tell BioPerl to throw away things you
don't want to parse out of the record. See
http://bioperl.org/wiki/HOWTO:SeqIO#Speed.2C_Bio::Seq::SeqBuilder
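For example, here is a minimal sketch along the lines of that HOWTO
section, using the parser's sequence builder to keep only a few slots
(the input file name is hypothetical; the builder calls are as
documented for Bio::Seq::SeqBuilder):

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical input file; substitute your own EMBL file.
my $seqin = Bio::SeqIO->new(-file   => 'big_entry.embl',
                            -format => 'embl');

# Tell the builder to skip everything except the slots we ask for,
# so the feature table and annotation are never built into objects.
my $builder = $seqin->sequence_builder();
$builder->want_none();
$builder->add_wanted_slot('display_id', 'desc', 'seq');

while (my $seq = $seqin->next_seq) {
    print $seq->display_id, "\t", $seq->length, "\n";
}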
Perl will use as much memory as is available on your machine. Have you
monitored the memory use of the running perl process to ensure it is
actually reaching the 32GB limit and that this is in fact what is
killing the program?
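If it helps, one quick way to watch the process's resident set size
from inside the script is below (a sketch assuming Linux and its /proc
filesystem; current_rss_kb is a made-up helper, not a BioPerl call):

use strict;
use warnings;

# Read VmRSS for the current process from /proc (Linux only).
sub current_rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Print this periodically, e.g. once per parsed record.
printf "RSS: %s kB\n", current_rss_kb() || 'unknown';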
-jason
On Apr 28, 2009, at 8:14 PM, brian li wrote:
> Hi everyone,
>
> Greetings from Brian.
>
> I have just begun to use BioPerl 1.6.0 to collect certain data
> lines from EMBL files.
>
> There's a problem when I try to read an entry that runs to over 1
> million lines: a call to Bio::SeqIO::embl->next_seq just causes the
> parser script to exit. I have read Bio/SeqIO/embl.pm, and I think
> one possible way to solve the problem may be to give my script more
> memory to store the entry data. The machine I am using has 32GB of
> memory, which should be enough for any entry.
>
> So I am wondering whether there is any way to set the size of the
> memory available to a perl script. Other ways to deal with the
> problem are also welcome.
>
> Appreciate your help.
>
> Brian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l