[Bioperl-l] Memory requirements for conversion from embl to genbank

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 15:07:14 UTC 2006


I observe the same. A testcase is attached; please add it to the testcases.
It will be helpful in the future, when the parser has to cope with
two /note feature lines.

M.

Sendu Bala wrote:
> Martin MOKREJŠ wrote:
> 
>>Hi,
>>  I use bp_sreformat.pl to convert a file from embl format
>>to genbank. I use the current cvs HEAD version and cannot parse
>>two files. Each record is small and I don't understand why
>>there is such a huge memory requirement. The machine has 1GB
>>RAM and is running a recent linux kernel. Moreover, I could
>>parse the same file with bioperl-1.5.1 after I manually
>>fixed some missing quotes in the file.
> 
> [...]
> 
>>$ bp_sreformat.pl -if embl -of genbank -i 5UTR.Vrl_nr.dat -o 5UTR.Vrl_nr.gb
> 
> 
> The problem occurs simply doing
> $si = new Bio::SeqIO(-format => "embl", -file => "file");
> while ($seq = $si->next_seq) { }
> 
> [...]
> 
>>I am not a perl guru, nor am I familiar with the bioperl code. Does someone know
>>whether the parsed records are held in memory or not? It seems so.
>>I guess the objects could be freed from memory by dereferencing
>>them immediately after they get written out in the new format. Or, the
>>garbage collector does not work well in perl 5.8.8.
> 
> 
> No, the bp_sreformat.pl code and similar, and perl itself, are fine from
> a memory point of view. The problem is the new SeqIO parsing of taxonomic
> information. Not only is there a big memory leak, I've realised it is
> also fantastically slow. I'll come up with a fix shortly.
> 
> Sorry,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
Dr. Martin Mokrejs
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: two_note_features.embl
Type: chemical/x-embl-dl-nucleotide
Size: 3643 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060831/4686615d/attachment-0004.bin>

