[Biopython] Memory leak while parse gbk file?

Peter Cock p.j.a.cock at googlemail.com
Fri Oct 28 11:52:33 UTC 2011


On Fri, Oct 28, 2011 at 12:46 PM, ning luwen
<bioinformaticsing at gmail.com> wrote:
> Hi,
>    I have tried to parse 2000+ gbk files using SeqIO.parse, but
> memory usage climbs quickly. (On my desktop with 4 GB of memory it
> runs out of memory after a number of iterations; on a workstation,
> memory use went above 100 GB and kept increasing.)
>
> for temp_name in file_names:#file_names: list of path of gbk files.
>    f=open(temp_name)
>    for x in SeqIO.parse(f,'genbank'):
>        print x.name,len(x.features)
>    f.close()
>
>   I guess there may be a memory leak when parsing gbk files.
> --
> regards,
> luwen ning

Which version of Python are you using? Try forcing garbage collection
after each file:

import gc
from Bio import SeqIO

# file_names: list of paths to the gbk files
for temp_name in file_names:
    with open(temp_name) as f:
        for x in SeqIO.parse(f, 'genbank'):
            print(x.name, len(x.features))
    # Force a full garbage collection once each file is done
    gc.collect()

I expect that to fix the increasing memory usage. If it does, then
it isn't a memory leak, just garbage that Python had not yet
collected.
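
To illustrate the difference between a leak and uncollected garbage,
here is a minimal sketch using only the standard library (Python 3).
The Node class is a hypothetical stand-in, not anything from Biopython,
and it assumes the growth comes from objects caught in reference
cycles, which may or may not be what happens with your files.

```python
import gc


class Node:
    """Hypothetical stand-in for a parsed object that refers to itself."""

    def __init__(self):
        # Reference cycle: the reference count never drops to zero,
        # so only the cyclic garbage collector can free this object.
        self.me = self


def make_cycles(n):
    """Create n objects that are unreachable as soon as they are built."""
    for _ in range(n):
        Node()


gc.disable()          # stop automatic collection so the garbage piles up
gc.collect()          # start from a clean slate
make_cycles(1000)
freed = gc.collect()  # returns the number of unreachable objects found
gc.enable()

print("unreachable objects reclaimed:", freed)
```

If gc.collect() reports a large count like this inside your loop and the
memory stops growing, the objects were only waiting for the collector.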

Peter



