[Bioperl-l] Performance of Bio::Species

Sendu Bala bix at sendu.me.uk
Tue Nov 21 20:16:18 UTC 2006


Hilmar Lapp wrote:
> 
> On Nov 21, 2006, at 2:37 PM, Sendu Bala wrote:
> 
>> Anyway, for the memory leak I have some ideas I haven't tried yet; I
>> don't know if my efforts will solve the speed issue though.
> 
> The memory leak sounds more concerning to me. Under which circumstances 
> would it crash a script or blow through all of, say, 1-2GB when it should 
> have taken only a tenth of that?

It's been reported as causing problems if you do something like parse a 
large EMBL file with many (tens of thousands of) sequences in it. So 
basically any situation in which you create lots of Bio::Species objects.

IIRC the reporter ran out of memory on a ~40,000-sequence EMBL file. 
Neither the memory leak fix nor the speed fix ought to require any API 
change. I'm fairly certain that the memory leak, at least, is confined 
to a problem with (as suggested) the Bio::Tree* modules failing to clean 
up on destruction.

There was in fact already an unnoticed problem with Bio::Tree::Node not 
getting cleaned up (see my #*** comment in the code), but my 
Bio::Species-related changes exacerbated the problem and also made it 
noticeable, since you're far more likely to create thousands of 
Bio::Species objects than thousands of Bio::Tree::Node objects.
