[Bioperl-l] A better fix for Species.pm memory leak, please REVIEW.

George Hartzell hartzell at alerce.com
Sun Nov 23 21:31:09 EST 2008


Chris Fields writes:
 > On Nov 23, 2008, at 6:19 AM, Sendu Bala wrote:
 > 
 > > Chris Fields wrote:
 > >> On Nov 22, 2008, at 6:21 PM, George Hartzell wrote:
 > >> I'm not quite sure why we are attaching a Bio::Tree::Tree to the  
 > >> Bio::Species, so I need to go back and review the changes from the  
 > >> 1.5.2 release (I'm sure there is a good reason, just can't recall  
 > >> and haven't had time to look).
 > >
 > > The point of the refactor was that a Species object no longer be a  
 > > dumb (list of) scalar, but actually 'know' what its nodes are, so  
 > > you can do things like compare Species and find the LCA and such.  
 > > You need a Bio::Tree::Tree for that.
 > > ...
 > >> I vote to go ahead and commit this (the solution seems sound), but  
 > >> Sendu is the one who would be best to vet as he refactored this.   
 > >> If there isn't word by the middle of the week you can go ahead and  
 > >> make the commit.
 > >
 > > I doubt I'll have time to really look at it properly in that time- 
 > > frame... my initial reaction is that it is a bit bizarre (we are a  
 > > species object intended to hold information about the species that  
 > > we represent, and in order to do that... we create and store another  
 > > species object in ourselves?!)... but if it works, it works? Is  
 > > there a test in Species.t that does a Species object comparison and  
 > > finds the LCA? Does it trigger the problem, and does it still get  
 > > the correct LCA after this patch?
 > ...
 > > But anyway, thanks George, go ahead and commit if it all seems  
 > > functional.
 > >
 > > If you could summarise your solution in the bug report(s?) as well,  
 > > that would be great.
 > 
 > 
 > Agreed, though I think we need to ensure that memory leaks don't  
 > permeate into Bio::Taxon when we make the switch (it shouldn't, I  
 > think).  BTW, checking Blame/Annotate it looks like some of this stems  
 > from r10974:
 > 
 > http://code.open-bio.org/svnweb/index.cgi/bioperl/revision/?rev=10974
 > 

The change that I proposed wouldn't cause anything to permeate into
Bio::Taxon.

It's really a design problem with Bio::Species.  A Species "has-a"
reference to a tree *and* when the Species creates that tree it asks
the tree constructor to initialize itself using the Species itself
which results in the tree including a reference to that Species
object.  Et voila, circular reference.

It's not helped by the fact that Bio::Species explicitly disconnects
all of the tree cleanup code that was supposed to handle those
circular references.  On the other hand there are a bunch of comments
that suggest that the cleanup code never worked right.

The other places that use the tree seem to only care about it
right-then-and-there and they can get away with weaken-ing the root
and letting it just fall apart.  Bio::Species::subspecies seems to be
the only code that depends on a tree built by some other call
(Bio::Species::species) and it's unhappy when it's assumptions don't
work out.

If you think of the species object as having a tree that describes
it's place in the world then it seems semantically valid to
initialized the tree from a copy of the Species object's self, thereby
avoiding the mess.

Alternate fixes would be Sendu's caching of the info that subspecies
needs or fixing subspecies to build the tree itself.

I'm not in front of that machine at the moment, but I think that there
are a couple of other places in the src tree that use the -node or
-root arg's to Bio::Tree::Tree::new and should probably be checked for
leaks.

I don't really feel hot or bothered about this fix vs. the combination
of weaken and stashing the subspecies.  It depends on what the author
intended for Species and its tree, neither are particularly sexy and
if Species is really on its way out the door it may not matter.....

g.


More information about the Bioperl-l mailing list