[Bioperl-l] Seeking comments, Bio::Species memleak fix
Chris Fields
cjfields at illinois.edu
Sat Jul 3 23:36:15 UTC 2010
On Jul 3, 2010, at 3:55 PM, Hilmar Lapp wrote:
> On Jun 30, 2010, at 1:35 PM, Chris Fields wrote:
>
>> Bio::Species had previously inherited from Bio::Taxon, but also required
>> it to hold a Bio::Tree::Tree which contained a circular reference back
>> to the Bio::Species object, thus requiring Scalar::Util::weaken. This
>> has caused several hard-to-diagnose problems with premature garbage
>> collection, including some issues with threads (bug 3017).
>
>
> Is there something we should learn for future cases in which circular references are being considered with the S::U::weaken remedy (such as the recent one re: SeqI<->SeqFeatureI)?
>
> -hilmar
In my experience, using weaken() is a bit of a 'code smell', and in general there are nice, sane ways to work around it. The solutions I outlined for the SeqI<->SeqFeatureI suggestion I thought made sense, if you think of Bio::SeqI as really more like a Bio::SeqRecord (has-a PrimarySeqI, does hold features, annotations, etc). There are no circular refs in this case:
SeqI has-a SeqFeature(s)I, via FeatureHolderI
SeqI has-a PrimarySeqI
SeqFeatureI has-a PrimarySeqI
Nothing refers back to SeqI
It's a nice design decision and a great delegation of duties: SeqFeatures shouldn't need the Seq(Record) as a whole but the just the sequence itself, present as the PrimarySeqI, therefore they are attached to that. Any sequence-specific changes (methods, attributes) should therefore be in PrimarySeqI, and if needed could be delegated to for convenience from a relevant SeqI.
In this particular instance (Bio::Species), there appeared to be a need to store a ref to the Tree (containing self) in self to implement the prior Bio::Species methods, thus requiring the use of weaken(). To me this was an indication that we really needed to rethink the design of the class. Maybe the containing instance (eg: Bio::Seq in this case) needs to hold the classification (as a Tree) and cache the relevant species NodeI (a Bio::Taxon).
As we're still at the stage where we are using a Bio::Species, I pushed that functionality for the time being into the Bio::Species class: it has-a Bio::NodeI and a Bio::Tree::Tree, but neither refers back to the Bio::Species. We're planning on deprecating Bio::Species in future releases anyway, and reimplementing so that Bio::Seq is storing the Tree and the Bio::Taxon species node should be easy when the time comes.
Actually, this probably suggests abstracting the simplest parts of that out into a common interface, which could be switched over to Bio::Seq when needed. May work on that a bit...
chris
More information about the Bioperl-l
mailing list