[Bioperl-l] Performance of Bio::Species

Sendu Bala bix at sendu.me.uk
Sat Nov 25 12:47:28 UTC 2006


Jason Stajich wrote:
> Can we just weaken the references with Scalar::Util? This should solve 
> the problem for circular refs.

I don't know about Stefan's problem, but I tried weakening refs - it 
fixed the memory leak I was seeing, but caused other problems.


> Is Scalar::Util part of the core distro in the min perl we are supporting?

Yes.


> I can add this in Bio::Tree::Node and look around to see where else it 
> is a problem.  We just need a simple script to verify it is having an 
> effect (i.e. a bug report with this).

perl -w -MBio::SeqIO -e '$si = Bio::SeqIO->new(-file =>
"5UTR.Pln_nr.dat", -format => "embl"); while ($seq = $si->next_seq) {
$seq->id; }'

Where 5UTR.Pln_nr.dat is a large EMBL file with ~50000 sequences. For me, 
parsing it takes ~11 minutes and uses ~2GB of memory.

Once I weakened refs in all the places I could find in Bio::Tree::Node 
and Bio::Tree::Tree, memory use stayed constant at 0.3%, but the run 
still took around 11 minutes. However, lots of the tests in the test 
suite then fail, because Nodes are often made purely to add into a Tree, 
with the requirement that the Tree keeps hard refs to them all (else the 
Tree would fall apart).

I think the Tree actually only keeps a ref to its root Node, which means 
Nodes in general must keep hard refs to their descendants. With that 
constraint, I haven't been able to break the reference cycles and get 
these things to clean up.
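
For reference, here is a minimal sketch of the general pattern being 
discussed: each node keeps hard refs to its children, and only the 
child-to-parent link is weakened with Scalar::Util. My::Node and its 
methods are hypothetical names made up for this illustration; this is 
not the Bio::Tree::Node code, and whether the same split works for the 
real classes is an assumption that would need checking against the test 
suite.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical minimal node class, for illustration only.
# It is NOT the Bio::Tree::Node implementation.
package My::Node;

my $live = 0;    # number of live My::Node objects, to observe cleanup

sub new {
    my ($class, %args) = @_;
    my $self = bless { id => $args{id}, children => [], parent => undef }, $class;
    $live++;
    return $self;
}

sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;       # hard ref: parent owns child
    $child->{parent} = $self;
    Scalar::Util::weaken($child->{parent});    # weak ref: child does not own parent
    return $child;
}

sub parent     { $_[0]{parent} }
sub live_count { $live }
sub DESTROY    { $live-- }

package main;

{
    my $root = My::Node->new(id => 'root');
    my $kid  = $root->add_child(My::Node->new(id => 'a'));
    $kid->add_child(My::Node->new(id => 'b'));
    print "live inside scope: ", My::Node->live_count, "\n";    # 3
}
# With the upward links weakened, dropping $root frees the whole tree.
print "live after scope: ", My::Node->live_count, "\n";         # 0
```

The catch is the one described above: anything that holds only a child 
and lets go of the root will see parent() become undef, so something 
(here, the caller's $root) has to keep a hard ref to the top of the tree.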

Hopefully I'm missing something obvious; please look into it.


