[Bioperl-l] A better fix for Species.pm memory leak, please REVIEW.

George Hartzell hartzell at alerce.com
Sat Nov 22 19:21:19 EST 2008


I dug back into the Species.pm memory leak and my less-than-ideal fix
for it.

First, in my defense, my previous fix only causes failures in
t/Species.t if network tests are enabled.  It passes the generic
ones.  I *thought* I'd run the tests before I committed :).

The basic cause of the problem is that when species() calls
Bio::Tree::Tree->new(-node=> $species_taxon), $species_taxon is a copy
of $self.  The tree ends up adding that node onto the classification
list and it gets linked it into the descendent links, which results in
a cycle.

If that lousy english description left you scratching your head,
here's a hacked-to-pieces version of R Voss's original test case.
You'll need Devel::Cycle (YAY LINCOLN!) to run it.  Find yourself a
genbank file (I used gbpri21.seq, from the original bug report), put
it into a directory somewhere and do something like

  bug.pl GBDIR /tmp

(yeah, why do a while loop w/ an exit in the middle?  Cut-n-paste.
 Why else...?)

################################################################
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

use Devel::Cycle;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $file ( readdir $dirhandle ) {
    next if $file !~ /^gb.*/;

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
	find_cycle($seq);
	exit 1;
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}
################################################################

Anyway.  Here's a fix.  Instead of adding $self to the tree, make a
new species node w/ $self's important bits and add that instead.
There's a patch below that gets rid of my weaken and Sendu's
fix-for-the-fix and then Does The Right Thing (I hope).

It seems to pass all of the Species.t tests.

The take home exam questions are:  Are there any other missing
important bits?

Thoughts?

I'll commit it if someone'll review it.

g.

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm	(revision 15008)
+++ Bio/Species.pm	(working copy)
@@ -261,8 +261,13 @@
 		# work it out from our nodes
 		my $species_taxon = $self->{tree}->find_node(-rank => 'species');
 		unless ($species_taxon) {
-			# just assume we are rank species
-			$species_taxon = $self;
+		  # whip up a new species object so that we don't
+		  # end up with a cycle in the tree.
+		  # initialize it with self's important bits.
+		  # NOTE TO ALL: any missing important bits?
+		  $species_taxon =
+		    Bio::Species->new(-classification =>
+				      [$self->classification]);
 		}
 		
 		$species = $species_taxon->scientific_name;
@@ -278,7 +283,6 @@
             $self->{tree} = Bio::Tree::Tree->new(-node => $species_taxon);
             delete $self->{tree}->{_root_cleanup_methods};
             $root = $self->{tree}->get_root_node;
-            weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
         }
         
 		my @spflds = split(' ', $species);
@@ -395,15 +399,6 @@
         if ($ss_taxon) {
             if ($sub) {
                 $ss_taxon->scientific_name($sub);
-                
-                # *** weakening ref to our root node in species() to solve a
-                # memory leak means that we have a subspecies taxon to set
-                # during the first call to species(), but it has vanished by
-                # the time a user subsequently calls sub_species() to get the
-                # value. So we 'cheat' and just store the subspecies name in
-                # our self hash, instead of the tree. Is this a problem for
-                # a Species object? Can't decide --sendu
-                $self->{'_sub_species'} = $sub;
             }
             return $ss_taxon->scientific_name;
         }


More information about the Bioperl-l mailing list