[Bioperl-l] Translating codons

Karger, Amir AKarger@CuraGen.com
Sun, 24 Jun 2001 17:43:12 -0400


Am I correct in thinking that the default PrimarySeqI::translate method is
pretty slow? It calls translate on each three-letter codon. Why not have
translate take any sequence with length 3n, returning a string of length n?
Just move the for loop inside the subroutine. It seems like it would still
work if you happen to put in a single codon, but this way would work faster
for sequences of, say, thousands of bases.

For example, here's code that translates a DNA sequence into protein.
---------------------------
use Benchmark;

my $seq = "actgactgactgactggtgcactacgacta" x 1000;
my $len = length($seq);

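# get_codons is not shown here; it is assumed to return a hash that maps
# each three-letter codon to its one-letter amino acid.
my %amino = get_codons();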

timethese(50, {
    "substr" => \&do_substr,
    "match" =>  \&do_match,
    "pack" => \&do_pack,
}, "dividing large string with subroutine" );

# Translate a single codon by table lookup.
sub translate {
    my $in = shift;
    return $amino{$in};
}

sub do_substr {
    my $protein = "";
    for (my $i = 0; $i < $len; $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= &translate($codon);
    }
    return $protein;
}

# [stuff that's the same as do_substr snipped from the two subs below]
sub do_match {
    my @triplet = ($seq =~ /(...)/g); 
}

sub do_pack {
    my @triplet = unpack("A3" x ($len/3), $seq);
}
------------------------

(Out of curiosity, I tried three methods of splitting the string.
Surprisingly, the difference between them seems to be only about 5%. But...)
As you can see from the timings below, there's a 100% or so speedup (roughly
8 seconds versus 18 for 50 iterations) when I change the code to just do the
$amino{$codon} lookup inside the do_* subs, rather than calling &translate.
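
The inlined variant is described but not shown. A rough sketch of how the
two versions might be compared follows. The toy %amino table (covering only
the codons that occur in the test sequence) and the sub names with_sub and
inline are made up for illustration, and absolute timings will differ from
machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Toy table (an assumption): only the codons present in the test sequence.
my %amino = (
    act => 'T', gac => 'D', tga => '*', ctg => 'L',
    ggt => 'G', gca => 'A', cta => 'L', cga => 'R',
);

my $seq = "actgactgactgactggtgcactacgacta" x 1000;
my $len = length $seq;

# One subroutine call per codon.
sub translate { return $amino{ $_[0] } }

sub with_sub {
    my $protein = '';
    for (my $i = 0; $i < $len; $i += 3) {
        $protein .= translate(substr($seq, $i, 3));
    }
    return $protein;
}

# Same loop, but the hash lookup is done inline.
sub inline {
    my $protein = '';
    for (my $i = 0; $i < $len; $i += 3) {
        $protein .= $amino{ substr($seq, $i, 3) };
    }
    return $protein;
}

timethese(20, { with_sub => \&with_sub, inline => \&inline });
```

Both subs return the same protein string; only the per-codon call overhead
differs.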

Benchmark: timing 50 iterations of match, pack, substr without sub call.
     match: 8 8.04 0.03 0 0 dividing large string
      pack: 7 7.27 0 0 0 
    substr: 9 8.22 0 0 0 

Benchmark: timing 50 iterations of match, pack, substr with n sub calls
     match: 19 17.82 0.01 0 0 dividing large string with subroutine
      pack: 19 16.96 0.01 0 0 dividing large string with subroutine
    substr: 19 18.96 0 0 0 dividing large string with subroutine


As far as I can tell, this would be a pretty easy change and wouldn't break
anything. (Famous last words.)
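
A minimal sketch of what a whole-sequence translate might look like. This is
not BioPerl's actual code: the name translate_seq, the way the %amino table
is built (standard genetic code, bases in t/c/a/g order), and the 'X'
fallback for unrecognized codons are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical codon table: standard genetic code, bases in t/c/a/g order.
my @bases = qw(t c a g);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %amino;
my $n = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $amino{"$b1$b2$b3"} = substr($aas, $n++, 1);
        }
    }
}

# Hypothetical whole-sequence translate: the loop lives inside the sub, so
# there is one call per sequence instead of one per codon. A trailing
# partial codon is ignored; unknown codons become 'X' (an assumption).
sub translate_seq {
    my ($seq) = @_;
    $seq = lc $seq;
    my $protein = '';
    my $end = length($seq) - 2;    # last valid codon start is length - 3
    for (my $i = 0; $i < $end; $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= exists $amino{$codon} ? $amino{$codon} : 'X';
    }
    return $protein;
}

print translate_seq('actgactgactgactggtgcactacgacta'), "\n";   # TD*LTGALRL
```

A single codon still works, since translate_seq('atg') returns 'M', so
callers that pass one codon at a time would not need to change.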

Amir Karger
Curagen Corporation