[Bioperl-l] Fitch's Parsimony Algorithm with Perl

Gundala Viswanath gundalav at gmail.com
Wed Sep 3 14:48:07 UTC 2008


Hi,

What's a correct way to implement Fitch's parsimony algorithm?
Especially to compute minimum substitiution rate per column
in the aligned sequence.

Is there a Bioperl module to do it?

For example
CGGCGGAAAACTGTCCTCCGTGC   mouse
CGACGGAACATTCTCCTCCGCGC   rat
CGACGGAATATTCCCCTCCGTGC   human
CGACGGAAGACTCTCCTCCGTGC   chimp

00100000302011000000100   -> number of subst per site (max parsimony)

My code below doesn't seem to do the job.

__BEGIN__
use Data::Dumper;
use List::MoreUtils qw(uniq);

# The related phylogenetic in Newick format tree is:
my $tree = ' (mouse,rat,(human,chimp))';

my $sites = [
    'CGGCGGAAAACTGTCCTCCGTGC', # mouse
    'CGACGGAACATTCTCCTCCGCGC', # rat
    'CGACGGAATATTCCCCTCCGTGC', # human
    'CGACGGAAGACTCTCCTCCGTGC', # chimp
];

my @val = my_parsimony($sites);
print Dumper \@val;

sub my_parsimony {

    my  $tfbs   = shift;
    my $mlen = length($tfbs->[0]);

    my $sum_min = 0;
    my @mincol;

    foreach my $pos ( 0 .. $mlen-1 ) {

        my @colbp = ();
        foreach my $site ( @{$tfbs} ) {
            my $bp = substr($site,$pos,1);
            push @colbp, $bp;

        }

       # this heuristic seems to be faulty
       # Column 11 it predicts 1 instead of 2

       # Not sure how can I make use of the tree

        my $min_mm = scalar( uniq(@colbp) ) - 1;
        push @mincol, $min_mm;
    }

    return @mincol;
}

__END__

- Gundala Viswanath
Jakarta - Indonesia



More information about the Bioperl-l mailing list