[Bioperl-l] Interpretation of percentage_identity

Fri Apr 7 13:50:30 UTC 2006

These methods are really meant for multiple sequence alignments rather
than pairwise identities. That said, I don't think we have anything else
that calculates percent ID for a pair of sequences in an alignment, so
it would be nice for someone to add that.
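
For a single pair of rows, a rough sketch of what such a calculation might look like is below. It is written in Python purely for illustration; `pairwise_percent_identity` is a hypothetical helper, not an existing BioPerl method. Skipping columns where either row has a gap is an assumption, and it is only one of several conventions for the denominator.

```python
def pairwise_percent_identity(row_a, row_b, gap_chars="-."):
    """Percent identity between two rows taken from the same alignment.

    Assumption: columns where either row has a gap are skipped, so the
    denominator is the number of columns aligned residue-to-residue.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must come from the same alignment (equal length)")
    matches = 0
    compared = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # gapped column: not compared
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(pairwise_percent_identity("ACDE-FG", "ACDQWFG"))
```

In this toy example six columns are compared and five of them match, so the result is about 83.3. Dividing by the full alignment length instead would give a lower figure whenever gaps are present.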

First off, percentage_identity is an alias for
average_percentage_identity. This preserves the function name that
existed before there were two methods.

There are only really two implementations to concentrate on.

average_percentage_identity
overall_percentage_identity

The documentation for Bio::SimpleAlign gives you some hints about how
each works.

overall_percentage_identity is just the percentage of columns in which
every sequence has the same residue, so it is very conservative.
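
As a rough illustration of that column-based idea (a Python sketch, not the BioPerl source): the function name `overall_percent_identity` is hypothetical. Treating any gapped column as non-identical and dividing by the full alignment length are both assumptions here, and BioPerl's choice of denominator may differ.

```python
def overall_percent_identity(rows, gap_chars="-."):
    """Percentage of alignment columns in which every row has the same residue.

    Assumptions: a column containing any gap does not count as identical,
    and the denominator is the full alignment length.
    """
    if not rows:
        return 0.0
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")
    if length == 0:
        return 0.0
    identical = 0
    for column in zip(*(r.upper() for r in rows)):
        first = column[0]
        # The column counts only if it is gap-free and fully conserved.
        if first not in gap_chars and all(c == first for c in column):
            identical += 1
    return 100.0 * identical / length


print(overall_percent_identity(["ACDEF", "ACDQF", "ACEEF"]))
```

Here three of the five columns are fully conserved, so the result is 60.0. One divergent sequence is enough to disqualify a column, which is why the figure drops quickly as more sequences are added.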

Here is the pertinent documentation for average_percentage_identity:

Function: The function uses a fast method to calculate the average
          percentage identity of the alignment
Notes   : This method implemented by Kevin Howe calculates a figure that is
          designed to be similar to the average pairwise identity of the
          alignment (identical in the absence of gaps), without having to
          explicitly calculate pairwise identities proposed by Richard Durbin.
          Validated by Ewan Birney and Alex Bateman.
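
One way to get a figure with those properties from per-column residue counts is sketched below in Python. This is an illustration of the idea only, and I am not claiming it matches the BioPerl source line by line. The name `average_percent_identity` is hypothetical, and counting only letters (so gap characters are ignored) is an assumption.

```python
from collections import Counter


def average_percent_identity(rows):
    """Column-count estimate of the average pairwise identity.

    For each column, count residues (letters only, so gaps are ignored).
    A residue seen n times contributes n*(n-1) identical ordered pairs;
    a column with m residues contributes m*(m-1) possible ordered pairs.
    """
    if not rows:
        return 0.0
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all rows must have the same length")
    identical_pairs = 0
    possible_pairs = 0
    for column in zip(*(r.upper() for r in rows)):
        counts = Counter(c for c in column if c.isalpha())
        m = sum(counts.values())
        identical_pairs += sum(n * (n - 1) for n in counts.values())
        possible_pairs += m * (m - 1)
    return 100.0 * identical_pairs / possible_pairs if possible_pairs else 0.0


print(average_percent_identity(["ACDEF", "ACDQF", "ACEEF"]))
```

For this gap-free toy alignment the three pairwise identities are 80, 80 and 60 percent, and the function returns their mean, about 73.3, without ever comparing the sequences pair by pair. With gaps the two figures can drift apart, which fits the "identical in the absence of gaps" note above.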

If someone wants to put an excerpt of this on the SimpleAlign wiki
page, that would be awesome.

-jason
On Apr 7, 2006, at 10:04 AM, Armin Schmitt wrote:

> Dear Jason,
>
> I need some help with the interpretation of
> the results from all three percentage_identity
> variants offered in the Bioperl module AlignI.pm
>
> - percentage_identity
> - average_percentage_identity
> - overall_percentage_identity
>
> Please understand that I am not a Perl expert,
> so I am not able to get the meaning from the
> source code.
>
> By percentage identity for a 2 sequence alignment
> I understand the proportion of matching amino acids
> of the total length.
>
> But I suspect that this is different now?
>
> Thank you very much
>
> Armin Schmitt
>
> -- 
> Dr. Armin Schmitt
> Züchtungsbiologie und molekulare Genetik
> Institut für Nutztierwissenschaften
> Humboldt-Universität zu Berlin
> Invalidenstraße 42
> 10115 Berlin
>
> Breeding Biology and Molecular Genetics
> Institute for Animal Sciences
> Humboldt-Universität zu Berlin
> Invalidenstraße 42
> 10115 Berlin
> Germany
>
> Tel:    +49 30 2093 9074
> Fax:    +49 30 2093 6397
> http://www.agrar.hu-berlin.de/nutztier/zb/
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12