[Biopython] Sequence alignment with multiple proteins

David Winter winda002 at student.otago.ac.nz
Thu May 14 18:52:47 EDT 2009


Fahy, Michael wrote:
> This is not strictly a BioPython question but I'm using BioPython for
> the work.
>
>  
>
> I have a set of 45 proteins and 10 species.  I have a  representative
> orthologous protein from each set for each of the 10 species.  I'm
> trying to build a phylogenetic tree by aligning the data from the 10
> species.  I've tried concatenating the 45 protein sequences for each of
> the 10 species and aligning the concatenated sequences but this has
> produced results that do not make sense.  What do you recommend for such
> a problem?
>   
Hi Michael,

As you've heard, the usual approach is to align the sequences for each 
protein individually first, then concatenate those alignments into a 
supermatrix.
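
In case it helps, here is a rough sketch of the concatenation step in 
plain Python. It assumes each protein has already been aligned on its own 
and loaded as a dict of taxon name to aligned sequence; the function name 
build_supermatrix and the toy data are made up for illustration, and 
filling a missing taxon with gaps is just one possible convention.

```python
def build_supermatrix(alignments, taxa):
    """Concatenate per-protein alignments into one supermatrix.

    alignments: list of (protein_name, {taxon: aligned_sequence}) pairs,
                one entry per protein, each already aligned on its own.
    taxa:       list of taxon names to include, in output order.

    Returns (matrix, partitions): matrix maps taxon -> concatenated
    sequence, partitions lists (protein_name, start, end) with 1-based
    inclusive column coordinates.
    """
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        # Every row of a single alignment must have the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("alignment %r has rows of unequal length" % name)
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from this protein becomes all gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Toy data: two tiny "proteins" for two taxa (made up for illustration).
toy = [
    ("geneA", {"human": "MK-L", "mouse": "MKAL"}),
    ("geneB", {"human": "GG"}),  # mouse is missing for this protein
]
matrix, partitions = build_supermatrix(toy, ["human", "mouse"])
```

The partitions list records which columns came from which protein, which 
is what you would need later if you want to analyse the proteins 
separately or leave some of them out.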

Without knowing the details of the analysis you want to do, I'd imagine 
that with that many sequences for each taxon you're likely to have some 
protein trees behaving differently from others (which might explain your 
unexpected results). There are ways of dealing with this (depending on 
your dedication to getting The One True Tree) like "gene jackknifing" 
(taking a set of proteins out and seeing how they affect the topology) 
and partition-based tests. Sadly, these are frequently run on 
supercomputers...
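
To make the jackknifing idea a bit more concrete, here is a minimal 
sketch that only enumerates the replicates, i.e. which proteins are left 
out and which are kept each time. The function name gene_jackknife is made 
up for illustration, and the expensive part (rebuilding the supermatrix 
and the tree for each replicate and comparing topologies) is left to 
whatever tree-building program you are using.

```python
from itertools import combinations


def gene_jackknife(gene_names, leave_out=1):
    """Yield (dropped, kept) pairs for every way of removing
    `leave_out` proteins from the full set."""
    names = list(gene_names)
    for dropped in combinations(names, leave_out):
        kept = [name for name in names if name not in dropped]
        yield dropped, kept


# Toy example with three hypothetical protein names.
replicates = list(gene_jackknife(["geneA", "geneB", "geneC"]))
for dropped, kept in replicates:
    # Here you would concatenate only the `kept` alignments, infer a
    # tree, and compare it with the tree from the full data set.
    print("left out:", ", ".join(dropped), "| kept:", ", ".join(kept))
```

With 45 proteins, leaving out one at a time gives 45 tree searches, and 
the number grows quickly if you drop larger sets, which is part of why 
these analyses get expensive.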

Cheers,
david

