[Biopython] Sequence alignment with multiple proteins

Cymon Cox cy at cymon.org
Thu May 14 13:29:47 EDT 2009


Hi Michael,

2009/5/14 Fahy, Michael <fahy at chapman.edu>

> This is not strictly a BioPython question but I'm using BioPython for
> the work.
>
> I have a set of 45 proteins and 10 species.  I have a  representative
> orthologous protein from each set for each of the 10 species.  I'm
> trying to build a phylogenetic tree by aligning the data from the 10
> species.  I've tried concatenating the 45 protein sequences for each of
> the 10 species and aligning the concatenated sequences but this has
> produced results that do not make sense.  What do you recommend for such
> a problem?


The way I (and I suspect most others) approach this is to align each protein
individually (i.e. you'll have 45 separate protein alignments) and then
concatenate them into one super-matrix.
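
To make the super-matrix idea concrete, here is a minimal sketch in plain
Python (no Biopython needed). It assumes each per-protein alignment is already
available as a dict mapping species name to aligned sequence string; the
function name concatenate_alignments, the toy data, and the choice to pad a
species missing from one alignment with "?" are all just for illustration.

```python
def concatenate_alignments(alignments, missing="?"):
    """Join per-protein alignments column-wise into one super-matrix.

    `alignments` is a list of dicts (species name -> aligned sequence).
    A species absent from one alignment is padded with `missing`
    characters so that every row ends up the same length.
    """
    # Collect every species seen in any alignment, in a stable order.
    species = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must be equal length")
        width = lengths.pop()
        for name in species:
            parts[name].append(aln.get(name, missing * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data (made up): two tiny protein alignments, one lacking yeast.
protein_a = {"human": "MKT-A", "mouse": "MKTSA", "yeast": "MR--A"}
protein_b = {"human": "GGL", "mouse": "GAL"}

supermatrix = concatenate_alignments([protein_a, protein_b])
for name, row in supermatrix.items():
    print(name, row)
```

The important point is that each protein is aligned on its own first, so
columns are only ever compared within a single protein.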

Currently, Bio.AlignIO does not support column-by-column concatenation of
alignments. But by happy coincidence, David Winter posted today that he has
added a cookbook example showing how to combine alignments using the
Bio.Nexus interface. You can find the example here:
http://biopython.org/wiki/Concatenate_nexus
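
For orientation, a rough sketch of what combining via Bio.Nexus looks like.
It assumes a Biopython install that provides Bio.Nexus.Nexus.combine, and it
uses two tiny made-up Nexus data blocks held in strings rather than real
files; the names gene1 and gene2 are placeholders. The cookbook page remains
the reference for the full recipe.

```python
from io import StringIO

from Bio.Nexus import Nexus

# Two toy Nexus data blocks (invented data, same two taxa in each).
GENE1 = """#NEXUS
begin data;
dimensions ntax=2 nchar=4;
format datatype=protein missing=? gap=-;
matrix
human MKTA
mouse MKSA
;
end;
"""

GENE2 = """#NEXUS
begin data;
dimensions ntax=2 nchar=3;
format datatype=protein missing=? gap=-;
matrix
human GGL
mouse GAL
;
end;
"""

# combine() expects a list of (name, Nexus instance) pairs.
matrices = [
    ("gene1", Nexus.Nexus(StringIO(GENE1))),
    ("gene2", Nexus.Nexus(StringIO(GENE2))),
]
combined = Nexus.combine(matrices)

print(combined.nchar)
for taxon in combined.taxlabels:
    print(taxon, str(combined.matrix[taxon]))
```

With real data you would pass file names to Nexus.Nexus() instead of string
handles, and write the result out with combined.write_nexus_data().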

If your alignment viewer does not support export in Nexus format, you can use
Bio.AlignIO to convert the alignment to Nexus.
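
A minimal sketch of that conversion, assuming the alignment is in FASTA
format and a recent Biopython (1.78 or later, where the Nexus writer needs
the molecule_type argument; older releases used an alphabet argument
instead). The sequences are made up, and in-memory handles stand in for real
files.

```python
from io import StringIO

from Bio import AlignIO

# A toy aligned FASTA file (invented data); all rows have equal length.
fasta_text = """>human
MKT-A
>mouse
MKTSA
>yeast
MR--A
"""

in_handle = StringIO(fasta_text)
out_handle = StringIO()

# convert() returns the number of alignments written.
count = AlignIO.convert(
    in_handle, "fasta", out_handle, "nexus", molecule_type="protein"
)

nexus_text = out_handle.getvalue()
print(count)
print(nexus_text)
```

With files on disk you can pass the input and output file names in place of
the two handles.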

Cheers, Cymon
--

