[Biopython] Sequence alignment with multiple proteins

Peter biopython at maubp.freeserve.co.uk
Thu May 14 13:24:11 EDT 2009


On Thu, May 14, 2009 at 6:07 PM, Fahy, Michael <fahy at chapman.edu> wrote:
> This is not strictly a BioPython question but I'm using BioPython for
> the work.
>
> I have a set of 45 proteins and 10 species.  I have a  representative
> orthologous protein from each set for each of the 10 species.  I'm
> trying to build a phylogenetic tree by aligning the data from the 10
> species.  I've tried concatenating the 45 protein sequences for each of
> the 10 species and aligning the concatenated sequences but this has
> produced results that do not make sense.  What do you recommend for such
> a problem?

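Concatenating the sequences and then aligning them sounds like
asking for trouble: the alignment tool has no idea where one protein
ends and the next begins, so it may line up residues from unrelated
proteins.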

I would suggest taking each gene in isolation, and making a protein
sequence alignment.  Then take the 45 alignments and concatenate
them into one super-alignment [*].  Then make a tree.

There are things you should assess first. For example, build a tree
from each of the 45 separate protein alignments and compare them - you
may find some of the genes are evolving at different rates, etc. Maybe
only some of the 45 proteins are suitable.  Perhaps looking at the
nucleotides would also be wise.  I'm sure an expert in phylogenetics
(i.e. not me) could give much more advice.
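
One rough way to put a number on "compare them" is to count the splits
(bipartitions of the species) that two trees disagree on, which is the
Robinson-Foulds distance. That measure is my choice for illustration
here, not the only option. The sketch below is plain Python, not a
Biopython API: the nested-tuple tree representation, the function names
and the five-taxon example trees are all assumptions made up for the
example.

```python
def clades(tree):
    """Return (leaves, clades) for a tree written as nested tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def splits(tree):
    """Return the non-trivial bipartitions of the leaves implied by the tree."""
    all_leaves, found = clades(tree)
    result = set()
    for clade in found:
        other = all_leaves - clade
        # Skip trivial splits (a single leaf, or everything, on one side).
        if len(clade) > 1 and len(other) > 1:
            # Store both sides so the rooting does not affect the result.
            result.add(frozenset([clade, other]))
    return result


def rf_distance(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Made-up example trees for two genes over the same five taxa.
gene1_tree = ((("A", "B"), "C"), ("D", "E"))
gene2_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(gene1_tree, gene2_tree))
```

A distance of zero means the two gene trees have the same branching
order. Genes whose trees sit far from most of the others would be the
ones to look at more closely before including them.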

Peter

[*] This can be done in Biopython, but isn't that straightforward at
the moment; see this thread:
http://lists.open-bio.org/pipermail/biopython-dev/2009-May/006044.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-May/006046.html
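
To make the concatenation step concrete, here is a minimal sketch in
plain Python. It does not use any Biopython call: it assumes each
per-gene alignment has already been read into a dict mapping species
name to the gapped sequence string, and the function name, the species
names and the toy sequences are all invented for the example. A species
missing from one gene's alignment is padded with gaps, which is an
assumption about how you would want to handle that case.

```python
def concatenate_alignments(alignments, species):
    """Join per-gene alignments column-wise into one super-alignment.

    alignments: list of dicts mapping species name -> gapped sequence.
    species: species names, in the order wanted in the output.
    """
    combined = {name: [] for name in species}
    for index, alignment in enumerate(alignments):
        # Every row of a single alignment must have the same length.
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("alignment %d has rows of unequal length" % index)
        width = lengths.pop()
        for name in species:
            # Pad with gaps when a species has no sequence for this gene.
            combined[name].append(alignment.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in combined.items()}


# Toy data standing in for two of the 45 per-gene alignments.
gene1 = {"human": "MKV-L", "mouse": "MKVAL", "yeast": "MRV-L"}
gene2 = {"human": "GG-T", "mouse": "GGAT"}

super_alignment = concatenate_alignments([gene1, gene2],
                                         ["human", "mouse", "yeast"])
for name, seq in super_alignment.items():
    print(name, seq)
```

The key point is that rows are matched by species name, not by their
position in each file, so the order of records in the individual
alignments does not matter.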


