[Biopython] Sequence alignment with multiple proteins
David Winter
winda002 at student.otago.ac.nz
Thu May 14 18:52:47 EDT 2009
Fahy, Michael wrote:
> This is not strictly a BioPython question but I'm using BioPython for
> the work.
>
> I have a set of 45 proteins and 10 species. I have a representative
> orthologous protein from each set for each of the 10 species. I'm
> trying to build a phylogenetic tree by aligning the data from the 10
> species. I've tried concatenating the 45 protein sequences for each of
> the 10 species and aligning the concatenated sequences but this has
> produced results that do not make sense. What do you recommend for such
> a problem?
>
Hi Michael,
As you've heard, the usual approach is to align the sequences for each
protein individually first, then concatenate those alignments into a
supermatrix.
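To make the supermatrix step concrete, here is a minimal sketch in plain
Python. It assumes each protein has already been aligned on its own and
that each alignment is held as a dict mapping taxon name to aligned
sequence string (with Biopython you could fill these dicts from
alignments read via Bio.AlignIO). The function name build_supermatrix
and the taxon names are made up for illustration, and padding a missing
taxon with gaps is just one common choice.

```python
def build_supermatrix(alignments, taxa, gap="-"):
    """Concatenate per-protein alignments into one sequence per taxon.

    alignments: list of dicts, each mapping taxon name -> aligned sequence
    taxa: taxon names to include, in output order
    Returns (supermatrix, partitions), where partitions holds the
    1-based (start, end) columns of each protein in the supermatrix.
    """
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        length = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this protein is padded with gaps.
            pieces[taxon].append(aln.get(taxon, gap * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy data: two already-aligned proteins, three taxa.
protein_a = {"human": "MKT-A", "mouse": "MKTLA", "yeast": "MRT-A"}
protein_b = {"human": "GG-C", "mouse": "GGAC"}  # yeast has no sequence here
matrix, partitions = build_supermatrix(
    [protein_a, protein_b], ["human", "mouse", "yeast"]
)
```

Keeping the partition boundaries around is handy later, since
partition-aware analyses need to know where each protein starts and ends.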
Without knowing the details of the analysis you want to do, I'd imagine
that with that many sequences for each taxon you're likely to have some
protein trees behaving differently from others (which might explain your
unexpected results). There are ways of dealing with this (depending on
your dedication to getting The One True Tree), like "gene jackknifing"
(taking a set of proteins out and seeing how that affects the topology)
and partition-based tests. Sadly, these are frequently run on
supercomputers...
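As a rough sketch of the bookkeeping behind gene jackknifing (not any
particular program's method): drop one or more proteins, rebuild the
concatenated alignment from the rest, and infer a tree from each
replicate. The code below only generates the replicate alignments; tree
building and topology comparison are left to whatever phylogenetics
software you use. The names concatenate and jackknife_supermatrices and
the toy data are hypothetical, and every taxon is assumed to be present
in every protein alignment.

```python
from itertools import combinations


def concatenate(alignments, taxa):
    """Join aligned sequences end to end, one string per taxon."""
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}


def jackknife_supermatrices(alignments, taxa, leave_out=1):
    """Yield (omitted_proteins, supermatrix) for each jackknife replicate.

    alignments: dict mapping protein name -> {taxon: aligned sequence}
    leave_out: how many proteins to drop in each replicate
    """
    names = sorted(alignments)
    for omitted in combinations(names, leave_out):
        kept = [alignments[name] for name in names if name not in omitted]
        yield omitted, concatenate(kept, taxa)


# Hypothetical toy data: three aligned proteins, two taxa.
toy = {
    "atp6": {"a": "MK", "b": "MR"},
    "cox1": {"a": "GG-", "b": "GGA"},
    "cytb": {"a": "L", "b": "L"},
}
replicates = list(jackknife_supermatrices(toy, ["a", "b"]))
```

With 45 proteins, leaving out one at a time gives 45 replicates and
leaving out two at a time gives 990, which is part of why these
analyses get expensive.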
Cheers,
david