[BioPython] biopython

Michael Fahy fahy at chapman.edu
Sun Jul 29 01:07:29 UTC 2007



Dear Richard,

Thank you for correcting my misuse of terminology.

As I understand clustalw, it generates a guide tree from a distance
matrix calculated from pairwise alignments.  It then uses this guide
tree to do a full multiple alignment.  If you run clustalw
interactively, you can ask it to generate a phylogenetic tree file
from this multiple alignment.  The tree file produced in this way
differs, naturally enough, from the guide tree.  If you run clustalw
and pass it command-line arguments, it will automatically write the
guide tree to a file, but I have not been able to get it to write the
other tree to a file.
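
To make the guide-tree step concrete, here is a minimal sketch in plain
Python.  It is not clustalw's actual code: it assumes the sequences are
already aligned to equal length (clustalw derives its distances from
pairwise alignments instead), it uses a simple fraction-of-mismatches
distance, and it joins clusters by average linkage (UPGMA), which may
differ from the clustering method clustalw uses.  The names p_distance
and guide_tree are made up for this illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of positions that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


def guide_tree(seqs):
    """Build a Newick string by average-linkage clustering of p-distances.

    seqs maps sequence names to aligned sequences (an assumption of this
    sketch; clustalw itself starts from unaligned sequences).
    """
    # Each cluster is keyed by its Newick text and lists its member names.
    clusters = {name: [name] for name in seqs}
    # Distance matrix over the original sequences, keyed by unordered pair.
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }

    def cluster_distance(a, b):
        # Average distance over all cross-cluster member pairs (UPGMA).
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters; sorting keeps ties deterministic.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda ab: cluster_distance(*ab),
        )
        merged = "(%s,%s)" % (a, b)
        clusters[merged] = clusters.pop(a) + clusters.pop(b)

    return next(iter(clusters)) + ";"


if __name__ == "__main__":
    example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTGTCCGA"}
    print(guide_tree(example))
```

The point of the sketch is only that the guide tree is a by-product of
the distance matrix, built before any multiple alignment exists, which
is why it differs from a tree computed from the finished alignment.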

I now understand from your comments that there is little value in
creating this tree file automatically due to inaccuracies in the
clustalw alignment and other factors.  I have read some references
that do recommend using clustalw for creating multiple alignments (and
even for creating phylogenetic trees).  I have also read Edgar's paper
in which he provides evidence for the superior accuracy of MUSCLE.  Is
there consensus in the research community that, while clustalw was a
useful program for doing multiple alignments, it has been surpassed by
newer programs such as MUSCLE and T-Coffee?  If so, it would be useful
to update BioPython and the BioPython Tutorial and Cookbook to use
these alternative programs.

And, if you have created a multiple alignment and cleaned it (e.g. by
removing domains with too much homoplasy), which tool or tools would
you use to create the tree file (or files) from the alignment?  I
understand that you recommend using multiple methods (neighbor
joining, parsimony, maximum likelihood, etc.) and comparing the
results.  I would guess that there are different tools that are better
suited to each method.  You mention TreeDyn, and that looks like a very
powerful tool, but it appears that it is used for editing trees and not
for creating tree files from multiple alignment files.

OK, I just saw the link to your genbank2treedyn program on the treedyn
site.  It looks like your program will read a fasta file with a set of
sequences and then use clustalw to do the multiple alignment and
phylip to create the phylogenetic tree.  So I guess you are not
opposed to using clustalw; you are just warning against using its
multiple alignment files to create trees without analyzing and
correcting them by hand.

Thanks for your help.

--Michael
-- 

> Dear Michael
> 
> I was hoping to get it to generate a "real" phylip tree file rather
> than the guide tree that it generates automatically.
> 
> 1/
> There is no such thing as a "phylip" tree.
> The usual tree format for phylip, as well as for many other treeing
> programs, is "newick", in the form
> ((a,b),c)
> This is the format of the clustal guide tree.
> 
> 2/ You should read the clustal and phylip docs as well as some
> phylogenetics course material.
> Making a good phylogenetic tree cannot be automated yet. You have to
> - check the alignment by hand (clustal will align sequences that
>   should not be aligned)
> - exclude domains (positions) with too much homoplasy, or positions
>   missing in some sequences
> - compare several methods (distances, ML, MP, ...) and run a
>   bootstrap
> 
> Clustal is an alignment program (you may try Muscle, Lagan,
> T-Coffee, ...) and not at all a phylogeny program.
> 
> Finally, if you make trees, try: www.treedyn.org ;-)
> 
> best
> Richard
>
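
As a small illustration of the newick form quoted above, ((a,b),c), the
sketch below reads such a string into nested Python tuples.  It is a
toy parser written for this note, not the reader used by phylip,
clustal or Biopython: it keeps only leaf names, silently drops branch
lengths and internal node labels, and does not handle quoted names or
comments.  The function names parse_newick and leaves are made up for
the example.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names."""
    # Remove all whitespace (tree files are often wrapped) and the final ";".
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def skip_label_and_length():
        # Skip an optional node label and an optional ":length" suffix.
        nonlocal pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        if peek() == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
            skip_label_and_length()
            return tuple(children)
        # Otherwise this is a leaf: read its name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        if not name:
            raise ValueError("expected a leaf name at position %d" % start)
        skip_label_and_length()
        return name

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected text at position %d" % pos)
    return tree


def leaves(tree):
    """Return the leaf names of a parsed tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


if __name__ == "__main__":
    print(parse_newick("((a,b),c)"))
    print(leaves(parse_newick("((a:0.1,b:0.2):0.05,c:0.3);")))
```

Because the guide tree and a phylip tree share this format, the same
kind of reader works on either file; what differs is how the tree was
computed, not how it is written.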


