[BioPython] PHYLIP

Mon Jul 16 09:14:54 UTC 2007

Michael Fahy wrote:
> I've just started using BioPython and have worked through the Cookbook
> section on calling clustalw from Python to do alignments.  It would be cool
> to use clustalw to produce PHYLIP-format files and then call the PHYLIP
> programs to produce phylogenetic trees from them.  Has anyone already worked
> this out?  I searched the last couple of years of list archives and did not
> find anything about using BioPython to access PHYLIP.

You should be able to do this:

1. Produce your unaligned sequences in a format suitable for clustalw,
e.g. write a FASTA file using Bio.SeqIO.write(...).

2. Run clustalw (e.g. using the Bio.Clustalw command line wrapper in 
Biopython, or just make a system call in Python).

3. Read in the clustal format alignment using Bio.SeqIO.parse(...) and
write it out unaltered using Bio.SeqIO.write(...) in phylip format. See
http://biopython.org/wiki/SeqIO#File_Format_Conversion

4. Run the PHYLIP tools (e.g. by making a system call from Python, or by 
hand at the command line). Personally I like the EMBOSS implementation 
of PHYLIP because it uses proper command line arguments, which makes 
calling the programs from code much easier.

(Note that the EMBOSS team do like to re-arrange their website, and as 
EMBOSS 5.0 is just out, it looks like some links are broken right now.)
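
As a rough illustration of steps 2 and 3, something like the sketch below
should work. It assumes a clustalw executable called "clustalw" is on your
PATH and that your Biopython is recent enough for Bio.SeqIO.parse(...) and
Bio.SeqIO.write(...) to accept file names; the function names and file
names are made up for this example.

```python
import subprocess


def build_clustalw_command(fasta_path, aln_path, exe="clustalw"):
    """Return the argument list for a basic clustalw alignment run."""
    # -INFILE and -OUTFILE are clustalw's own options; the executable
    # name "clustalw" is an assumption and may differ on your system.
    return [exe, "-INFILE=" + fasta_path, "-OUTFILE=" + aln_path]


def run_clustalw(fasta_path, aln_path, exe="clustalw"):
    """Step 2: run clustalw as a plain system call."""
    subprocess.run(build_clustalw_command(fasta_path, aln_path, exe), check=True)


def clustal_to_phylip(aln_path, phylip_path):
    """Step 3: rewrite a clustal alignment in PHYLIP format.

    Returns the number of records written.
    """
    # Imported inside the function so the command-building helpers above
    # can still be used on a machine without Biopython.
    from Bio import SeqIO

    records = SeqIO.parse(aln_path, "clustal")
    return SeqIO.write(records, phylip_path, "phylip")


# Hypothetical usage:
#   run_clustalw("seqs.fasta", "seqs.aln")
#   clustal_to_phylip("seqs.aln", "seqs.phy")
```

Building the command as a list rather than a single string avoids shell
quoting problems if your file names contain spaces.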

Note that you should avoid long record IDs, as the PHYLIP format 
truncates names to 10 characters (which could lead to non-unique 
record names).
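
If you want to check for this before writing the PHYLIP file, a small
helper along these lines would flag IDs that clash once cut down to 10
characters. This is only a sketch in plain Python; the function name and
the example IDs are invented for illustration.

```python
from collections import defaultdict

PHYLIP_ID_WIDTH = 10  # name width of the strict PHYLIP format


def find_truncation_clashes(record_ids, width=PHYLIP_ID_WIDTH):
    """Map each truncated name to the full IDs sharing it (clashes only)."""
    groups = defaultdict(list)
    for record_id in record_ids:
        groups[record_id[:width]].append(record_id)
    # Keep only truncated names claimed by more than one record.
    return {short: ids for short, ids in groups.items() if len(ids) > 1}


ids = ["gi|1234567890|a", "gi|1234567801|b", "short_name"]
print(find_truncation_clashes(ids))
```

With record objects from Bio.SeqIO you would pass in something like
[rec.id for rec in records] and rename any clashing records before
writing the file.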

Peter