[Biopython] Request for help

Peter Cock p.j.a.cock at googlemail.com
Thu Apr 11 17:11:49 UTC 2013


On Thu, Apr 11, 2013 at 5:49 PM, Eric Ma <ericmajinglong at gmail.com> wrote:
> Hello everybody,
>
> I'm new to the mailing list here, though I've been playing with Biopython
> for quite a while.
>
> I'm having some trouble here. I wanted to display a tree of sequences for
> which I had done a multiple sequence alignment. I tried going through the
> pipeline example here (http://biopython.org/wiki/Phylo#Example_pipeline).
> Because I'm still in the testing phase, instead of writing it as a single
> script, I wrote it as a series of scripts that I would execute in order.
>
> The problem I run into is at step 4 in the example, where I "feed the
> alignment to PhyML". My data set is 70 protein sequences, and the trouble I
> run into is that it takes a very, very long time at the "feeding alignment
> to PhyML" step. I tried running the script on my MacBook Pro overnight, and
> even the next morning it was not done. Am I missing something here?
>
> Just to be clear here, aligning the sequences using Muscle was successful,
> and I also managed to output a distance matrix from sample to sample, which
> I used in another downstream pipeline to display the clustering of the
> sequences on a 2D Euclidean plane. However, I wanted to have a tree
> representation to validate the clustering results; the trouble is, I can't
> get the _phyml_tree.txt file to be created, which I would then use to draw
> the tree.
>
> Thanks in advance for any help!
>
> Cheers,
> Eric

Hi Eric,

So this part is getting stuck (or taking a very long time):

# Feed the alignment to PhyML using the command line wrapper:
from Bio.Phylo.Applications import PhymlCommandline

cmdline = PhymlCommandline(input='egfr-family.phy', datatype='aa',
                           model='WAG', alpha='e', bootstrap=100)
# Calling the wrapper runs PhyML and blocks until it finishes,
# then returns PhyML's stdout and stderr:
out_log, err_log = cmdline()

At that point, is the computer active (high CPU load as measured
via the task manager / system monitor / top, etc.)?
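For a quick check without leaving Python, here is a minimal sketch, assuming macOS or Linux where the standard ps utility is available, that lists running processes whose name contains "phyml" along with their CPU usage. The helper names parse_ps_output and find_processes are hypothetical, not part of Biopython.

```python
# Hypothetical helpers (not part of Biopython): use the standard `ps`
# utility (macOS/Linux) to see whether PhyML is running and how busy it is.
import subprocess


def parse_ps_output(text, name):
    """Return (pid, cpu_percent, command) for lines whose command contains name.

    Expects lines of the form "PID %CPU COMMAND", as produced by
    `ps -A -o pid= -o pcpu= -o comm=`.
    """
    matches = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the command itself may contain spaces
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if name.lower() in command.lower():
            # Some locales print the CPU figure with a decimal comma.
            matches.append((int(pid), float(cpu.replace(",", ".")), command.strip()))
    return matches


def find_processes(name="phyml"):
    """Run `ps` and return matching (pid, cpu_percent, command) tuples."""
    # An empty header ("pid=" etc.) suppresses the column titles.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout, name)


# Usage (while the PhyML step is running):
# for pid, cpu, command in find_processes("phyml"):
#     print(pid, cpu, command)
```

A CPU figure near 100% means PhyML is still working rather than stuck waiting on something.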

I would suggest trying PhyML at the command line by hand. First,
check the command that Biopython should be running:

print(cmdline)
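If you would rather stay in Python, here is a minimal sketch of running that same command while echoing its output as it arrives. run_with_progress is a hypothetical helper built on the standard subprocess module, not a Biopython function, and the usage comment assumes the phyml executable is on your PATH.

```python
# Minimal sketch: run a command (for example the string printed by
# print(cmdline)) and echo its output line by line as it arrives.
import shlex
import subprocess
import sys


def run_with_progress(command):
    """Run command (a string or a list of args), echoing output live.

    Returns (exit_code, lines), where lines is the captured output.
    """
    args = shlex.split(command) if isinstance(command, str) else list(command)
    lines = []
    # Merge stderr into stdout so warnings and progress appear in order.
    with subprocess.Popen(args, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            sys.stdout.flush()
            lines.append(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode, lines


# Usage (assumes PhyML is installed):
# exit_code, output = run_with_progress(str(cmdline))
```

Output may still arrive in bursts, since many command line tools buffer their output when writing to a pipe rather than a terminal; pasting the printed command into a terminal avoids that.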

Running it by hand like that may give you visual progress on screen.
My guess is that this is simply slow: you are only running 100
bootstraps, but each bootstrap replicate is a full maximum likelihood
tree search, so if each one takes a while, the total adds up.
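To put rough numbers on that: assuming PhyML searches the original alignment once and then each bootstrap replicate once, and that every search takes about as long as the first, the total time is roughly (1 + replicates) × time per search. You could measure the time per search by running once with bootstrap=0. The 10 minute figure below is purely hypothetical.

```python
# Back-of-envelope estimate, assuming one ML search on the original
# alignment plus one per bootstrap replicate, all of similar duration.
def estimated_total_hours(single_search_minutes, n_bootstrap):
    """Rough wall-clock hours for one ML search plus n_bootstrap replicates."""
    total_searches = 1 + n_bootstrap
    return total_searches * single_search_minutes / 60.0


# Hypothetical timing: if one search took 10 minutes, 100 replicates
# would need roughly 101 * 10 / 60 hours.
print(f"{estimated_total_hours(10, 100):.1f} hours")  # prints "16.8 hours"
```

Under those assumptions, a search of only ten minutes would already be enough to run past an overnight wait.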

You said you had 70 protein sequences. How many columns are
there in the alignment? The alignment length also affects run time.
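The alignment length is recorded in the PHYLIP file itself: the first line holds the number of sequences followed by the number of columns. A minimal sketch that reads it, where parse_phylip_header and phylip_dimensions are hypothetical helper names:

```python
# Minimal sketch: a PHYLIP file starts with a header line giving the
# number of sequences and then the number of columns (sites).
def parse_phylip_header(line):
    """Return (n_sequences, n_columns) from a PHYLIP header line."""
    fields = line.split()
    # Ignore anything after the two counts (some variants add option letters).
    return int(fields[0]), int(fields[1])


def phylip_dimensions(path):
    """Read only the first line of a PHYLIP file and parse its counts."""
    with open(path) as handle:
        return parse_phylip_header(handle.readline())


# Usage:
# n_seqs, n_cols = phylip_dimensions("egfr-family.phy")
# print(n_seqs, "sequences x", n_cols, "columns")
```

Alternatively, with Biopython, AlignIO.read("egfr-family.phy", "phylip-relaxed").get_alignment_length() should report the same column count, assuming the file is in relaxed PHYLIP format.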

Peter
