[Biopython-dev] gsoc phylo project questions

Eric Talevich eric.talevich at gmail.com
Wed May 1 15:46:43 UTC 2013


On Tue, Apr 30, 2013 at 3:20 AM, Yanbo Ye <yeyanbo289 at gmail.com> wrote:

> Hi Eric,
>
> Again, thanks for your comment. It might be better to discuss it here.
> https://github.com/lijax/gsoc/commit/e969c82a5a0aef45bba1277ce01d6dbee03e6a84#commitcomment-3096321
>
> I have changed my proposal and timeline based on your advice. I think I
> was too optimistic and didn't consider compatibility with existing code
> or other potential problems. After careful consideration, I removed one
> task from the goal list to make the schedule more relaxed: tree comparison
> (http://www.biopython.org/wiki/Phylo_cookbook#Comparing_trees), which it
> seems I misunderstood. I might be able to complete all of the tasks, but
> it's better to make this one an extra task, so that this coding
> experience doesn't become a burden.
>

I agree it's best to commit to a feasible timeline and then reserve a few
"stretch goals". Dropping the tree distance function is fine, as there are
currently some other students who might develop this small module as a
course project, independently of GSoC. In any case, that functionality is
independent of the other tasks you've proposed.


> According to your comment:
>
> 1. I didn't know about PyCogent and DendroPy. I'll look at them for useful
> solutions.
> 2. For distance-based trees and consensus trees, I think there is no need
> to use NumPy. For consensus trees, my original plan was to implement a
> binary class for counting clades with the same leaves, for performance.
> As you suggested, I'll first implement a class with the same API and
> improve the performance later, so that I can pay more attention to the
> Strict and Adams consensus algorithms.
>

Sounds good.
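
A minimal sketch of the clade-counting idea, in plain Python and assuming a
hypothetical nested-tuple tree representation rather than Bio.Phylo objects:
each clade is keyed by the frozenset of its leaf names, and the strict
consensus keeps the clades found in every input tree. The frozenset key
stands in for the faster binary representation mentioned above; either could
sit behind the same counting API. Adams consensus needs more than this and is
not shown.

```python
from collections import Counter

# Hypothetical tree representation for this sketch: a leaf is a string and
# an internal clade is a tuple of child subtrees, e.g. (("A", "B"), "C").


def clade_leaf_sets(tree):
    """Return one frozenset of leaf names per internal clade in the tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        # A clade's leaf set is the union of its children's leaf sets.
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def count_clades(trees):
    """Count how many input trees contain each clade (keyed by leaf set)."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_leaf_sets(tree))
    return counts


def strict_consensus_clades(trees):
    """Leaf sets of the clades found in every input tree."""
    counts = count_clades(trees)
    return {clade for clade, n in counts.items() if n == len(trees)}


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "B"), "D"), ("C", "E"))
print(sorted(sorted(c) for c in strict_consensus_clades([t1, t2])))
# prints [['A', 'B'], ['A', 'B', 'C', 'D', 'E']]
```

Because every clade in a strict consensus comes from a single input tree, the
surviving leaf sets are always mutually compatible and can be nested back
into one tree.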


> 3. On the Phylo Cookbook page I didn't find a method for computing a
> distance matrix from an MSA, only one that works from an existing tree.
>

Ah, I think I misunderstood you earlier. Yes, for the NJ method you'll need
to use a substitution matrix to compute pairwise distances from a multiple
sequence alignment. This shouldn't be too challenging, though you might
need to add a new matrix to the Bio.SubsMat module if you want to let the
user choose something other than BLOSUM or PAM.
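
A rough sketch of that pipeline in plain Python, with stated simplifications:
it scores pairwise distances with a simple p-distance (fraction of
mismatched, ungapped sites) instead of a substitution matrix, stores the
matrix as nested dicts, and applies the textbook neighbor-joining update
rules. The function names and tree representation are hypothetical, not
Bio.Phylo API.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of mismatched sites, skipping columns with a gap in either."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """alignment: dict of name -> aligned sequence. Returns nested dicts."""
    return {
        a: {b: p_distance(alignment[a], alignment[b]) for b in alignment}
        for a in alignment
    }


def neighbor_joining(dm):
    """Build a tree from a nested-dict distance matrix (keys are names).

    Returns nested tuples of (subtree, branch_length) pairs. The last two
    nodes are joined at the midpoint of their branch, which is an arbitrary
    rooting. Branch lengths are not clamped, so they can come out negative.
    """
    d = {a: dict(row) for a, row in dm.items()}
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new = ((i, li), (j, lj))
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for k in nodes:
            if k != i and k != j:
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k != i and k != j] + [new]
    a, b = nodes
    return ((a, d[a][b] / 2), (b, d[a][b] / 2))


aln = {"A": "ACGT", "B": "ACGA", "C": "TTGA"}
print(neighbor_joining(distance_matrix(aln)))
```

A substitution-matrix-based score could replace p_distance without changing
neighbor_joining, since the latter only sees the finished matrix.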

> 4. For parsimony tree search, I already know how several heuristic
> search algorithms work. Do I need to implement them all?
>

No, just choose a well-established one that you feel comfortable
implementing.
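
Whichever heuristic is chosen (for example nearest-neighbor interchange), it
will repeatedly score candidate trees. A minimal sketch of that scoring step
using Fitch's algorithm on a rooted binary tree, assuming a hypothetical
nested-tuple representation and an alignment given as a dict of equal-length
strings:

```python
def fitch(tree, states):
    """Return (state_set, changes) for one alignment column.

    tree: a leaf name (str) or a 2-tuple of subtrees (binary trees only).
    states: dict of leaf name -> character state in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        # Children agree on some state: no extra change needed here.
        return common, lcost + rcost
    # No shared state: take the union and count one change.
    return lset | rset, lcost + rcost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


aln = {"A": "AA", "B": "AC", "C": "GA", "D": "GC"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # prints 3
```

A search would keep whichever rearranged tree gives the lowest score.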

> 5. I'm not clear about the radial layout and Felsenstein's Equal Daylight
> algorithm. Isn't this algorithm one way of drawing a radial layout? I'm
> sorry, I'm not familiar with this layout. Can you give some example
> figures and references?
>

For radial tree layout:
https://en.wikipedia.org/wiki/Radial_tree
http://www.infosun.fim.uni-passau.de/~chris/down/DrawingPhyloTreesEA.pdf

The paper above also explains an "angle spreading" refinement step to
improve the appearance of radial trees, which you could opt to implement
instead of Equal Daylight.

The Equal Daylight algorithm seems to be fully documented only in the book
"Inferring Phylogenies" and implemented in the "drawtree" program in
Phylip. In the Phylip documentation, the radial layout algorithm is called
"Equal Arc", and the layout provided by that algorithm is the starting
point for Equal Daylight:
http://evolution.genetics.washington.edu/phylip/doc/drawtree.html
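
A minimal sketch of the basic equal-angle idea behind such a radial layout,
assuming a hypothetical tree of nested (child, branch_length) tuples: each
subtree gets a wedge of the circle proportional to its leaf count. It is an
assumption that this matches Phylip's "Equal Arc" starting layout, not a
description of drawtree's code; Equal Daylight or angle spreading would then
adjust these positions.

```python
import math

# Hypothetical tree representation: a leaf is a string; an internal node is
# a tuple of (child, branch_length) pairs.


def count_leaves(node):
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child, _ in node)


def equal_angle_layout(tree):
    """Return (leaf_coords, segments) for an equal-angle radial drawing.

    Each subtree gets an angular wedge proportional to its number of leaves,
    and each child is placed along the bisector of its wedge, one branch
    length away from its parent. The root sits at the origin.
    """
    leaf_coords = {}
    segments = []  # ((x1, y1), (x2, y2)) line segments, one per branch

    def place(node, x, y, start, wedge):
        if isinstance(node, str):
            leaf_coords[node] = (x, y)
            return
        total = count_leaves(node)
        angle = start
        for child, length in node:
            share = wedge * count_leaves(child) / total
            mid = angle + share / 2
            cx = x + length * math.cos(mid)
            cy = y + length * math.sin(mid)
            segments.append(((x, y), (cx, cy)))
            place(child, cx, cy, angle, share)
            angle += share

    place(tree, 0.0, 0.0, 0.0, 2 * math.pi)
    return leaf_coords, segments


star = (("A", 1.0), ("B", 1.0), ("C", 1.0), ("D", 1.0))
coords, segs = equal_angle_layout(star)
print(coords["A"])
```

count_leaves is recomputed at every level for brevity, which is quadratic in
the worst case; caching leaf counts per node would avoid that.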

Cheers,
Eric