[Biopython-dev] Project ideas for GSoC (or other student projects)

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 21 17:01:51 UTC 2013

On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> I like Michiel's idea, and I'll suggest two more:
> 1. Codon alignment & analysis:

Already up on the wiki :)

> 2. Phylo enhancements:
> 2a. Tree drawing:
> - A proper draw_unrooted function to perform radial layout, with an optional
> "iterations" argument to use Felsenstein's Equal Daylight algorithm -- I
> feel this layout approach is neglected in most libraries.
> - Better matplotlib/pylab integration, so the plot components can be tweaked
> using matplotlib functions.
> - Other common layout approaches, e.g. circular.
> 2b. A "Phylo.consensus" module:
> - strict consensus, like Bio.Nexus already implements.
> - other consensus methods, time permitting.
> 2c. A "Phylo.distance" module:
> - Robinson-Foulds distance -- though others might be working on this
> already.
> 2d. Simple tree inference:
> - Straightforward algorithms exist for neighbor-joining and parsimony tree
> estimation. For small alignments (and perhaps medium-sized ones with PyPy),
> it would be nice to run these without an external program, e.g. to construct
> a guide tree for another algorithm or quickly view a phylogenetic clustering
> of sequences.

One more idea for a sub-task?

2e. Using multiple trees for bootstrapping a master tree. Each edge of
the master tree splits the leaves into two sets, and that partition can
be used as a dictionary key (e.g. as a binary representation). Then for
each of the bootstrap trees, look at each edge, compute the key for
that split of the leaves, and increment its count. At the end, the
dictionary of counts gives the bootstrap support for each branch of
the master tree.

I wrote that once in Python some time back, and used it to take a set
of bootstrap trees generated on a cluster and assign support values
to the master tree.
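
For illustration only, here is a minimal sketch of that split-counting
idea. It is not the original script and does not use Bio.Phylo: it
assumes trees are plain nested tuples of leaf-name strings, that every
tree has the same set of leaves, and all function names are made up for
the example. Each split is encoded as an integer bitmask, normalised so
that the side containing the first leaf (in sorted order) is never the
one stored, which makes the key independent of where a tree is rooted.

```python
from collections import Counter


def leaf_names(tree):
    """List the leaf names of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def clade_masks(tree, leaf_bit):
    """Return (mask of all leaves under tree, masks of its internal clades)."""
    if isinstance(tree, str):
        return leaf_bit[tree], []
    mask, found = 0, []
    for child in tree:
        child_mask, child_found = clade_masks(child, leaf_bit)
        mask |= child_mask
        found.extend(child_found)
    found.append(mask)
    return mask, found


def splits(tree, leaf_bit):
    """Set of non-trivial leaf bipartitions of a tree, as canonical bitmasks."""
    full = (1 << len(leaf_bit)) - 1
    _, masks = clade_masks(tree, leaf_bit)
    result = set()
    for mask in masks:
        if mask & 1:
            # Store the side that does not contain the first leaf.
            mask ^= full
        # Skip trivial splits: fewer than two leaves on either side.
        if bin(mask).count("1") >= 2 and bin(full ^ mask).count("1") >= 2:
            result.add(mask)
    return result


def bootstrap_support(master, replicates):
    """Map each split of the master tree to the number of replicates with it."""
    leaves = sorted(leaf_names(master))
    leaf_bit = {name: 1 << i for i, name in enumerate(leaves)}
    counts = Counter()
    for replicate in replicates:
        counts.update(splits(replicate, leaf_bit))
    return {split: counts[split] for split in splits(master, leaf_bit)}


master = (("A", "B"), ("C", ("D", "E")))
replicates = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "B"), (("C", "D"), "E")),
    (("A", "C"), ("B", ("D", "E"))),
]
support = bootstrap_support(master, replicates)
```

Dividing each count by len(replicates) would turn the counts into
support fractions. A replicate with a leaf missing from the master tree
raises KeyError in this sketch rather than being handled gracefully.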

> Any interest in either of these? Shall I add them to the wiki?

They both seem worth posting on the wiki, although we may not have
enough mentors for both to go ahead :(

