[Biopython-dev] Project ideas for GSoC (or other student projects)

Eric Talevich eric.talevich at gmail.com
Tue Feb 12 20:00:11 UTC 2013


On Tue, Feb 12, 2013 at 12:51 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello all,
>
> Google recently confirmed they will be running Google Summer of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to be accepted
> again under the Open Bioinformatics Foundation as in previous years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>
> It would be great to start coming up with potential project ideas, both
> larger pieces of work suitable for GSoC but also smaller tasks for other
> project students, or 'low hanging fruit' for potential contributors to cut
> their teeth on.
>

One interesting GSoC project would be to implement support for phylogenetic
placements. The programs pplacer and EPA (part of RAxML) can place sequence
reads from metagenomic samples onto a reference phylogeny:
http://matsen.fhcrc.org/pplacer/
http://sysbio.oxfordjournals.org/content/60/3/291

The output format of those programs has been standardized as something I
suppose we could call the "jplace" format:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031009
http://arxiv.org/abs/1201.3397

It's based on JSON and Newick, with a small extension to Newick that
shouldn't be too hard to support. The GSoC project would be to implement a
parser for this format, add support for querying the placements, and
integrate it with the rest of Bio.Phylo to some reasonable extent. I would
be available to mentor this.
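
To give a feel for the parsing step, here is a minimal sketch using only the
Python standard library. It assumes the layout described in the jplace paper
as best I recall it: a top-level "tree" string in which each edge carries a
number in curly braces, a "fields" list naming the columns, and a
"placements" list whose entries hold rows under "p" and read names under "n"
(or name/multiplicity pairs under "nm"). The function name parse_jplace and
the sample data are made up for illustration; nothing here is existing
Bio.Phylo API.

```python
import json
import re

# Edge numbers appear in the Newick string as "{0}", "{1}", ... (assumed layout).
EDGE_LABEL = re.compile(r"\{(\d+)\}")


def parse_jplace(text):
    """Hypothetical helper: split a jplace document into its main parts.

    Returns (plain_newick, edge_numbers, placements), where plain_newick is
    the tree with the "{n}" edge labels stripped so that an ordinary Newick
    parser could read it.
    """
    doc = json.loads(text)
    tree = doc["tree"]
    edge_numbers = [int(n) for n in EDGE_LABEL.findall(tree)]
    plain_newick = EDGE_LABEL.sub("", tree)

    fields = doc["fields"]
    placements = []
    for entry in doc["placements"]:
        # Names are stored either as "n" or as [name, multiplicity] pairs in "nm".
        names = entry.get("n") or [pair[0] for pair in entry.get("nm", [])]
        # Turn each positional row into a dict keyed by field name.
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        placements.append({"names": names, "rows": rows})
    return plain_newick, edge_numbers, placements


# Made-up sample document, for illustration only.
EXAMPLE = """{
  "tree": "((A:0.2{0},B:0.09{1}):0.7{2},C:0.5{3}){4};",
  "placements": [
    {"p": [[1, -2578.16, 0.78, 0.004, 0.06],
           [0, -2580.15, 0.22, 0.003, 0.07]],
     "n": ["read1"]}
  ],
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "version": 3,
  "metadata": {}
}"""

newick, edges, placements = parse_jplace(EXAMPLE)
best = max(placements[0]["rows"], key=lambda row: row["like_weight_ratio"])
print(newick)
print(placements[0]["names"], "-> edge", best["edge_num"])
```

A real implementation would presumably keep the edge numbers attached to the
corresponding clades rather than stripping them, so that placements can be
looked up by branch.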

In terms of low-hanging fruit, there are some small but important functions
that could be added to Bio.Phylo. My top three: Robinson-Foulds distance,
majority-rule consensus, and drawing an unrooted tree using Felsenstein's
Equal Daylight algorithm (which starts by computing the layout for a radial
tree).
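
For anyone sizing up the first of these, the Robinson-Foulds distance counts
the bipartitions (splits) of the leaf set that appear in one tree but not the
other. Below is a rough standard-library sketch that represents trees as
nested tuples of leaf names instead of Bio.Phylo objects; the function names
are hypothetical, and it assumes both trees have the same leaves.

```python
def leaf_sets(tree, clades):
    """Return the leaves under `tree`, adding each internal clade to `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree, one frozenset per split.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same split looks identical however the tree is rooted.
    """
    clades = set()
    all_leaves = leaf_sets(tree, clades)
    reference = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits: one leaf (or nothing) on either side.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree1, tree2):
    """Symmetric-difference (unnormalized) Robinson-Foulds distance."""
    return len(splits(tree1) ^ splits(tree2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))  # same topology as t1, rooted elsewhere

print(robinson_foulds(t1, t2))  # 2
print(robinson_foulds(t1, t3))  # 0
```

The same set of splits is also the natural starting point for a majority-rule
consensus, which keeps the splits found in more than half of the input trees.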

-Eric


