[Biopython-dev] Project ideas for GSoC (or other student projects)

Eric Talevich eric.talevich at gmail.com
Wed Mar 13 18:32:25 UTC 2013

On Tue, Feb 12, 2013 at 9:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> It would be great to have better support for microarray analysis in
> Biopython. Something like lumi/limma in R. Perhaps this is an option for
> the GSoC?
> Best,
> -Michiel.

I like Michiel's idea, and I'll suggest two more:

1. Codon alignment & analysis:
- PAL2NAL-style conversion of unaligned nucleic acid sequences plus a
protein sequence alignment into a codon alignment (previously discussed).
- dN/dS estimation and the related functions needed to calculate it.
- Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
codon alignments, including validation (e.g. testing for frame shifts).
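
As a rough illustration of the PAL2NAL-style step in item 1, here is a
minimal pure-Python sketch. It assumes plain strings keyed by record ID
rather than Biopython SeqRecord or MultipleSeqAlignment objects, '-' as the
gap character, and only a length-based frame check (it does not confirm that
each codon translates to its residue). codon_align is a hypothetical name,
not an existing Biopython function.

```python
def codon_align(protein_alignment, nucleotide_seqs, gap="-"):
    """Thread unaligned coding sequences onto an existing protein alignment.

    protein_alignment: dict of record ID -> aligned protein string
    nucleotide_seqs:   dict of the same IDs -> unaligned CDS string
    Returns a dict of record ID -> codon-aligned nucleotide string.
    """
    codon_aln = {}
    for seq_id, aligned_prot in protein_alignment.items():
        cds = nucleotide_seqs[seq_id]
        n_residues = sum(1 for aa in aligned_prot if aa != gap)
        # Crude frame check: exactly one codon per residue, optionally
        # followed by a single terminal stop codon (which is dropped).
        if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
            raise ValueError(
                f"{seq_id}: CDS length {len(cds)} does not fit "
                f"{n_residues} residues (frame shift or wrong CDS?)"
            )
        codons = []
        pos = 0
        for aa in aligned_prot:
            if aa == gap:
                codons.append(gap * 3)  # a protein gap becomes a codon gap
            else:
                codons.append(cds[pos:pos + 3])
                pos += 3
        codon_aln[seq_id] = "".join(codons)
    return codon_aln
```

For example, codon_align({"a": "MK-L", "b": "MKVL"},
{"a": "ATGAAACTG", "b": "ATGAAAGTTCTG"}) gives
{"a": "ATGAAA---CTG", "b": "ATGAAAGTTCTG"}. A real implementation would also
want to check translations against a codon table, which is where the
validation mentioned above comes in.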

2. Phylo enhancements:
2a. Tree drawing:
- A proper draw_unrooted function to perform radial layout, with an
optional "iterations" argument to refine it using Felsenstein's Equal
Daylight algorithm; I feel this layout approach is neglected in most
libraries.
- Better matplotlib/pylab integration, so the plot components can be
tweaked using matplotlib functions.
- Other common layout approaches, e.g. circular.
2b. A "Phylo.consensus" module:
- strict consensus, which Bio.Nexus already implements.
- other consensus methods, time permitting.
2c. A "Phylo.distance" module:
- Robinson-Foulds distance (though others might already be working on this).
2d. Simple tree inference:
- Straightforward algorithms exist for neighbor-joining and parsimony tree
estimation. For small alignments (and perhaps medium-sized ones with PyPy),
it would be nice to run these without an external program, e.g. to
construct a guide tree for another algorithm or quickly view a phylogenetic
clustering of sequences.
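
For item 2c, here is a minimal sketch of the rooted (clade-based) variant of
the Robinson-Foulds distance: the number of clades found in one tree but not
the other. It assumes trees written as nested Python tuples of leaf names
rather than Bio.Phylo tree objects, and clusters/robinson_foulds are
hypothetical names. The unrooted variant would compare bipartitions of the
leaf set instead of clades.

```python
def clusters(tree):
    """Return the set of non-trivial clades of a rooted tree.

    The tree is a nested tuple of leaf-name strings, e.g. (("A", "B"), "C").
    Each internal node contributes the frozenset of leaves below it; single
    leaves are never added and the root (all leaves) is discarded.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)
    return found


def robinson_foulds(tree1, tree2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clades."""
    return len(clusters(tree1) ^ clusters(tree2))
```

For instance, (((A, B), C), (D, E)) and (((A, C), B), (D, E)) differ only in
the clades {A, B} and {A, C}, so their distance is 2.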
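
For item 2d, here is a minimal pure-Python sketch of neighbor-joining (Saitou
and Nei's algorithm) that returns an unrooted tree as a Newick string. It
assumes a plain list-of-lists distance matrix and unique taxon names;
neighbor_joining is a hypothetical helper, not part of Bio.Phylo, and it is
the straightforward O(n^3) version with no speed-ups.

```python
def neighbor_joining(names, dist):
    """Build an unrooted Newick tree from a distance matrix by neighbor-joining.

    names: list of at least three unique taxon labels.
    dist:  square, symmetric list of lists with zeros on the diagonal.
    Internal nodes are keyed by their own Newick substring. Branch lengths
    are formatted with :g (6 significant digits) and may come out negative,
    as is usual for NJ; they are left as-is.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central (trifurcating) node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

On the common five-taxon textbook matrix for a-e (rows [0, 5, 9, 9, 8],
[5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]) this
returns "(d:2,e:1,(c:4,(a:2,b:3):3):2);". A Newick string like this can be
parsed with Bio.Phylo.read(StringIO(newick), "newick") for viewing.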

Any interest in either of these? Shall I add them to the wiki?


--- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
> > To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, February 12, 2013, 12:51 PM
> > Hello all,
> >
> > Google recently confirmed they will be running Google Summer of Code
> > 2013, and we (Biopython and the other Bio* projects) would hope to be
> > accepted again under the Open Bioinformatics Foundation as in previous
> > years:
> > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
> >
> > It would be great to start coming up with potential project ideas, both
> > larger pieces of work suitable for GSoC but also smaller tasks for other
> > project students, or 'low hanging fruit' for potential contributors to
> > cut their teeth on.
> >
> > See also http://biopython.org/wiki/Active_projects
> > and the ideas list there.
> >
> > Regards,
> >
> > Peter