[Biopython-dev] [Biopython] Update: call for Google Summer of Code project ideas

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 1 07:02:58 EST 2012


On Mon, Feb 27, 2012 at 4:24 PM, Robert Buels <rbuels at gmail.com> wrote:
> Hi all,
>
> As kindly pointed out by Reece Hart, the previous email I sent out calling
> for Google Summer of Code project ideas had the wrong due date in it.
>
> I actually want them to all be in place by Friday, March 2, which is this
> coming Friday.
>

See http://lists.open-bio.org/pipermail/biopython/2012-February/007726.html
for the original complete email.

That deadline is upon us (tomorrow), so where are we with GSoC 2012 ideas?
http://biopython.org/wiki/Google_Summer_of_Code

Are any of the areas touched on in the "Biopython 1.60 plans and beyond"
thread suitable?

Python 3?
---------

In terms of 'software engineering' we might be able to put together
something for Python 3 support (there are still some C extensions to
port), but I'm not sure there is enough work there for a full project.

SearchIO?
---------

I'm wondering whether a Biopython SearchIO would make a good project,
one that I might supervise. The name is obviously borrowed from BioPerl.
I would be aiming for an iterator-based parser/writer framework (like
SeqIO and AlignIO), initially for pairwise 'sequence' searches, but I
have also been thinking about indexing - at least by query, ideally
also by match - to allow random access akin to what Bio.SeqIO.index
offers.
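
As a rough illustration of the iterator-plus-index idea, here is a
minimal sketch for BLAST tabular output. It assumes the standard 12
column layout (-outfmt 6 in BLAST+); the function names parse_line,
parse and index are hypothetical placeholders, not an existing or
agreed Biopython API.

```python
import io
from itertools import groupby

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_line(line):
    """Turn one tab-separated line into a dict keyed by FIELDS."""
    hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    return hit


def parse(handle):
    """Yield (query_id, hits) one query at a time (hypothetical name).

    Assumes all lines for a given query are adjacent in the file.
    """
    rows = (parse_line(line) for line in handle
            if line.strip() and not line.startswith("#"))
    for query_id, hits in groupby(rows, key=lambda hit: hit["qseqid"]):
        yield query_id, list(hits)


def index(handle):
    """Map each query ID to the offset of its first line (hypothetical name)."""
    offsets = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.strip() and not line.startswith("#"):
            offsets.setdefault(line.split("\t", 1)[0], offset)
    return offsets


# Made-up example data, two queries and three hits in total.
EXAMPLE = (
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t180.0\n"
    "q1\ts2\t90.0\t150\t15\t0\t10\t159\t1\t150\t1e-20\t95.0\n"
    "q2\ts1\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-10\t60.0\n"
)
results = list(parse(io.StringIO(EXAMPLE)))
offsets = index(io.StringIO(EXAMPLE))
```

The index only stores file offsets, so random access by query would
mean seeking to the offset and parsing from there, much as
Bio.SeqIO.index avoids loading every record into memory.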

In some cases the results would also be pairwise sequence alignments,
in which case some code can be shared/linked with AlignIO. In other
cases all you get is co-ordinates of the query and match plus some
kind of score. SearchIO could therefore use a hierarchy of result
objects, ranging from minimal matches up to full pairwise alignments.
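
One way such a hierarchy might look is sketched below. This is only
an assumption about the design: the class names Match and
AlignedMatch, and all of their attributes, are hypothetical and
purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Minimal result: co-ordinates on query and match plus a score."""
    query_id: str
    match_id: str
    query_start: int
    query_end: int
    match_start: int
    match_end: int
    score: float


@dataclass
class AlignedMatch(Match):
    """A match where the format also reports the gapped aligned sequences."""
    query_aln: str
    match_aln: str

    def identities(self):
        """Count alignment columns where both sequences share a residue."""
        return sum(a == b and a != "-"
                   for a, b in zip(self.query_aln, self.match_aln))


# Made-up values: a co-ordinates-only hit and one with an alignment.
minimal = Match("q1", "s1", 1, 4, 10, 14, 42.0)
aligned = AlignedMatch("q1", "s1", 1, 4, 10, 14, 42.0, "ACG-T", "ACGAT")
```

Code that only needs co-ordinates and scores could then treat both
kinds of result the same way, while alignment-aware code checks for
the richer class.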

I'd hope to cover BLAST XML, BLAST tabular, HMMER tabular (not
really sequence vs sequence, but HMM vs sequence), and RPS-BLAST
(again not really sequence vs sequence). Perhaps this could also tie
into the Bio.Motif code (if we consider things like PSSM vs
sequence in the same framework).

You can already do some of this in Biopython (e.g. BLAST XML
parsing, and there is some HMMER work on branches), but I'm
hoping for a unified API here.

Peter

