[Biopython] Google Summer of Code 2014: Student application

Tue Mar 18 23:30:42 UTC 2014

On Mon, Mar 17, 2014 at 12:09 PM, Lluís Revilla <lluis.revilla at gmail.com> wrote:

> Hi everyone,
>
> I am a Biotechnology student and I want to contribute to Biopython. I have
> read the GSoC page on the wiki and found two ideas, but I don't think I have
> the desired skills: I am not yet familiar with Biopython's existing
> sequence parsing ("Indexing & Lazy-loading Sequence Parsers") or with
> JavaScript ("Interactive GenomeDiagram Module"). So I am thinking of making
> a Google Summer of Code proposal for a comparison tool.
>
> My idea comes from the following: several times I have been in charge of
> selecting a tool to produce a certain result, e.g. a list of predicted
> genes, a list of possible structures, a list of alignments...
>
> But in bioinformatics there are usually many programs that do the same
> thing. They typically use a different algorithm or a different training
> data set (prokaryote, eukaryote), or have different specifications, and
> they return a more or less sophisticated list in some standard format:
> FASTA, GFF, GenBank...
>
> The problem when starting a project is to select which of these programs
> to use for the task, e.g.: which gene predictor is better for prokaryotes:
> Glimmer, EasyGene, GeneMarker, Prodigal, AUGUSTUS...? The answer will be
> specific to the project, but sometimes it's difficult to be sure that the
> selection is a good one. (Other times it is good enough to do what the
> majority does.) But that does not solve the problem when new algorithms
> appear, nor does it help to compare different versions of a program.
>
> To address this problem I would like to develop a Biopython module that
> compares the output of different programs to assess which one is better
> for the task.
> I have already developed a parser for the aforementioned programs, and it
> compares them in a (very) crude way. I would like to develop it further
> and release it to the Biopython community.
>
> What are your thoughts about this idea?
> Thanks,
>
> Lluís
>

Hi Lluís,

This is an interesting idea, though a bit broad. You could maybe find some
inspiration or focus by looking at Critical Assessment of Function
Prediction (CAFA):
http://biofunctionprediction.org/

Perhaps Iddo Friedberg or another AFP enthusiast could comment on how this
project could support benchmarking of automated annotations.

On the technical side, I also recommend looking at nestly, a program that
will execute another specific command-line program with a variety of
different parameters and automatically organize, summarize and compare the
outputs.
http://fhcrc.github.io/nestly/
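
To make the idea of a parameter sweep concrete, here is a minimal sketch in
plain Python. It is not nestly's actual API, only an illustration of the
general pattern of creating one run directory per parameter combination and
recording the combination next to the outputs. The function name
build_runs, the file name params.json and the genetic_code parameter are
all hypothetical choices made for this example.

```python
import itertools
import json
import tempfile
from pathlib import Path


def build_runs(base_dir, parameters):
    """Create one directory per parameter combination.

    `parameters` maps a parameter name to the list of values to try.
    Each run directory gets a params.json file (a hypothetical name)
    describing the combination, so outputs can be compared later.
    """
    base = Path(base_dir)
    names = sorted(parameters)  # fixed order gives stable directory names
    run_dirs = []
    for values in itertools.product(*(parameters[n] for n in names)):
        combo = dict(zip(names, values))
        # Nested layout, e.g. genetic_code_11/tool_prodigal
        run_dir = base.joinpath(*(f"{n}_{v}" for n, v in combo.items()))
        run_dir.mkdir(parents=True, exist_ok=True)
        (run_dir / "params.json").write_text(json.dumps(combo, indent=2))
        run_dirs.append(run_dir)
    return run_dirs


if __name__ == "__main__":
    # Illustrative sweep: two gene predictors, two (assumed) settings.
    sweep = {"tool": ["glimmer", "prodigal"], "genetic_code": [4, 11]}
    with tempfile.TemporaryDirectory() as tmp:
        for run_dir in build_runs(tmp, sweep):
            print(run_dir.relative_to(tmp))
```

Each run directory could then hold the output of the corresponding program
invocation, and a later step could walk the directories and compare the
results side by side.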

All the best,
Eric