[Biopython] GSoC Ortholog Module Proposal

Matthew Strand stran104 at chapman.edu
Mon Apr 5 10:59:28 UTC 2010


Dear Biopython GSoC list,

I am a student at Chapman University, and over the last 18 months I have been
using Biopython to produce phylogenetic trees with ClustalW, T-Coffee, and
PHYLIP. I have found the most difficult part to be identifying orthologs for
the particular species that our lab is interested in studying. The orthology
databases provide many matches, but each database requires its own wrapper,
and some databases cover particular species better than others.


So far I have written wrappers to get ortholog IDs from InParanoid and then
fetch the sequences from either NCBI or BioMart. This provides good results
for most common species but not all. To handle rare species I have
implemented the Reciprocal Smallest Distance (RSD) orthology algorithm to run
protein-protein searches. It is available at http://ortholog.us. I also have
automated scripts to align protein families, concatenate aligned families,
and create trees.
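
As a rough illustration of the concatenation step (a sketch only, not the
scripts mentioned above): assume each aligned family is a plain dict mapping
a species name to its aligned sequence, and that a species missing from a
family should be padded with gaps. The function name, the dict layout, and
the toy sequences are all made up for this example.

```python
def concatenate_alignments(families):
    """Join per-family alignments end to end, one row per species.

    families: list of dicts mapping species name -> aligned sequence.
    A species missing from a family is padded with gaps for that family.
    """
    # Every species seen in any family gets a row in the result.
    species = sorted({name for family in families for name in family})
    rows = {name: [] for name in species}
    for family in families:
        # All sequences in one aligned family must have the same length.
        lengths = {len(seq) for seq in family.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a family must share one aligned length")
        width = lengths.pop()
        for name in species:
            rows[name].append(family.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in rows.items()}


# Toy data: two aligned families, the second with an extra species.
family_1 = {"human": "MKT-A", "mouse": "MKTLA"}
family_2 = {"human": "GG-C", "mouse": "GGAC", "yeast": "G-AC"}
supermatrix = concatenate_alignments([family_1, family_2])
```

Padding absent species with gaps keeps every row the same length, so the
result can be handed to a tree-building tool as a single alignment.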

For GSoC I would like to write a module to abstract finding orthologs as
much as possible. This would greatly simplify creating custom evolutionary
trees for biologists. The module could fetch orthologs from TreeFam,
InParanoid, Harvard's Roundup, and Princeton's BLASTO. The module could also
provide support for producing alignments, concatenating alignments, removing
sections of gaps, and constructing trees. Ortholog identification could be
done with no dependency other than an internet connection. Alignments and
trees would require the user to have the appropriate tools installed.
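
To make the idea of abstracting the database wrappers a little more concrete,
here is one possible shape for it. This is only a sketch under assumptions:
the class and function names (OrthologSource, InMemorySource, find_orthologs)
are hypothetical and not part of Biopython, and the in-memory source stands in
for real wrappers that would query InParanoid, Roundup, and so on over the
network.

```python
from abc import ABC, abstractmethod


class OrthologSource(ABC):
    """Hypothetical common interface each database wrapper would implement."""

    name = "unnamed"

    @abstractmethod
    def fetch(self, gene_id, species):
        """Return a list of ortholog IDs for gene_id in the target species."""


class InMemorySource(OrthologSource):
    """Stand-in for a real wrapper; answers come from a dict, not the web."""

    def __init__(self, name, table):
        self.name = name
        self._table = table  # {(gene_id, species): [ortholog IDs]}

    def fetch(self, gene_id, species):
        return list(self._table.get((gene_id, species), []))


def find_orthologs(gene_id, species, sources):
    """Query every source and record which sources reported each ortholog."""
    hits = {}
    for source in sources:
        try:
            ids = source.fetch(gene_id, species)
        except OSError:
            # Assumption: a real wrapper signals network trouble with OSError.
            # One unreachable database should not sink the whole query.
            continue
        for ortholog_id in ids:
            hits.setdefault(ortholog_id, []).append(source.name)
    return hits


# Toy data standing in for two databases with different coverage.
sources = [
    InMemorySource("InParanoid", {("geneA", "species1"): ["orthoX"]}),
    InMemorySource("Roundup", {("geneA", "species1"): ["orthoX", "orthoY"]}),
]
result = find_orthologs("geneA", "species1", sources)
```

Keeping track of which databases agree on an ortholog would let the user
decide how much support to require before including it in a tree.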

The overhead of writing this type of code makes it difficult for
evolutionary biologists and wet labs to get a picture of evolutionary
relationships within specific groups of species. This module would aim to
remove that overhead.

A timeline of milestones might look something like this:
Week 1-2: Stable wrappers for InParanoid
Week 3-4: Stable wrappers for Roundup
Week 5-6: Stable wrappers for TreeFam
Week 6-7: Stable wrappers for BLASTO
Week 8-9: Ortholog module to abstract the database wrappers
Week 10-11: Alignment and tree tools

Is there any interest in having such a project? I'd be grateful to get some
feedback either on or off list.

Best,
-Matthew Strand


