[Biopython-dev] Phylogeny modules for BioPython

Jacob Porter jacobporter2002 at yahoo.com
Tue Apr 7 22:27:21 EDT 2009


Hi all,

My name is Jacob Porter, and I am a graduate student in the math department at UC Davis.  I have previously worked on phylogeny inference using so-called "phylogenetic invariants"; that work can be found at http://www.shsu.edu/~ldg005/small-trees/

It appears to me that BioPython doesn't have much support for phylogeny inference or for related tools.

I have applied to the Google Summer of Code (12 weeks of working part-time on a programming assignment), and I am looking for a project that could work with BioPython, as I see a lot of potential in it.  I would like to use my expertise in phylogeny inference to add some of this support.

I need three things from the community ASAP:

1) Opinions on which of my project ideas (listed below) would be the most useful to the BioPython community
2) Information as to what is already included in BioPython concerning phylogeny inference and related tools
3) A mentor who will help me with the project (and possibly work in conjunction with NESCent (https://www.nescent.org/wg_phyloinformatics/Main_Pagementors)).  I would need a 12-week schedule of tasks for the project (TBD), and answers to questions related to developing for BioPython.  (I've worked with Python a lot before, so I shouldn't need much help with Python itself; I need help with BioPython.)

Project 1:
Add support for popular phylogeny representation standards such as DND files: the ability to read and write such files and to convert between formats.  I need help in picking which standards to support and which operations on these files are the most useful.
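
To make the read/write idea concrete, here is a minimal sketch in plain Python. It assumes DND files hold trees in Newick notation (as the guide trees written by ClustalW do) and handles only unquoted labels and optional branch lengths. The names Node, parse_newick, to_newick, and leaf_names are hypothetical and are not part of BioPython.

```python
class Node:
    """One node of a rooted tree (hypothetical helper class)."""

    def __init__(self, name="", length=None, children=None):
        self.name = name
        self.length = length          # branch length to the parent, or None
        self.children = children or []


def parse_newick(text):
    """Parse a simple Newick string (no quoted labels, no comments)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1              # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                  # skip ")"
        # Optional label: everything up to the next delimiter.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ":".
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return Node(name, length, children)

    root = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected character at position %d" % pos)
    return root


def to_newick(node):
    """Write a tree back out as a Newick string."""

    def fmt(n):
        s = ""
        if n.children:
            s = "(" + ",".join(fmt(c) for c in n.children) + ")"
        s += n.name
        if n.length is not None:
            s += ":%g" % n.length
        return s

    return fmt(node) + ";"


def leaf_names(node):
    """Return the labels of all leaves, left to right."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names
```

With these helpers, parse_newick("((A:0.1,B:0.2):0.05,C:0.3);") yields a tree whose leaves are A, B, and C, and to_newick writes it back out. Converting to another format would mean adding a second writer over the same Node structure.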

Project 2:
Add wrappers for modern (hopefully high-throughput and accurate) phylogeny inference software written in C or C++.  Examples include implementations of neighbor-joining and maximum parsimony, as well as MJOIN (similar to neighbor-joining) (http://bio.math.berkeley.edu/mjoin/), Garli (http://www.molecularevolution.org/si/software/garli/), and treeSVD (http://www.stat.uchicago.edu/~eriksson/software.html).  I would like to know which sort of phylogeny inference software would be the most useful in your opinion.  I am assuming that no wrappers for such software exist yet.
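
As a rough illustration of what such a wrapper might look like, the sketch below runs an arbitrary external program with Python's standard subprocess module (Python 3.7 or later) and returns whatever it prints. The function name run_tree_program is hypothetical, and no real command-line options of MJOIN, Garli, or treeSVD are assumed; in the demonstration the Python interpreter itself stands in for the external tool.

```python
import subprocess
import sys


def run_tree_program(executable, args, input_text=None, timeout=None):
    """Run an external program and return its standard output as text.

    'executable' and 'args' are supplied by the caller; this sketch knows
    nothing about the options of any particular phylogeny program.
    """
    cmd = [executable] + list(args)
    result = subprocess.run(
        cmd,
        input=input_text,       # optional text sent to the program's stdin
        capture_output=True,    # collect stdout and stderr
        text=True,              # work with str instead of bytes
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "%s exited with status %d: %s"
            % (executable, result.returncode, result.stderr.strip())
        )
    return result.stdout


# Stand-in demonstration: the Python interpreter plays the role of the
# external tool and simply prints a Newick tree.
demo_output = run_tree_program(sys.executable, ["-c", "print('(A,B);')"])
```

A real wrapper would add one thin layer per program that translates keyword arguments into that program's documented options and parses its output files.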

Project 3:
Add analytic algorithms that use phylogeny in some way.  Examples include bootstrapping and protein-protein interaction inference algorithms (e.g. "Inferring protein interactions from phylogenetic distance matrices" by Gertz et al.).  I need information as to what sort of algorithms would be useful.
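
For the bootstrapping example, the core resampling step can be sketched as follows. This assumes the usual nonparametric bootstrap, in which alignment columns are drawn with replacement; the alignment is represented as a plain dict of equal-length strings, and the name bootstrap_alignment is hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Return one bootstrap replicate of an alignment.

    'alignment' maps sequence names to aligned sequences of equal length.
    Columns are sampled with replacement, and the same columns are used
    for every sequence so that sites stay aligned.
    """
    rng = rng or random.Random()
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }
```

Each replicate would then be passed to a tree-building method, and the support for a clade estimated as the fraction of replicate trees that contain it.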

Project 4:
Enhance phylogeny inference software further.  MJOIN has bugs (I think it returns negative distances in some cases, and some modifications to it that I developed using phylogenetic invariants are seg-faulting).


It is unlikely that all of these ideas can be developed, so I need information as to which might be the most useful.  I was thinking of focusing on Project 1 and Project 2 for the initial phase.

Any information will be appreciated, and any mentorship will be great.  I would like a response quickly, so that I can inform NESCent of my plans.

Thanks,
Jacob Porter
UC Davis