[Biopython-dev] Phylogeny modules for BioPython

Peter Cock p.j.a.cock at googlemail.com
Wed Apr 8 08:54:35 UTC 2009


On 4/8/09, Jacob Porter <jacobporter2002 at yahoo.com> wrote:
>
> Hi all,
>
> My name is Jacob Porter, and I am a graduate student in the math
> department at UC Davis.  I've done work before on phylogeny inference
> ...
> It appears to me that BioPython doesn't have much support for
> phylogeny inference and tools related to phylogeny inference.

I'm sure there is room for improvement.

> I have applied to the Google Summer of Code (12 weeks of
> working part-time on a programming assignment), and I am
> looking for a project that could work with BioPython as I see
> a lot of potential in it.  I can bring my expertise on phylogeny
> inference to this project to add some support for this.
>
> I need three things from the community ASAP:
>
> 1) Ideas as to which of my several project ideas are the
> most useful to the BioPython community

Personally, I might pick command line wrappers for existing command
line tools.  However, these don't actually make anything new possible,
as writing your own command line is already fairly easy.  This in
itself wouldn't be that much work either.
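
To illustrate how little is involved, here is a minimal sketch of such a
wrapper using only the Python standard library.  The helper names
(build_command, run_command) and the flags in the comments are
hypothetical, not part of Biopython or of any particular tool, and the
Python interpreter stands in for a real tree-building program so that
the sketch runs without phylogeny software installed.

```python
import subprocess
import sys


def build_command(executable, options=None, arguments=()):
    """Return an argument list such as ['tool', '-in', 'a', 'x.phy'].

    'options' maps a flag to its value; a value of None marks a bare
    switch.  The flags themselves are whatever the wrapped tool expects
    (hypothetical here).
    """
    cmd = [executable]
    for flag, value in (options or {}).items():
        cmd.append(flag)
        if value is not None:
            cmd.append(str(value))
    cmd.extend(str(a) for a in arguments)
    return cmd


def run_command(cmd):
    """Run the tool without a shell and return its standard output."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in for a real program: ask Python itself to print a Newick tree.
demo = build_command(sys.executable, {"-c": "print('(A,B,(C,D));')"})
newick = run_command(demo).strip()
```

A real wrapper would mostly add knowledge of each tool's valid options,
which is why this is convenient rather than enabling anything new.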

> 2) Information as to what is already included in BioPython
> concerning phylogeny inference and related tools

Look at Bio.Nexus, plus somewhat related, Bio.AlignIO.

> 3) A mentor that will help me with the project (and
> possibly work in conjunction with Nascent
> (https://www.nescent.org/wg_phyloinformatics/Main_Pagementors)
>  I would need a 12-week schedule of tasks for the
> project (TBD), and answers to questions related to
> developing for BioPython.  (I've worked with Python
> a lot before, so I shouldn't need much help with
> Python so much as I need help with BioPython).

Brad Chapman may be willing to mentor a GSoC student; have a look back
over the recent email discussions here.  In particular, Nick Matzke has
already expressed some interest in biogeographical and community
phylogenetics for Biopython (there is a wiki page on open-bio.org on
this).

> Project 1:
> Add support for popular phylogeny representation
> standards such as DND files.  Give the ability to
> read and write such files.  Convert between such
> files.  I need help in picking which standards to use
> and need help in picking which operations on these
> files is the most useful.

We have this already in Bio.Nexus, but there is still room for
improvement - see Bug 2788 for example.
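
For readers unfamiliar with the format: DND files (as written by
ClustalW) hold trees in Newick notation.  The following is a minimal
standard-library sketch of reading and writing such a tree, only to
show what the operations involve.  It is not the Bio.Nexus
implementation, the function names are hypothetical, and it assumes
unquoted labels with no comments.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick tree must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this clade
                    break
                else:
                    raise ValueError("unexpected %r at %d" % (text[pos], pos))
        # Optional label, then optional ':branch_length'.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos].strip()
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",():;":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected %r at %d" % (text[pos], pos))
    return root


def to_newick(node):
    """Serialise a node back to Newick text (without the trailing ';')."""
    name, length, children = node
    out = ""
    if children:
        out = "(" + ",".join(to_newick(child) for child in children) + ")"
    out += name
    if length is not None:
        out += ":%g" % length
    return out


def leaf_names(node):
    """Return the leaf labels in left-to-right order."""
    name, _length, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
taxa = leaf_names(tree)
round_trip = to_newick(tree) + ";"
```

Quoted labels, comments and the Nexus wrapper around the tree block are
the parts that make a full parser more work than this.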

> Project 2:
> Add wrappers for modern (hopefully high throughput
> and accurate) phylogeny inference software written in
> C++/C.  Examples of such software include
> neighbor-joining, MJOIN software (similar to
> neighbor-joining) (http://bio.math.berkeley.edu/mjoin/),
> Garli (http://www.molecularevolution.org/si/software/garli/),
> treeSVD (http://www.stat.uchicago.edu/~eriksson/software.html),
> and maximum parsimony.  I would like to know which
> sort of phylogeny inference software is the most useful
> in your opinion.  I assume no wrappers for such software
> exist.

Well, Bio.Nexus is a great help with certain tools.  There is scope
for adding more command line wrappers though (I like quick-join and
also quicktree for NJ tree building).

> Project 3:
> Add analytic algorithms that use phylogeny in some
> way.  Examples include bootstrapping and protein-protein
> interaction inference algorithms.  (i.e. "Inferring protein
> interactions from phylogenetic distance matrices" by
> Gertz et al.)  I need information as to what sort of
> algorithms would be useful.

I feel that this is still very much an active area of research, and
there are no clear gold standards.  However, perhaps some published
algorithms may be worth re-implementing in Biopython.  I would still
tend to favour more general work for Biopython that would support
people implementing any/their own algorithm.

> Project 4:
> Enhance phylogeny inference software further.
> MJOIN has bugs (I think it returns negative distances
> in some cases, and some modifications to it that I
> developed using phylogenetic invariants are seg-faulting).

Fixing any bug in MJOIN sounds like a good idea - but doesn't really
affect Biopython directly.

> Not all of these ideas will probably be able to be
> developed, so I need information as to what might
> be the most useful.  I was thinking of focusing on
> Project 1 and Project 2 for the initial phase.
>
> Any information will be appreciated, and any
> mentorship will be great.  I would like a response
> quickly, so that I can inform Nascent of my plans.

Peter.

P.S. It's Biopython, not BioPython.


