[Biopython-dev] [Wg-phyloinformatics] BioGeography update

Tue Jul 7 14:25:40 UTC 2009

On Tue, Jul 7, 2009 at 9:02 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

> Hi Stephen;
>
> We can require lagrange to be installed and use imports to
> grab the needed code. The other option is that y'all can explicitly
> relicense a subset of the code under the Biopython license.
>

Trivia: it looks like lagrange in turn depends on scipy, but quickly
glancing through the code, I only see numpy functions being used. Since some
other Biopython modules already depend on numpy, could the installation of
lagrange for Bio.Geography be made simpler by just changing the import to
numpy?
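To check this more carefully than by glancing, one could list every name a source file pulls in from scipy and then see whether numpy provides each one. The following is only a rough sketch: the helper name scipy_imports and the SAMPLE text are made up for illustration and are not taken from lagrange.

```python
import ast


def scipy_imports(source):
    """Return a sorted list of dotted names imported from scipy in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Handles "import scipy" and "import scipy.linalg"
            for alias in node.names:
                if alias.name.split(".")[0] == "scipy":
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # Handles "from scipy import array" and "from scipy.linalg import expm"
            if node.level == 0 and node.module and node.module.split(".")[0] == "scipy":
                for alias in node.names:
                    found.add(node.module + "." + alias.name)
    return sorted(found)


# Hypothetical sample text; NOT actual lagrange code.
SAMPLE = """
import scipy
from scipy import array, zeros
from scipy.linalg import expm
import numpy
"""

if __name__ == "__main__":
    for name in scipy_imports(SAMPLE):
        print(name)
```

Any name in the output that numpy does not provide would be a reason to keep the scipy dependency.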

> > I can see however
> > where the Bio.Nexus functionality might not be sufficient for tree
> > manipulation. I am not a contributor to the BioPython dev group so I
> > cannot speak to those specifics, but as a user I can see separating
> > out the tree functions from the Nexus package (and tree I/O in
> > general) as logically a phylogenetic tree structure has little to do
> > with the nexus file format. It can be somewhat awkward to deal with in
> > the current form. A more general implementation might be a Bio.Tree
> > package with I/O readers in Nexus and Newick and XML, etc.
>
> Definitely. Eric has been discussing this with regards to the
> PhyloXML project and we had been looking at other Tree
> representations: in PyCogent and Thomas Mailund's Newick module.
> Considering the lagrange tree model makes a lot of sense as well.
> What I'd like to see is a stab at a generalized Tree object that
> supports the operations you need and that the Bio.Nexus parser can
> produce, exactly as you describe. Eric and Nick, what do you think
> about coordinating on this?
>

Sounds great to me. My impression is that most tree representations are
based on a recursive Node element with a few associated attributes and a
number of useful methods; phyloXML has a Clade object roughly corresponding
to that, but also a bunch of other element types for extensive annotation of
the tree. So two options spring to mind:

1. Let the Bio.PhyloXML.Tree objects be a superset of everything needed by
any phylogenetic tree representation, ever. (It's already pretty close.)
Refactor Nexus and Newick to use these objects; merge the features of
lagrange so the rest of the Biopython environment can benefit. Only export
to external object structures that are something other than a straight
phylogenetic tree -- e.g. networkx or graphviz for plotting, numpy/scipy for
crunching.

2. Factor a simple tree structure out of lagrange and Bio.Nexus, and let
that be the Biopython default representation. Add a function in Bio.PhyloXML
to export its enhanced tree structure to this simpler Bio.Tree
representation.
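To make option 2 concrete, here is a minimal sketch of the kind of recursive node structure I have in mind. The class name Node, its attributes, and the write_newick helper are my own placeholders for illustration; they are not the existing lagrange, Bio.Nexus, or Bio.PhyloXML API.

```python
class Node(object):
    """Hypothetical bare-bones tree node: a label, a branch length, children."""

    def __init__(self, name=None, branch_length=None, children=None):
        self.name = name
        self.branch_length = branch_length
        self.children = list(children or [])

    def is_terminal(self):
        """True if this node has no children (i.e. it is a leaf)."""
        return not self.children

    def terminals(self):
        """Return all leaf nodes below (or at) this node, left to right."""
        if self.is_terminal():
            return [self]
        leaves = []
        for child in self.children:
            leaves.extend(child.terminals())
        return leaves

    def to_newick(self):
        """Serialize this subtree as a Newick fragment (no trailing ';')."""
        text = self.name or ""
        if self.children:
            inner = ",".join(child.to_newick() for child in self.children)
            text = "(" + inner + ")" + text
        if self.branch_length is not None:
            text += ":%g" % self.branch_length
        return text


def write_newick(root):
    """Return a complete Newick string for the tree rooted at `root`."""
    return root.to_newick() + ";"


if __name__ == "__main__":
    tree = Node(children=[
        Node("A", 0.1),
        Node(children=[Node("B", 0.2), Node("C", 0.3)], branch_length=0.5),
    ])
    print(write_newick(tree))
```

The point of keeping the node this small is that file-format parsers (Nexus, Newick, phyloXML) would only need to build these objects, and richer annotations could be layered on by subclassing.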

I wrote Bio.PhyloXML.Tree to use the naming conventions of phyloXML, but
otherwise be independent of that specific file format. It doesn't depend on
any XML library directly, and both child nodes and XML node attributes
appear as plain ol' object attributes in the tree. But the Nexus module
looked like the parser was kind of tied to the tree representation, so I
haven't reused any of that code yet. So #1 is my preference, but it puts the
burden of inter-module compatibility on whoever is maintaining Bio.Nexus,
whereas #2 leaves my code on a quiet little island for the rest of the
summer.

All the best,
Eric