[Biopython-dev] [Wg-phyloinformatics] BioGeography update

Tue Jul 7 15:12:02 UTC 2009

Stephen [I think] wrote:
>> > I can see however
>> > where the Bio.Nexus functionality might not be sufficient for tree
>> > manipulation. I am not a contributor to the BioPython dev group so I
>> > cannot speak to those specifics, but as a user I can see separating
>> > out the tree functions from the Nexus package (and tree I/O in
>> > general) as logically a phylogenetic tree structure has little to do
>> > with the nexus file format. It can be somewhat awkward to deal with in
>> > the current form. A more general implementation might be a Bio.Tree
>> > package with I/O readers in Nexus and Newick and XML, etc.

Brad wrote:
>> Definitely. Eric has been discussing this with regards to the
>> PhyloXML project and we had been looking at other Tree
>> representations: in PyCogent and Thomas Mailund's Newick module.
>> Considering the lagrange tree model makes a lot of sense as well.
>> What I'd like to see is a stab at a generalized Tree object that
>> supports the operations you need and that the Bio.Nexus parser can
>> produce, exactly as you describe. Eric and Nick, what do you think
>> about coordinating on this?

Eric wrote:
> Sounds great to me.

I also agree. Bio.Nexus has some good stuff that is a bit hidden, and has
wider application - some kind of Bio.Tree module sounds sensible (ideally
with I/O for Nexus, XML, etc). We might even move the phyloXML specific
stuff to live under Bio.Tree.PhyloXML.

> My impression is that most tree representations are based on a recursive
> Node element with a few associated attributes and a number of useful
> methods; phyloXML has a Clade object roughly corresponding to that,
> but also a bunch of other element types for extensive annotation of
> the tree. So two options spring to mind:
>
> 1. Let the Bio.PhyloXML.Tree objects be a superset of everything needed by
> any phylogenetic tree representation, ever. (It's already pretty close.)
> Refactor Nexus and Newick to use these objects; merge the features of
> lagrange so the rest of the Biopython environment can benefit. Only export
> to external object structures that are something other than a straight
> phylogenetic tree -- e.g. networkx or graphviz for plotting, numpy/scipy for
> crunching.
>
> 2. Factor a simple tree structure out of lagrange and Bio.Nexus, and let
> that be the Biopython default representation. Add a function in Bio.PhyloXML
> to export its enhanced tree structure to this simpler Bio.Tree
> representation.

I am unclear why you would need an entirely separate tree
object structure (which then requires code to map between the two).
Perhaps some specific examples of the "enhancements" would help?

How about this variation on (2):
Suppose Bio.Tree provided a simple tree object (holding a nested structure),
with methods/functions for general operations like depth-first traversal,
finding common ancestors, calculating branch lengths, collapsing internal
nodes, etc.
[and I would expect a lot of this could be borrowed from Bio.Nexus,
and/or Thomas Mailund's Newick module]. Couldn't Bio.PhyloXML build
on this using subclassed tree nodes?
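
To make the idea concrete, here is a rough sketch of what such a base node
and a subclass might look like. Everything in it is hypothetical: TreeNode,
PhyloXMLClade, and all of the method names are invented for illustration and
are not taken from Bio.Nexus, Bio.PhyloXML, or Thomas Mailund's module.

```python
class TreeNode:
    """Hypothetical generic tree node (not an existing Biopython class)."""

    def __init__(self, name=None, branch_length=0.0):
        self.name = name
        self.branch_length = branch_length
        self.parent = None
        self.children = []

    def add_child(self, child):
        """Attach child below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def depth_first(self):
        """Yield this node and all of its descendants in preorder."""
        yield self
        for child in self.children:
            yield from child.depth_first()

    def ancestors(self):
        """Return the path from this node up to the root, self included."""
        path = []
        node = self
        while node is not None:
            path.append(node)
            node = node.parent
        return path

    def common_ancestor(self, other):
        """Return the most recent common ancestor of self and other."""
        mine = {id(node) for node in self.ancestors()}
        for node in other.ancestors():
            if id(node) in mine:
                return node
        return None

    def distance_to_root(self):
        """Sum the branch lengths on the path from this node to the root."""
        return sum(node.branch_length for node in self.ancestors())


class PhyloXMLClade(TreeNode):
    """Hypothetical subclass adding one phyloXML-style field."""

    def __init__(self, name=None, branch_length=0.0, taxonomy=None):
        super().__init__(name, branch_length)
        self.taxonomy = taxonomy


# Build a small mixed tree: ((A, B)inner, C)root
root = TreeNode("root")
inner = root.add_child(TreeNode("inner", 1.0))
a = inner.add_child(PhyloXMLClade("A", 0.5, taxonomy="Homo sapiens"))
b = inner.add_child(TreeNode("B", 0.25))
c = root.add_child(TreeNode("C", 2.0))
```

The general operations live on the base class, so a tree can mix plain and
phyloXML-flavoured nodes and no mapping code between two structures is
needed.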

Do we even need different objects? What if each node class had an optional
Python dictionary for annotations? You could maybe key this off the PhyloXML
names?
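
Something along these lines, as a minimal sketch only. AnnotatedNode is a
made-up name, and the keys "taxonomy" and "confidence" are just examples of
phyloXML element names used as dictionary keys; none of this is existing
Biopython API.

```python
class AnnotatedNode:
    """Hypothetical single node class with an optional annotation dict."""

    def __init__(self, name=None, annotations=None):
        self.name = name
        self.children = []
        # Copy into a fresh dict so nodes never share one mutable default.
        self.annotations = dict(annotations) if annotations else {}


# A node carrying phyloXML-style annotations, keyed by element name.
leaf = AnnotatedNode("A", {"taxonomy": "Homo sapiens", "confidence": 0.95})

# A node read from a plain Newick file simply has no annotations.
plain = AnnotatedNode("B")

# Code that does not care about annotations can ignore the dict entirely;
# code that does can look keys up with a fallback.
species = plain.annotations.get("taxonomy", "unknown")
```

With this approach the Nexus and Newick parsers would leave the dictionary
empty, and only the phyloXML parser would fill it in.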

> I wrote Bio.PhyloXML.Tree to use the naming conventions of phyloXML, but
> otherwise be independent of that specific file format. It doesn't depend on
> any XML library directly, and both child nodes and XML node attributes
> appear as plain ol' object attributes in the tree. But the Nexus module
> looked like the parser was kind of tied to the tree representation, so I
> haven't reused any of that code yet. So #1 is my preference, but it puts the
> burden of inter-module compatibility on whoever is maintaining Bio.Nexus,
> whereas #2 leaves my code on a quiet little island for the rest of the
> summer.

We're going to need some input from the Bio.Nexus authors - Frank and
Cymon (CC'd).

Peter