[Biopython-dev] BioGeography update/BioPython tree module discussion

Eric Talevich eric.talevich at gmail.com
Mon Jul 13 16:01:07 EDT 2009


Hi Nick,

On Mon, Jul 13, 2009 at 2:34 PM, Nick Matzke <matzke at berkeley.edu> wrote:

>
>
> Hi all -- thanks for this discussion about tree classes.  Sorry it took me
> a while to absorb all of this (and I may still be working on absorbing all of
> it...there is a lot to keep in my head!).
>
[...]

>
> I. Tree Class Options
>
> It sounds like we have 3 options being discussed:
>
> 1. making Bio.PhyloXML.Tree the super-duper tree class
> 2. improving Bio.Nexus.Trees
> 3. including the Lagrange tree class or suitably licensed/inspired version
> thereof.
>
> (Or there is #4, some combination)
>

The last consensus we reached on Biopython-dev was to create two new
modules, Bio.Tree and Bio.TreeIO, like so:

1. Extract a very basic Tree and Node class, looking at the intersection of
the PhyloXML and Nexus class hierarchies, and put the result in
Bio.Tree.BaseTree. I started on this today:
http://github.com/etal/biopython/blob/phyloxml/Bio/Tree/BaseTree.py

(It doesn't do anything yet besides set up a class hierarchy that we can use
for generalizing existing code.)

2. Write wrappers for the existing PhyloXML and Nexus I/O functions. I'm
putting that here:
http://github.com/etal/biopython/blob/phyloxml/Bio/TreeIO/__init__.py

Again, it's only useful for PhyloXML parsing right now. Eventually we can
connect Bio.Nexus to these two modules, but that's well outside the scope of
my GSoC project.
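
To illustrate the split, here is a minimal sketch of the idea. It is not the
code in the links above: the Node and Tree classes, the read() function, and
the toy "edges" format are all made up for this example. The point is only
that the tree classes know nothing about file formats, and the I/O layer
just dispatches to a format-specific parser that returns a common Tree.

```python
from io import StringIO


class Node(object):
    """Format-independent node: a label, a branch length, and child nodes."""

    def __init__(self, name=None, branch_length=None):
        self.name = name
        self.branch_length = branch_length
        self.children = []


class Tree(object):
    """Format-independent tree; shared methods would live here."""

    def __init__(self, root):
        self.root = root

    def get_terminals(self):
        """Return the leaf nodes in depth-first, left-to-right order."""
        leaves, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.children:
                # Reverse so the leftmost child is popped first.
                stack.extend(reversed(node.children))
            else:
                leaves.append(node)
        return leaves


def _parse_edge_list(handle):
    """Toy parser (hypothetical format): one 'parent child' pair per line.

    The parent on the first non-blank line is taken to be the root.
    """
    nodes = {}
    root = None
    for line in handle:
        if not line.strip():
            continue
        parent, child = line.split()
        for name in (parent, child):
            if name not in nodes:
                nodes[name] = Node(name)
        nodes[parent].children.append(nodes[child])
        if root is None:
            root = nodes[parent]
    return Tree(root)


# The I/O layer: map format names to parsers that all return a Tree.
_PARSERS = {"edges": _parse_edge_list}


def read(handle, format):
    """Parse one tree from a file handle in the named format."""
    try:
        parser = _PARSERS[format]
    except KeyError:
        raise ValueError("Unknown tree format: %r" % (format,))
    return parser(handle)


tree = read(StringIO(u"root A\nroot inner\ninner B\ninner C\n"), "edges")
leaf_names = [leaf.name for leaf in tree.get_terminals()]
```

Supporting another format would then mean adding one parser to the
dictionary, without touching the Tree class or any code that uses it.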



> Bio.PhyloXML.Tree
> =============
> [not sure...perhaps someone could contribute the list of methods/intended
> methods]
> =============
>

Not very many! My project is to implement the phyloXML spec, and the spec
says nothing about methods, just about how to store data. As you've noted,
Bio.Nexus has a lot of useful methods for phylogenetic trees, independent of
the underlying file format. I'd like to separate the I/O code from the tree
representations for Bio.Nexus and Bio.PhyloXML, leaving Bio.TreeIO with
format-specific wrappers and Bio.Tree with common tree representations and
methods for handling trees. Basically, I don't want to rewrite necessary
methods from scratch; I want to use the ones Nexus already has.

Since phyloXML is designed to store more kinds of annotations than Nexus,
there are some additional Tree-based classes in Bio.Tree.PhyloXMLTree, with
some methods for dealing with the additional annotations. But the methods
you want will be on Bio.Tree.BaseTree objects, and you shouldn't have to
worry about phyloXML objects unless you want to add some additional
phyloXML-specific annotations to your trees.



> VIII. What I should do next
>
> Given what I now know, I probably should have just written a little
> function to strip node labels out of my Newick trees, and done everything
> based on the Bio.Nexus.Trees class.  I could still do this and continue on
> my merry way without too much trouble.
>
> But given that my tree-based functions should probably be methods of some
> class...here are the questions I have:
>
> * Should I muck with Bio.Nexus.Trees and try to fix the node labels issue?
>  My instinct was not to mess with other people's stuff, but that may be a
> poor instinct...
>
> * Should I implement my tree-based functions as methods of the
> Bio.Nexus.Trees class?
>
> * Should I delay on this whole issue while it is being discussed, and go
> back to issues more localized to my GSoC project, i.e. making my GBIF
> functions into methods of a GBIF records class?
>
>
It sounds like relying on the current Bio.Nexus is the best approach. I'll
defer to the experts, but my guess is that if it's only a small change you
need, then make a patch to Bio.Nexus.Trees for your own use and also upload
the patch to Bugzilla to make it easier to use upstream.

Integrating the functions into Bio.Nexus right now probably isn't necessary,
since many of those methods will probably end up in Bio.Tree eventually
anyway. For functions that could become Nexus methods, try arranging the
argument list so that the object the method would belong to comes first.
Then functions can be moved into classes by renaming the first argument to
'self', and nothing breaks. It's also possible to directly monkeypatch a
class/object with functions structured that way, but I think that would be
frowned upon in general...
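
A minimal sketch of that pattern, with made-up names: the Tree class below is
a stand-in, not the real Bio.Nexus.Trees.Tree, and count_terminals is just an
illustrative function. The tree object comes first in the argument list, so
the function works standalone now and can become a method later.

```python
class Tree(object):
    """Stand-in tree class for illustration only (not Bio.Nexus.Trees.Tree)."""

    def __init__(self, children_by_node, root):
        # Maps each internal node label to a list of its child labels.
        self.children_by_node = children_by_node
        self.root = root


def count_terminals(tree, node=None):
    """Count the leaves below `node` (default: the root of `tree`).

    The tree is the first argument, so moving this into the class only
    requires renaming `tree` to `self`.
    """
    if node is None:
        node = tree.root
    children = tree.children_by_node.get(node, [])
    if not children:
        return 1
    return sum(count_terminals(tree, child) for child in children)


tree = Tree({"root": ["A", "inner"], "inner": ["B", "C"]}, "root")

# Used as a plain function today:
n_as_function = count_terminals(tree)

# The monkeypatching route: attach the same function to the class, and it
# is then callable as a method on every instance.
Tree.count_terminals = count_terminals
n_as_method = tree.count_terminals()
```

Both calls give the same count, since the method call just passes the
instance as the first argument.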

Cheers,
Eric

