[Biopython-dev] Bio.Tree layout (Was: BioGeography update)

Mon Jul 13 16:12:06 UTC 2009

Hi folks,

On Fri, Jul 10, 2009 at 8:24 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Fri, Jul 10, 2009 at 1:07 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > So, in summary, the major deliverables are:
> >
> > - Generic tree representation plus a TreeIO structure
> > - PhyloXML parser that uses this tree directly
> > - Nexus parser that can handle problem files and parse into the
> >  generic tree. This will let us drop the lagrange duplication from
> >  Nick's code.
> >
> > Sounds like you have this well worked out,
> > Brad
>
> Sounds good. Note PhyloXML (which I gather is annotation rich)
> may not have to use the generic trees, it could use a subclass.
> If this means the generic trees can be less memory hungry that
> might be worth while... something to keep in mind at least. e.g.
> Consider a large Newick file with only taxa names and branch
> lengths, no branch colours, no bootstraps, no internal node
> names, etc.
>
> Peter
>

Hilmar Lapp just pointed me to the BioSQL PhyloDB extension:
http://biosql.org/wiki/Extensions

Should this schema be the basis of a Bio.Tree.BaseTree module?

Here's the file layout I'm picturing:

Bio/Tree/
    BaseTree.py -- everything else derives from these classes
    PhyloXMLTree.py -- already on github
    NexusTree.py -- if necessary

The class structure I'm working on right now looks like:

# In BaseTree -- currently empty classes, pending Nexus integration
class TreeElement(object)
class TreeNode(TreeElement)

# In PhyloXMLTree
class PhyloElement(BaseTree.TreeElement)
class Clade(PhyloElement, BaseTree.TreeNode)
class ...(PhyloElement) -- all other phyloXML classes
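
For illustration, here is a minimal runnable sketch of that hierarchy. It
assumes everything sits in one file (in the layout above, the first two
classes would live in BaseTree.py and the other two in PhyloXMLTree.py), and
the class bodies are placeholders, not the actual code on github:

```python
# Would live in Bio/Tree/BaseTree.py -- empty for now, pending Nexus integration
class TreeElement(object):
    """Base class for every object that can appear in a tree."""


class TreeNode(TreeElement):
    """Base class for nodes (clades) of a tree."""


# Would live in Bio/Tree/PhyloXMLTree.py, where the bases above would be
# referenced as BaseTree.TreeElement and BaseTree.TreeNode
class PhyloElement(TreeElement):
    """Base class for all phyloXML objects."""


class Clade(PhyloElement, TreeNode):
    """A phyloXML clade: both a phyloXML element and a generic tree node."""


# Show the method resolution order Python derives for the diamond
print([cls.__name__ for cls in Clade.__mro__])
# -> ['Clade', 'PhyloElement', 'TreeNode', 'TreeElement', 'object']
```

Since PhyloElement and TreeNode both derive from TreeElement, the multiple
inheritance in Clade resolves cleanly, and code written against
BaseTree.TreeNode would also accept a phyloXML Clade.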

Rather than treating BaseTree as the intersection of all the other tree
representations that derive from it, we could use PhyloDB as the reference
point. What do you think? Should we come back to this in a week or two?

Eric