[Biopython-dev] Code review request for phyloxml branch

Michiel de Hoon mjldehoon at yahoo.com
Fri Jan 8 16:26:29 UTC 2010


I am not an expert in this area, but the code looks very well done and well organized. Thanks, Eric!

I have one suggestion though:
In the current layout, there's a Bio.Tree and a Bio.TreeIO module. I'd rather have everything under Bio.Tree. This makes it easier to understand what each Bio.* module is about, and also agrees with the structure of the other modules in Biopython. The only exception is Bio.Seq, for which there is a closely related Bio.SeqIO and Bio.SeqRecord. (In my opinion, that is more for historical reasons; I'd rather have a single Bio.Seq there too).

Thanks again,

--Michiel.

--- On Mon, 12/28/09, Eric Talevich <eric.talevich at gmail.com> wrote:

> From: Eric Talevich <eric.talevich at gmail.com>
> Subject: Re: [Biopython-dev] Code review request for phyloxml branch
> To: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Monday, December 28, 2009, 8:51 PM
> Hi folks,
> 
> Here's an update on the status of Bio.Tree and TreeIO. I
> think I've taken
> care of most of the blockers since the last review in
> September.
> 
> First, some links:
> http://github.com/etal/biopython/tree/phyloxml/Bio/Tree/
> http://github.com/etal/biopython/tree/phyloxml/Bio/TreeIO/
> http://github.com/etal/biopython/tree/phyloxml/Tests/test_PhyloXML.py
> http://github.com/etal/biopython/tree/phyloxml/Tests/test_Tree.py
> http://biopython.org/wiki/PhyloXML
> 
> Discussion:
> 
> *TreeIO*
> Conversion between Nexus, Newick and phyloXML tree file
> formats works; the
> read/parse/write functions for each IO format use the same
> object types.
> Neat!
> 
> The tree annotations (e.g. id) aren't preserved perfectly
> during conversions
> -- I'll keep working on this, but I don't think it's a
> blocker. The taxon
> names of terminal nodes are kept as "clade" names in
> phyloXML for
> round-tripping. Tree topology and branch lengths seem OK.
> 
> Under the hood:
> -- PhyloXMLIO is from GSoC
> -- NewickIO is ported from the Bio.Nexus.Trees parser. I
> think it works the
> same way.
> -- NexusIO relies on Bio.Nexus.Nexus for parsing, then
> converts the
> resulting Nexus.Trees.Tree objects to Bio.Tree.Newick
> objects. One day, when
> Nexus.Trees is replaced by NewickIO in the main Nexus
> parser, then this
> conversion can be dropped and NexusIO will be very simple.
> 
> *Tree*
> The BaseTree object structure looks like this:*
> 
> -- BaseTree.**Tree* contains global tree information, like
> whether the tree
> is rooted, and a reference to the root clade. The phyloXML
> Phylogeny object
> inherits from this.*
> 
> -- BaseTree.**Subtree* contains local (clade- or
> node-specific) information,
> and references to each of its direct descendents,
> recursively. The phyloXML
> Clade object inherits from this. Nodes are implicit. I
> could add references
> to the ancestor of each sub-tree without too much
> difficulty, but I haven't
> needed them yet.
> 
> The same methods (get_terminals et al.) generally apply to
> both classes, so
> I created a separate TreeMixin class from which both
> BaseTree.Tree and
> BaseTree.Subtree inherit.
> 
> Bio.Tree.Newick contains simple subclasses of Tree and
> Subtree, and an
> incomplete set of shims that track Bio.Nexus.Trees.Tree
> (minus the I/O).
> This is to ease the deprecation and eventual replacement of
> Bio.Nexus.Trees,
> as I imagine it:
> (1) Port methods from Nexus.Trees to Bio.Tree, simplifying
> arguments where
> reasonable (since the node IDs and adjacency list lookup
> are no longer
> needed)
> (2) Implement methods in Bio.Tree.Newick with the original
> argument lists,
> but triggering a deprecation warning indicating the newer
> replacement method
> (3) Replace Nexus.Trees with an import of
> Bio.Tree.Newick(IO) and a few more
> shims to duplicate the original API -- so test_Nexus.py
> should still pass,
> ideally (with deprecation warnings)
> (4) In Nexus.Nexus, replace all usage of Nexus.Trees with
> proper usage of
> NexusIO and Bio.Tree methods.
> (5) Eventually delete Nexus.Trees and the shims in
> Bio.Tree.Newick.
> 
> I'm currently doing (1) and (2), with more emphasis on
> getting (1) right.
> Not all of the important methods have been ported, but I'm
> happy with the
> tree traversal methods.
> *
> Tests
> *I created test_Tree.py to test the methods in
> Bio.Tree.BaseTree;
> test_PhyloXML.py tests Bio.Tree.PhyloXML objects and
> Bio.TreeIO.PhyloXMLIO
> parsing/writing.
> 
> I noticed that in Tests/Nexus/, the example file for
> internal node labels is
> actually in Newick/NH format, not Nexus. That was briefly
> confusing, so
> maybe that file should be renamed.
> 
> What do you think?
> 
> All the best,
> Eric
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 


      



More information about the Biopython-dev mailing list