[Biopython-dev] [Wg-phyloinformatics] BioGeography update
chapmanb at 50mail.com
Wed Jul 8 12:36:00 UTC 2009
> I am just now back in town and would love to help coordinate on this. I
> agree having multiple Newick parsers etc. is undesirable; I just found I
> was forced into that this spring when Biopython didn't have what I needed
> even for pretty standard Newick files. I have also made use of
> Mailund's Newick parser in the past.
That sounds great. Eric is also on board from the PhyloXML side.
For the parser, the right approach is to provide some example files that
Bio.Nexus does not handle correctly, and work on improvements to
that parser to bring it in line with what you need. Secondarily, we
should work on parsing into a general tree structure that supports
the questions you need to ask. This should allow us to avoid the
lagrange code duplication and also have a more robust Nexus parser.
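To make "parsing into a general tree structure" a bit more concrete, here is a minimal sketch, not Bio.Nexus or lagrange code. The names Node and parse_newick are hypothetical, and the parser only handles unquoted labels and numeric branch lengths; it ignores quoted names and bracketed comments.

```python
import re


class Node:
    """Hypothetical format-independent tree node (not a Biopython class)."""

    def __init__(self, name=None, branch_length=None):
        self.name = name
        self.branch_length = branch_length
        self.children = []

    def leaves(self):
        """Return the terminal nodes below this node, left to right."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


_PUNCT = set("(),:;")
# A token is either one punctuation character or a run of label characters.
_TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")


def parse_newick(text):
    """Parse a simple Newick string into Node objects (assumed subset only)."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def parse_subtree():
        nonlocal pos
        node = Node()
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_subtree())
                if pos >= len(tokens):
                    raise ValueError("unbalanced parentheses")
                if tokens[pos] == ",":
                    pos += 1
                elif tokens[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected token %r" % tokens[pos])
        # Optional label, then optional ":length".
        if pos < len(tokens) and tokens[pos] not in _PUNCT:
            node.name = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            if pos + 1 >= len(tokens):
                raise ValueError("missing branch length")
            node.branch_length = float(tokens[pos + 1])
            pos += 2
        return node

    root = parse_subtree()
    if pos < len(tokens) and tokens[pos] == ";":
        pos += 1
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return root


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
leaf_names = [leaf.name for leaf in tree.leaves()]
```

The point of the sketch is only that the tree object knows nothing about Nexus or Newick; a Nexus reader could build the same Node objects.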
> I am booked this afternoon but will go through the thread more this
> evening and comment further. Cheers!
> Eric Talevich wrote:
> > On Tue, Jul 7, 2009 at 9:02 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > Hi Stephen;
> > We can require lagrange to be installed and use imports to
> > grab the needed code. The other option is that y'all can explicitly
> > relicense a subset of the code under the Biopython license.
> > Trivia: it looks like lagrange in turn depends on scipy, but quickly
> > glancing through the code, I only see numpy functions being used. Since
> > some other Biopython modules already depend on numpy, could the
> > installation of lagrange for Bio.Geography be made simpler by just
> > changing the import to numpy?
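A small sketch of how the import question above could be tested without committing either way. This is a fallback pattern, not the straight import swap suggested in the quoted text, and the helper name first_importable is hypothetical. It also assumes, unverified here, that every function lagrange uses is available from numpy under the same name.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports cleanly."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )
```

With this in place, something like first_importable(["numpy", "scipy"]) would prefer numpy and only fall back to scipy when numpy is missing.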
> > > I can see however
> > > where the Bio.Nexus functionality might not be sufficient for tree
> > > manipulation. I am not a contributor to the BioPython dev group so I
> > > cannot speak to those specifics, but as a user I can see separating
> > > out the tree functions from the Nexus package (and tree I/O in
> > > general) as logically a phylogenetic tree structure has little to do
> > > with the nexus file format. It can be somewhat awkward to deal with in
> > > the current form. A more general implementation might be a Bio.Tree
> > > package with I/O readers in Nexus and Newick and XML, etc.
> > Definitely. Eric has been discussing this with regard to the
> > PhyloXML project and we had been looking at other Tree
> > representations: in PyCogent and Thomas Mailund's Newick module.
> > Considering the lagrange tree model makes a lot of sense as well.
> > What I'd like to see is a stab at a generalized Tree object that
> > supports the operations you need and that the Bio.Nexus parser can
> > produce, exactly as you describe. Eric and Nick, what do you think
> > about coordinating on this?
> > Sounds great to me. My impression is that most tree representations are
> > based on a recursive Node element with a few associated attributes and a
> > number of useful methods; phyloXML has a Clade object roughly
> > corresponding to that, but also a bunch of other element types for
> > extensive annotation of the tree. So two options spring to mind:
> > 1. Let the Bio.PhyloXML.Tree objects be a superset of everything needed
> > by any phylogenetic tree representation, ever. (It's already pretty
> > close.) Refactor Nexus and Newick to use these objects; merge the
> > features of lagrange so the rest of the Biopython environment can
> > benefit. Only export to external object structures that are something
> > other than a straight phylogenetic tree -- e.g. networkx or graphviz for
> > plotting, numpy/scipy for crunching.
> > 2. Factor a simple tree structure out of lagrange and Bio.Nexus, and let
> > that be the Biopython default representation. Add a function in
> > Bio.PhyloXML to export its enhanced tree structure to this simpler
> > Bio.Tree representation.
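For what it's worth, option 2 above could look roughly like the following. This is a sketch under stated assumptions: AnnotatedClade and SimpleNode are hypothetical stand-ins, not the actual Bio.PhyloXML or Bio.Nexus classes, and the export simply drops every annotation the simple tree has no slot for.

```python
class SimpleNode:
    """Hypothetical minimal tree node: a name, a branch length, children."""

    def __init__(self, name=None, branch_length=None, children=None):
        self.name = name
        self.branch_length = branch_length
        self.children = list(children or [])


class AnnotatedClade:
    """Hypothetical richly annotated clade (stand-in for a phyloXML-style object)."""

    def __init__(self, name=None, branch_length=None, clades=None, **annotations):
        self.name = name
        self.branch_length = branch_length
        self.clades = list(clades or [])
        # Extra annotation (taxonomy, confidence, ...) kept as plain attributes.
        self.annotations = annotations


def to_simple_tree(clade):
    """Recursively copy the topology, names and branch lengths only."""
    return SimpleNode(
        name=clade.name,
        branch_length=clade.branch_length,
        children=[to_simple_tree(child) for child in clade.clades],
    )


rich = AnnotatedClade(
    name="root",
    clades=[
        AnnotatedClade(name="A", branch_length=0.1, taxonomy="Homo sapiens"),
        AnnotatedClade(name="B", branch_length=0.2, confidence=0.95),
    ],
)
simple = to_simple_tree(rich)
```

Going the other way (option 1) would mean the parsers build the richer object directly, so no export step is needed for ordinary tree work.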
> > I wrote Bio.PhyloXML.Tree to use the naming conventions of phyloXML, but
> > otherwise be independent of that specific file format. It doesn't depend
> > on any XML library directly, and both child nodes and XML node
> > attributes appear as plain ol' object attributes in the tree. But the
> > Nexus module looked like the parser was kind of tied to the tree
> > representation, so I haven't reused any of that code yet. So #1 is my
> > preference, but it puts the burden of inter-module compatibility on
> > whoever is maintaining Bio.Nexus, whereas #2 leaves my code on a quiet
> > little island for the rest of the summer.
> > All the best,
> > Eric
> Nicholas J. Matzke
> Ph.D. Candidate, Graduate Student Researcher
> Huelsenbeck Lab
> Center for Theoretical Evolutionary Genomics
> 4151 VLSB (Valley Life Sciences Building)
> Department of Integrative Biology
> University of California, Berkeley
> Lab websites:
> Dept. personal page:
> Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
> Lab phone: 510-643-6299
> Dept. fax: 510-643-6264
> Cell phone: 510-301-0179
> Email: matzke at berkeley.edu
> Mailing address:
> Department of Integrative Biology
> 3060 VLSB #3140
> Berkeley, CA 94720-3140
> "[W]hen people thought the earth was flat, they were wrong. When people
> thought the earth was spherical, they were wrong. But if you think that
> thinking the earth is spherical is just as wrong as thinking the earth
> is flat, then your view is wronger than both of them put together."
> Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer,
> 14(1), 35-44. Fall 1989.