[Biopython-dev] [Wg-phyloinformatics] BioGeography update

Fri Jul 10 12:07:34 UTC 2009

Hi Eric;

> > The proposal is to extract the Tree class hierarchy so that other modules
> > can share it, and Biopython users can do I/O with trees as easily as they
> > currently can with sequences ("from Bio import TreeIO; for tree in
> > TreeIO.parse('example.xml', 'phyloxml'): ...").
> > ...

Sounds great. For most of this I will defer to Peter's expert
opinion. As he mentioned, basing this off of SeqIO/AlignIO makes a
lot of sense.

> > In the above case, TreeIO.py is a new file containing wrappers for the read
> > and parse functions in my PhyloXML module, and also Nexus and Newick,
> > pending integration. ...
> >
> > Alternatively, the individual modules that implement each format for I/O can
> > be collected under a new TreeIO directory, with __init__ implementing the
> > wrappers: ...
> 
> Either idea sounds reasonable. However, for future extensibility, and
> also consistency with Bio.SeqIO and Bio.AlignIO, I would suggest we
> have Bio/TreeIO/__init__.py (i.e. as a folder containing as many
> wrappers or parsers as needed) rather than just using Bio/TreeIO.py
> (a single file).

Agreed. The imports are the same but this gives added flexibility.
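
A rough sketch of what the package-level wrappers could look like,
modelled loosely on the SeqIO style of dispatching on a format name.
Everything here is an assumption for illustration: the registry
_PARSERS, the "toy" format, and _parse_labels are hypothetical
stand-ins, not existing Biopython code.

```python
import io


def _parse_labels(handle):
    """Toy stand-in for a real format parser: one tree label per non-blank line."""
    for line in handle:
        line = line.strip()
        if line:
            yield line


# Hypothetical registry mapping a format name to a parser generator.
# In a package layout, each entry would point into its own submodule.
_PARSERS = {"toy": _parse_labels}


def parse(handle, fmt):
    """Return an iterator over every tree in the handle."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError("Unknown tree format: %r" % fmt) from None
    return parser(handle)


def read(handle, fmt):
    """Return the single tree in the handle, or raise ValueError."""
    trees = parse(handle, fmt)
    try:
        first = next(trees)
    except StopIteration:
        raise ValueError("No trees found") from None
    try:
        next(trees)
    except StopIteration:
        return first
    raise ValueError("More than one tree found")


trees = list(parse(io.StringIO("tree_a\n\ntree_b\n"), "toy"))
single = read(io.StringIO("tree_a\n"), "toy")
```

Adding a format then only means adding a submodule and one registry
entry, and the user-facing import stays the same.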

> Note that the Nexus parser is much more than just a tree parser.
> NEXUS files can contain trees, but much more besides (including a
> multiple sequence alignment, and instructions to phylogenetic
> tools). In the short term for TreeIO and Nexus, I would just have
> Bio/TreeIO/NexusIO.py as a thin wrapper that calls Bio.Nexus and
> converts its trees into the standard trees (i.e. we don't have to
> make any changes to Bio.Nexus immediately). In the longer term,
> it would make sense for Bio.Nexus to start using the new tree
> objects - but we also have backwards compatibility to think about.

Also agreed. We should get Bio.Nexus updated enough so that it can
handle Nick's problem files, and from there apply a wrapper to push
Nexus trees into a generic tree compatible with PhyloXML. This will
force us to be general about the Tree implementation, but save some
re-writing and maintain back-compatibility. Once the generic tree
is hammered out and everyone is happy, then we can think about
migrating Nexus to it. Seconding Peter's comments, this is probably
another big job.
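
To make the wrapper idea concrete, here is a minimal sketch of
converting a format-specific tree into a generic one. It assumes the
legacy tree is a flat table of nodes keyed by id, which is only a
stand-in and not the real Bio.Nexus API. The Clade class and
to_generic function are hypothetical names.

```python
class Clade:
    """Hypothetical generic tree node: a name, a branch length, and child clades."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []

    def terminals(self):
        """Return the leaf clades in left-to-right order."""
        if not self.clades:
            return [self]
        found = []
        for child in self.clades:
            found.extend(child.terminals())
        return found


def to_generic(nodes, root_id):
    """Recursively convert a flat node table into nested Clade objects.

    nodes maps a node id to (name, branch_length, [child ids]).
    """
    name, length, child_ids = nodes[root_id]
    return Clade(name, length, [to_generic(nodes, c) for c in child_ids])


# Stand-in for a tree produced by a format-specific parser.
legacy = {
    0: (None, None, [1, 2]),
    1: ("A", 1.0, []),
    2: (None, 0.5, [3, 4]),
    3: ("B", 2.0, []),
    4: ("C", 2.5, []),
}
tree = to_generic(legacy, 0)
leaf_names = [leaf.name for leaf in tree.terminals()]
```

The conversion only reads the legacy structure, so the existing parser
and its callers are left untouched.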

So, in summary, the major deliverables are:

- Generic tree representation plus a TreeIO structure
- PhyloXML parser that uses this tree directly
- Nexus parser that can handle problem files and parse into the
  generic tree. This will let us drop the lagrange duplication from
  Nick's code.

Sounds like you have this well worked out,
Brad