[Biopython-dev] GSoC Weekly Update 11: PhyloXML for Biopython

Eric Talevich eric.talevich at gmail.com
Mon Aug 3 14:57:59 UTC 2009


Hi all,

Previously (July 27-31) I:

    - Added the remaining checks for restricted tokens
    - Modified the tree, parser and writer for phyloXML 1.10 support -- it
      validates now, and unit tests pass. PhyloXML 1.00 validation breaks,
      but that won't affect anyone except BioPerl, and they said they can
      deal with it on their end
    - Changed how the Parser and Writer classes work to resemble other
      Biopython parser classes more closely
    - Picked standard attributes for BaseTree's Tree and Node objects
      (informed by PhyloDB, though the names are slightly different); added
      properties to PhyloXML's Clade to mimic both types
    - Made SeqRecord conversion actually work (with reasonable
      round-tripping capability); added a unit test
    - Changed __str__ methods to not include the object's class name if
      there's another representative label to use (e.g. name) -- that's
      easy enough to add in the caller
    - Sorted out the TreeIO read/parse/write API and added some support for
      the Newick format, as recommended by Peter on biopython-dev
    - Split some "plumbing" (depth_first_search) off from the Tree.find()
      method. Since there are a lot of potentially useful methods to have on
      phylogenetic tree objects, I think it's best to distinguish between
      "porcelain" (specific, easy-to-use methods for common operations) and
      "plumbing" (generalized or low-level methods/algorithms that porcelain
      can rely on) in the Tree class in Bio.Tree.BaseTree.
    - Started a function for networkx export. The edges are screwy right
      now, so I haven't checked it in yet.
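
A minimal sketch of the plumbing/porcelain split and the __str__ change
described above. The Node and Tree classes here are hypothetical stand-ins
written for illustration, not the actual Bio.Tree.BaseTree code, and the
filterfunc parameter name is an assumption:

```python
# Hypothetical stand-in classes; NOT the actual Bio.Tree.BaseTree code.
class Node:
    def __init__(self, name=None, children=None):
        self.name = name
        self.children = list(children or [])

    def __str__(self):
        # Prefer a representative label; fall back to the class name.
        return self.name if self.name else self.__class__.__name__


class Tree:
    def __init__(self, root):
        self.root = root

    def depth_first_search(self, node=None, filterfunc=None):
        """Plumbing: generic pre-order traversal with an optional filter."""
        if node is None:
            node = self.root
        stack = [node]
        while stack:
            current = stack.pop()
            if filterfunc is None or filterfunc(current):
                yield current
            # Push children reversed so the leftmost child is visited first.
            stack.extend(reversed(current.children))

    def find(self, **attrs):
        """Porcelain: yield nodes whose attributes equal the given values."""
        def matches(node):
            return all(getattr(node, key, None) == value
                       for key, value in attrs.items())
        return self.depth_first_search(filterfunc=matches)


tree = Tree(Node("root", [Node("A", [Node("C")]), Node("B")]))
visited = [str(node) for node in tree.depth_first_search()]
found = [str(node) for node in tree.find(name="B")]
```

Keeping the traversal separate means further convenience methods can reuse
it by passing a different filter, without duplicating the search loop.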


This week (Aug. 3-7) I will:

    Scan the code base for lingering TODO/ENH/XXX comments

    Discuss merging back upstream

    Work on enhancements (time permitting):

    - Clean up the Parser class a bit more, to resemble Writer
    - Finish networkx export
    - Port common methods to Bio.Tree.BaseTree (from Bio.Nexus.Trees and
      other packages)

    Run automated testing:

    - Re-run performance benchmarks
    - Run tests and benchmarks on alternate platforms
    - Check epydoc's generated API documentation and fix docstrings

    Update wiki documentation with new features:

    - Tree: base classes, find(), etc.
    - TreeIO: 'phyloxml', 'nexus', 'newick' wrappers; PhyloXMLIO extras;
      warn that Nexus/Newick wrappers don't return Bio.Tree objects yet
    - PhyloXML: singular properties, improved str()
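
For readers unfamiliar with the read/parse convention the TreeIO wrappers
follow, here is a rough, self-contained sketch. It assumes the same
contract as Bio.SeqIO (parse() iterates over all records, read() expects
exactly one). The registry and the toy line-based "parser" are
hypothetical and are not the TreeIO implementation:

```python
import io


def _parse_lines(handle):
    # Toy parser (assumption): one Newick string per non-empty line,
    # yielded as plain text rather than as a tree object.
    for line in handle:
        line = line.strip()
        if line:
            yield line


# Hypothetical format registry; real wrappers would map to parser modules.
_PARSERS = {"newick": _parse_lines}


def parse(handle, fmt):
    """Return an iterator over every tree in the handle."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError("Unknown format: %r" % fmt)
    return parser(handle)


def read(handle, fmt):
    """Return the single tree in the handle; fail on zero or several."""
    trees = parse(handle, fmt)
    try:
        tree = next(trees)
    except StopIteration:
        raise ValueError("No trees found")
    try:
        next(trees)
    except StopIteration:
        return tree
    raise ValueError("More than one tree found; use parse() instead")


single = read(io.StringIO("(A,B);\n"), "newick")
several = list(parse(io.StringIO("(A,B);\n(C,D);\n"), "newick"))
```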


Remarks:

    - Most of the work done this week and last, shuffling base classes and
      adding various checks, actually made the I/O functions a little
      slower. I don't think this will be a big deal, and the changes were
      necessary, but it's still a little disappointing.

    - The networkx export will look pretty cool. After exporting a Biopython
      tree to a networkx graph, it takes a couple more imports and commands
      to draw the tree to the screen or a file. Would anyone find it handy
      to have a short function in Bio.Tree or Bio.Graphics to go straight
      from a tree to a PNG or PDF? (Dependencies: networkx, matplotlib or
      maybe graphviz)

    - I have to admit this: I don't know anything about BioSQL. How would I
      use and test the PhyloDB extension, and what's involved in writing a
      Biopython interface for it?
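
On the networkx remark above, a tree-to-image helper might look roughly
like the sketch below. The nested-dict tree layout and the function names
tree_to_edges and draw_tree are assumptions made for illustration; this is
not the export function mentioned in this update. It also assumes every
node has a unique name, since the name is used as the graph node id:

```python
def tree_to_edges(root):
    """Collect (parent_name, child_name) pairs from a nested-dict tree."""
    edges = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent["children"]:
            edges.append((parent["name"], child["name"]))
            stack.append(child)
    return edges


def draw_tree(root, path):
    """Write the tree to an image file; needs networkx and matplotlib."""
    # Imported here so tree_to_edges works without the optional packages.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt
    import networkx as nx

    graph = nx.Graph()
    graph.add_edges_from(tree_to_edges(root))
    nx.draw(graph, with_labels=True)
    plt.savefig(path)  # output format follows the extension (.png, .pdf)
    plt.close()


example = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "C", "children": []}]},
        {"name": "B", "children": []},
    ],
}
edges = tree_to_edges(example)
```

Calling draw_tree(example, "tree.png") would then write the picture,
provided both optional dependencies are installed.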


Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML


