[Biopython-dev] GSoC Weekly Update 9: PhyloXML for Biopython
eric.talevich at gmail.com
Mon Jul 20 14:57:15 UTC 2009
Previously (July 13-17) I:
- Implemented "Collapse Whitespace Policy" -- the spec mentions this in the
glossary but doesn't appear to say where it should be used, so I applied
it willy-nilly. (Mainly on 'name' and 'desc'/'description' node text.)
- Made Writer use the normal namespace prefixes -- for
though it technically doesn't matter for parsing.
- Tried XSD validation on the PhyloXML.Writer output using xmlstarlet --
failed, probably due to element ordering.
- Created Bio.Tree and Bio.TreeIO modules. The PhyloXML tree classes are
all under Bio.Tree now, while TreeIO contains just a thin wrapper for
Parser and Writer (still under Bio.PhyloXML). Three mostly empty base
classes live in Bio.Tree.BaseTree and PhyloXML's tree classes now inherit
from them. This made it possible to generalize the Utils.pretty_print
function and move it to Bio.Tree.Utils. The other "utility", for
xml tag names, was added to PhyloXML's Parser near the other
- Checked that 'other' objects won't belong to the phyloXML namespace.
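For reference, the whitespace-collapsing step from the first item could be
sketched roughly as below. This assumes the policy matches XML Schema's
whiteSpace="collapse" rule (tabs, newlines and carriage returns become
spaces, runs of spaces shrink to one, leading and trailing spaces are
dropped). The function name is made up for illustration; this is not the
actual parser code.

```python
import re

# Hypothetical helper, not the actual Bio.PhyloXML code.
# Assumes "collapse" means the XML Schema whiteSpace="collapse" rule,
# which only treats space, tab, LF and CR as whitespace.
_XML_SPACE = re.compile(r"[ \t\n\r]+")


def collapse_whitespace(text):
    """Collapse runs of XML whitespace to single spaces and trim both ends."""
    if text is None:
        # Elements with no text come back as None from ElementTree.
        return None
    return _XML_SPACE.sub(" ", text).strip(" ")
```

Applying it only to 'name' and 'desc'/'description' text, as described
above, keeps other free-text fields untouched.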
This week (July 20-24) I will:
Extend the core to the rest of the spec:
- Add unit tests and classes to support the remaining (non-core) elements
- Use the schema document to validate the input file -- or at least, make
Writer use the correct sub-node ordering
- Take a stab at phyloXML 1.10 support
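One possible way to get the sub-node ordering right is to sort an element's
children against a tag order taken from the schema before serializing. The
sketch below uses the standard library's ElementTree; the order list is
abbreviated and illustrative only (the real sequence has to be read from the
XSD), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative, abbreviated ordering; the real one must come from the XSD.
CLADE_ORDER = ["name", "branch_length", "confidence", "taxonomy", "sequence", "clade"]


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def reorder_children(elem, order):
    """Sort elem's direct children into the given tag order, in place.

    Children with unlisted tags go last. sorted() is stable, so siblings
    that share a tag keep their original relative order.
    """
    rank = dict((tag, i) for i, tag in enumerate(order))
    children = sorted(elem, key=lambda child: rank.get(local_name(child.tag), len(order)))
    elem[:] = children
    return elem


clade = ET.fromstring(
    "<clade><clade/><taxonomy/><name>A</name><branch_length>0.1</branch_length></clade>"
)
reorder_children(clade, CLADE_ORDER)
print([child.tag for child in clade])
```

This only reorders one level; a real Writer would need one order list per
element type and would apply it recursively.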
Work on documentation:
- Address remaining comments from code/doc review
- Revisit docstrings for all classes, functions, methods; consider
- Improve the SeqRecord conversion
- Warnings: show the offending line at the previous level in the stack
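For the warnings item, assuming the standard warnings module is used, the
stacklevel argument is the usual way to point at the caller's line instead
of the line inside the library. A minimal sketch follows; the validator
function is hypothetical.

```python
import warnings


def check_branch_length(value):
    """Hypothetical validator that warns about negative branch lengths."""
    if value < 0:
        # stacklevel=2 attributes the warning to whoever called this
        # function, so the reported file and line are the offending call.
        warnings.warn("negative branch length: %r" % value, UserWarning, stacklevel=2)
    return value


def demo():
    """Trigger the warning once and return the recorded warning objects."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check_branch_length(-1.0)
    return caught
```

With the default stacklevel=1 the warning would instead be reported at the
warnings.warn() line itself, which is rarely what the user needs to see.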
I haven't done anything specifically for Nexus integration, though I'm looking
at the Bio.Nexus Tree and Node classes while writing Bio.Tree.BaseTree.
I'm also looking at PhyloDB, the BioSQL extension. Plan: BaseTree classes
will mirror PhyloDB tables, and any methods from PhyloXML trees that only
rely on those attributes will be moved to the base classes.
Attribute naming will be tricky -- the 'node' in Nexus and PhyloDB is called
'clade' in phyloXML, and most of the base-class methods will operate on that
attribute. Options:
1. Create two properties on PhyloXML's Clade and Phylogeny classes,
'clade' and 'clades', that simply access the object's 'node' attribute.
2. Break phyloXML's naming convention, and call a 'clade' a 'node'. The
functions currently treat tag_name<->attribute as the general case, with
exceptions like pluralization scattered in, so making this change will be
unpretty but not horrible.
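Option 1 could look roughly like the sketch below: the base class stores
children under the Nexus/PhyloDB-style name, and the phyloXML subclass adds
a property that aliases it. Class and method names here are stand-ins for
illustration, not the actual Bio.Tree code.

```python
class Node(object):
    """Hypothetical base class using the Nexus/PhyloDB-style name 'nodes'."""

    def __init__(self, nodes=None):
        self.nodes = list(nodes) if nodes is not None else []

    def count_terminals(self):
        """Example base-class method that relies only on 'nodes'."""
        if not self.nodes:
            return 1
        return sum(child.count_terminals() for child in self.nodes)


class Clade(Node):
    """phyloXML-flavored subclass exposing 'clades' as an alias for 'nodes'."""

    def _get_clades(self):
        return self.nodes

    def _set_clades(self, value):
        self.nodes = list(value)

    # Reads and writes go straight through to the base-class attribute.
    clades = property(_get_clades, _set_clades)


tree = Clade()
tree.clades = [Clade(), Clade(nodes=[Clade(), Clade()])]
print(tree.count_terminals())
```

The base-class method never mentions 'clades', so it would work unchanged
for Nexus-style trees, while phyloXML code keeps its own vocabulary.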