[Biopython-dev] GSoC Weekly Update 8: PhyloXML for Biopython

Eric Talevich eric.talevich at gmail.com
Mon Jul 13 11:21:20 EDT 2009


Hi all,

Previously (July 6-10) I:
    - Addressed some comments from last week's code/doc review
    - Enabled Pythonic syntax sugar (dictionary emulation, specialized
      __str__ methods, singular properties for some plural attributes),
      plus tests
    - Wrote Clade.find() for flexible searching
    - Checked Py2.4 compatibility (it's slower, but it works)
    - Started Bio.Tree, Bio.TreeIO modules (integration)
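The syntax sugar and searching described above can be sketched in plain Python. This is a hypothetical, simplified Clade class, not the code in the PhyloXML branch. Its constructor, the `confidences` attribute and the matching rules of `find()` are all assumptions made for illustration. A singular `confidence` property wraps the plural `confidences` list, `__str__` returns the clade name, and `find()` walks the subtree depth-first and yields every clade whose attributes equal the given keyword criteria.

```python
class Clade:
    """Hypothetical stand-in for a phyloXML clade (not the real module's class)."""

    def __init__(self, name=None, confidences=None, clades=None):
        self.name = name
        self.confidences = list(confidences or [])
        self.clades = list(clades or [])

    # Singular property over a plural attribute: usable only when there is
    # at most one value, so simple trees can just write clade.confidence.
    @property
    def confidence(self):
        if not self.confidences:
            return None
        if len(self.confidences) > 1:
            raise AttributeError("several confidence values; use .confidences")
        return self.confidences[0]

    @confidence.setter
    def confidence(self, value):
        self.confidences = [value]

    def __str__(self):
        return self.name or self.__class__.__name__

    def find(self, **criteria):
        """Yield this clade and its descendants (pre-order) whose
        attributes equal every keyword criterion."""
        if all(getattr(self, key, None) == value
               for key, value in criteria.items()):
            yield self
        for child in self.clades:
            yield from child.find(**criteria)


# Usage: a tiny tree containing two clades named "A".
root = Clade("root", clades=[Clade("A", confidences=[0.9]),
                             Clade("B", clades=[Clade("A")])])
print([str(c) for c in root.find(name="A")])        # ['A', 'A']
print([str(c) for c in root.find(confidence=0.9)])  # ['A']
```

Raising an error when several values exist, rather than silently returning the first one, keeps the shortcut from hiding data. Whether the real module behaves this way is not stated here.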


This week (July 13-17) I will:

    Extend the core to the rest of the spec:

    - Add unit tests and classes to support the remaining (non-core)
      phyloXML elements
    - Implement collapse_whitespace -- see the spec glossary
    - Make Writer use the correct namespace prefixes
    - "other" objects: assert that the namespace is not phyloXML
    - Use the schema document to validate the input file
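A minimal sketch of the collapse_whitespace item follows. It assumes the glossary term means the XML Schema "collapse" whitespace rule. Under that rule, tabs, line feeds and carriage returns become spaces, runs of spaces shrink to one, and leading and trailing spaces are dropped. The function name mirrors the item above, but its signature is an assumption: it takes a string or None and returns the same type.

```python
import re


# Hypothetical helper; assumes "collapse" means XML Schema whiteSpace="collapse".
def collapse_whitespace(text):
    """Normalize text according to the XML Schema 'collapse' rule."""
    if text is None:
        return None
    # Step 1 ("replace"): each tab, line feed or carriage return becomes a space.
    text = re.sub(r"[\t\n\r]", " ", text)
    # Step 2 ("collapse"): runs of spaces become one space; trim both ends.
    return re.sub(r" {2,}", " ", text).strip(" ")


print(repr(collapse_whitespace("  Homo\t\n sapiens  ")))  # 'Homo sapiens'
```

The sketch uses a regex rather than `" ".join(text.split())`. This is because `str.split()` also splits on other Unicode whitespace, such as a non-breaking space, and the XML Schema rule does not.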

    Integrate with Biopython:

    - Extract a Bio.Tree.BaseTree module from PhyloXML's tree classes
    - Improve the SeqRecord conversion

    Improve/revise documentation:

    - Address remaining comments from code/doc review
    - Revisit docstrings for all classes, functions and methods; consider
      enabling epydoc formatting


Questions:
    - My serializer uses XML entity codes instead of Unicode characters in
      the output -- is that OK? It still round-trips successfully with the
      parser.

    - Is there anything to do for BioSQL compatibility, besides extracting
      sequences?
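On the first question: for non-ASCII text, a conforming XML parser expands a numeric character reference such as &#233; to the same character that a literal UTF-8 byte sequence would give. Any consumer that actually parses the file therefore sees identical content. The main costs are larger files and less readable raw output.

The sketch below illustrates this with the standard library's xml.etree.ElementTree. It is not the serializer described above, and the element name and text are made up. Serializing to US-ASCII emits a character reference, serializing to UTF-8 emits the raw character, and both parse back to the same string.

```python
import xml.etree.ElementTree as ET

# Made-up element with one non-ASCII character (U+00E9, decimal 233).
elem = ET.Element("name")
elem.text = "caf\u00e9"

# US-ASCII cannot hold U+00E9, so ElementTree writes it as a
# numeric character reference.
ascii_bytes = ET.tostring(elem, encoding="us-ascii")
# UTF-8 can hold it, so the character is written as-is (two bytes).
utf8_bytes = ET.tostring(elem, encoding="utf-8")

print(ascii_bytes)  # b'<name>caf&#233;</name>'
print(utf8_bytes)   # b'<name>caf\xc3\xa9</name>'

# Parsing either serialization yields the same text.
print(ET.fromstring(ascii_bytes).text == ET.fromstring(utf8_bytes).text)  # True
```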


Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML

