[Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo

Ben Morris ben at bendmorris.com
Fri Dec 28 15:50:02 UTC 2012


On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:
>>
>> Hi all,
>>
>> I've implemented support for two new phylogenetic tree formats: NeXML and
>> RDF (conforming to the Comparative Data Analysis Ontology).
>>
>> I noticed that NeXML support was planned, but I didn't see anyone working
>> on it on GitHub and the feature request hadn't been updated in about a
>> year, so I went ahead and implemented a simple version. At first I tried
>> the generateDS.py approach, but the generated writer doesn't give very much
>> control over the output, so I ended up writing my own parser/writer using
>> ElementTree.
>>
>> As for the RDF/CDAO format, AFAIK this is not a format that's supported by
>> any other phylogenetic libraries, so I'm not sure how useful this is to
>> everyone else. It provides a simple, standards-compliant format that can be
>> imported to a triple store and supports annotation. We'll be using it at
>> NESCent so I wanted to make it available to everyone else as well. The
>> parser and writer require the Redlands Python bindings.
>>
>> The code is available in my fork of Biopython,
>>
>>     https://github.com/bendmorris/biopython
>>
>> under the branches "cdao" and "nexml". I'd love to get everyone's thoughts
>> and see if these contributions would be a good fit for the Biopython project.
>
>
>
> Thanks for letting us know! I'll try it out soonish. Looking at the code on your nexml branch, I have a few comments:
>
> - The parser uses ElementTree.parse rather than iterparse, so in its current state it would not be able to parse massive files (those larger than available RAM). Worth fixing eventually?

Great point. I rewrote it to use iterparse instead.
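
For anyone following along, here is a minimal sketch of the streaming pattern, not the actual code on the nexml branch. It assumes the NeXML namespace is http://www.nexml.org/2009 and that each <tree> element holds <node> and <edge> children; the iter_trees helper and the sample document are hypothetical and only there for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Assumed NeXML namespace, written in ElementTree's "{uri}tag" notation.
NS = "{http://www.nexml.org/2009}"

# Tiny hypothetical document, just enough to exercise the parser.
SAMPLE = b"""<nexml xmlns="http://www.nexml.org/2009">
  <trees id="ts1">
    <tree id="t1">
      <node id="n1" root="true"/>
      <node id="n2" label="A"/>
      <node id="n3" label="B"/>
      <edge id="e1" source="n1" target="n2" length="1.0"/>
      <edge id="e2" source="n1" target="n3" length="2.5"/>
    </tree>
    <tree id="t2">
      <node id="m1" label="C"/>
    </tree>
  </trees>
</nexml>"""


def iter_trees(handle):
    """Yield (tree_id, {node_id: label}, [(source, target, length)]) per tree."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != NS + "tree":
            continue
        # At the "end" event the whole <tree> subtree has been built.
        nodes = {n.get("id"): n.get("label") for n in elem.findall(NS + "node")}
        edges = [
            (e.get("source"), e.get("target"), e.get("length"))
            for e in elem.findall(NS + "edge")
        ]
        yield elem.get("id"), nodes, edges
        # Drop the children we just consumed so memory use stays roughly
        # proportional to one tree rather than the whole file.
        elem.clear()


parsed = list(iter_trees(io.BytesIO(SAMPLE)))
```

The difference from ElementTree.parse is that each <tree> is handed over and then cleared as soon as its closing tag is seen. The emptied <tree> elements still stay attached to their parent, so a file with a very large number of trees would need those removed as well.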

> - The parser creates Newick.Tree and Newick.Clade objects, which is nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you don't have any additional attributes to attach to those classes at the moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and PhyloXMLIO.py.)

Went ahead and did this as well.
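
Roughly, the subclasses could look like the sketch below. This is an illustration of the suggestion rather than the code on the branch: the class bodies are assumptions, and only BaseTree.Tree, BaseTree.Clade and their standard methods come from Biopython itself.

```python
from Bio.Phylo import BaseTree


class Clade(BaseTree.Clade):
    """NeXML-specific clade (hypothetical; no extra attributes yet)."""

    def __init__(self, branch_length=None, name=None, clades=None,
                 confidence=None):
        BaseTree.Clade.__init__(self, branch_length=branch_length, name=name,
                                clades=clades, confidence=confidence)


class Tree(BaseTree.Tree):
    """NeXML-specific tree (hypothetical; no extra attributes yet)."""

    def __init__(self, root=None, rooted=False, id=None, name=None):
        # Default to a NeXML Clade so the whole tree uses one clade type.
        if root is None:
            root = Clade()
        BaseTree.Tree.__init__(self, root=root, rooted=rooted, id=id,
                               name=name)


# Build a two-leaf tree by hand to show that the inherited methods work.
leaf_a = Clade(branch_length=1.0, name="A")
leaf_b = Clade(branch_length=2.5, name="B")
tree = Tree(root=Clade(clades=[leaf_a, leaf_b]))
leaf_names = [leaf.name for leaf in tree.get_terminals()]
total_length = tree.total_branch_length()
```

Even with empty bodies, dedicated classes give NeXML-specific attributes a place to live later without touching the Newick classes.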

> - The 'confidence' or 'confidences' attribute isn't used (e.g. for bootstrap support values). Does NeXML define it?

Not that I'm aware of, but I'm not sure. I searched
http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything.
I'm going to ask some people who know more about this than I do.

~Ben



