[Biopython-dev] Newick support in Bio.TreeIO?

Peter biopython at maubp.freeserve.co.uk
Thu Jul 30 09:13:29 UTC 2009


On Thu, Jul 30, 2009 at 5:10 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Wed, Jul 29, 2009 at 3:37 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>> The parsing code looks weird to me - but that is probably a style
>> thing. Certainly I had to stare at it to work out what it was doing.
>> It also has a bug - consider a Newick file containing one tree but
>> with no trailing semicolon.
>
> Hilmar says there's supposed to be a terminal semicolon; I didn't check
> what Biopython's parser does but I suppose this should duplicate that.

Hilmar is right, see
http://evolution.genetics.washington.edu/phylip/newicktree.html
However, in this case I would opt to support this variant anyway for
input (but you must include the ";" on output).
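A minimal sketch of that lenient input behaviour, assuming the file is first split into one string per tree before any real parsing. The generator `split_newick` is a hypothetical helper (not part of Bio.Nexus or any existing TreeIO code). It reads line by line, ignores a ";" inside a quoted label or a [comment], and still returns a final tree that lacks its trailing semicolon. Every string it yields ends in ";", so anything written back out stays strict Newick.

```python
import io
from typing import Iterator, TextIO


def split_newick(handle: TextIO) -> Iterator[str]:
    """Yield each tree in a Newick file as a string ending in ";".

    Hypothetical helper: a ";" inside a quoted label ('...') or a
    comment ([...]) is not treated as a terminator, and a final tree
    with no trailing ";" is still returned (lenient input).
    """
    buf = []
    in_quote = False    # inside a single-quoted label
    in_comment = False  # inside a [...] comment
    for line in handle:  # read line by line, so big files are streamed
        for ch in line:
            if in_quote:
                buf.append(ch)
                if ch == "'":
                    # A doubled '' simply closes and reopens the quote.
                    in_quote = False
            elif in_comment:
                buf.append(ch)
                if ch == "]":
                    in_comment = False
            elif ch == ";":
                tree = "".join(buf).strip()
                if tree:
                    yield tree + ";"  # always emit strict Newick
                buf = []
            else:
                buf.append(ch)
                if ch == "'":
                    in_quote = True
                elif ch == "[":
                    in_comment = True
    # End of input: tolerate a missing semicolon on the last tree.
    tail = "".join(buf).strip()
    if tail:
        yield tail + ";"


if __name__ == "__main__":
    # The second tree has no trailing semicolon.
    for tree in split_newick(io.StringIO("(A,B);\n((C,D),E)\n")):
        print(tree)  # prints (A,B); then ((C,D),E);
```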

> Plan:
> TreeIO has read(), parse(), write(), and possibly convert(), which behave
> exactly like the corresponding AlignIO and SeqIO functions, but with trees.
> Under Bio.TreeIO we have wrappers for other formats, and these wrappers may
> have public functions that go beyond the shared TreeIO ones.

Sounds good.
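One way the shared functions might be wired up is sketched below. It assumes a simple format-name registry: `_PARSERS`, `_WRITERS` and the toy one-tree-per-line "lines" format are all hypothetical. Unlike SeqIO, this sketch accepts only open handles, not filenames. `read()` follows the SeqIO/AlignIO convention of insisting on exactly one item, and `convert()` is just `write(parse(...))`.

```python
import io
from typing import Callable, Dict, Iterable, Iterator, TextIO

# Hypothetical registries mapping a format name to its parser/writer.
# Real entries might be "newick", "nexus" and "phyloxml".
_PARSERS: Dict[str, Callable[[TextIO], Iterator[object]]] = {}
_WRITERS: Dict[str, Callable[[Iterable[object], TextIO], int]] = {}


def parse(handle: TextIO, format: str) -> Iterator[object]:
    """Return an iterator over the trees in handle."""
    if format not in _PARSERS:
        raise ValueError(f"Unknown format {format!r}")
    return _PARSERS[format](handle)


def read(handle: TextIO, format: str) -> object:
    """Return the single tree in handle (error on zero or several)."""
    trees = parse(handle, format)
    try:
        tree = next(trees)
    except StopIteration:
        raise ValueError("No trees found in handle") from None
    missing = object()
    if next(trees, missing) is not missing:
        raise ValueError("More than one tree found in handle")
    return tree


def write(trees: Iterable[object], handle: TextIO, format: str) -> int:
    """Write trees to handle, returning how many were written."""
    if format not in _WRITERS:
        raise ValueError(f"Unknown format {format!r}")
    return _WRITERS[format](trees, handle)


def convert(in_handle: TextIO, in_format: str,
            out_handle: TextIO, out_format: str) -> int:
    """Convert between formats by chaining parse() and write()."""
    return write(parse(in_handle, in_format), out_handle, out_format)


# Toy format for demonstration only: one tree string per non-blank line.
def _parse_lines(handle: TextIO) -> Iterator[str]:
    for line in handle:
        if line.strip():
            yield line.strip()


def _write_lines(trees: Iterable[object], handle: TextIO) -> int:
    count = 0
    for tree in trees:
        handle.write(f"{tree}\n")
        count += 1
    return count


_PARSERS["lines"] = _parse_lines
_WRITERS["lines"] = _write_lines

if __name__ == "__main__":
    print(read(io.StringIO("(A,B);\n"), "lines"))  # prints (A,B);
```

Keeping the registry private means a new format only has to supply a generator and a writer, and it gets `read()` and `convert()` for free.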

> In some cases this can lead to a specific read-like function that returns a
> single object containing one or more trees, plus other tree-related
> metadata. This function can either be called read() also, as it currently is
> in PhyloXMLIO, or we could choose another name like load().
>
> For basic tree access:
>
> from Bio import TreeIO
> tree = TreeIO.read('example.xml', 'phyloxml')
> TreeIO.write([tree], 'example.nex', 'nexus')
>
> For the connoisseur:
>
> from Bio.TreeIO import PhyloXMLIO
> phx = PhyloXMLIO.read('example.xml')
> if phx.other:
>     pass  # do something clever...

Sounds OK to me at first glance.
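A rough illustration of the two levels of access follows. It assumes the format-specific reader returns a container holding the trees plus everything else found in the file. The class `PhyloXMLFile` and the functions `read_collection` and `parse_trees` are hypothetical names. Trees are left as raw ElementTree elements rather than tree objects. The only phyloXML detail relied on is the http://www.phyloxml.org namespace, with `phylogeny` elements as children of the root.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List

PHYLOXML_NS = "{http://www.phyloxml.org}"


@dataclass
class PhyloXMLFile:
    """Hypothetical container: the trees plus any other top-level data."""
    phylogenies: List[ET.Element] = field(default_factory=list)
    other: List[ET.Element] = field(default_factory=list)

    def __iter__(self) -> Iterator[ET.Element]:
        # Iterating over the container yields only the trees.
        return iter(self.phylogenies)


def read_collection(handle) -> PhyloXMLFile:
    """'Connoisseur' access: keep everything found under the root."""
    root = ET.parse(handle).getroot()
    result = PhyloXMLFile()
    for child in root:
        if child.tag == PHYLOXML_NS + "phylogeny":
            result.phylogenies.append(child)
        else:
            result.other.append(child)
    return result


def parse_trees(handle) -> Iterator[ET.Element]:
    """Basic access: just the trees; file-level extras are dropped."""
    yield from read_collection(handle)


if __name__ == "__main__":
    doc = ('<phyloxml xmlns="http://www.phyloxml.org" xmlns:x="urn:example">'
           '<phylogeny rooted="true"/><x:note>extra</x:note></phyloxml>')
    phx = read_collection(io.StringIO(doc))
    if phx.other:
        print(len(phx.phylogenies), "tree(s) and",
              len(phx.other), "extra element(s)")
```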

>> Of course, in practice Nexus files may not be that big. I don't
>> know if anyone uses them to store (for example) 1000 bootstrap trees.
>> As Brad and I have noted before, spending time on refactoring Bio.Nexus
>> is not the best use of your GSoC project time (plus we'd need to get
>> Cymon and Frank much more involved, worry more about backwards
>> compatibility etc).
>
> This refactoring quest actually started because I was trying to figure out
> an object model for BaseTree that could support PhyloDB, reuse the Nexus
> tree methods with some resemblance to the original form, and still provide
> useful base classes for phyloXML. That was holding up everything else --
> but I think it's under control now.

Cool.
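The next sketch shows the kind of base classes being discussed. It assumes a plain recursive Clade/Tree model in which generic traversal methods are written once on the base class, and a format-specific subclass (the hypothetical `PhyloXMLClade`) only adds its extra fields. The class and method names are chosen for illustration. They are not taken from Bio.Nexus or from an existing BaseTree module.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Clade:
    """Hypothetical format-neutral node: a subtree with optional metadata."""
    name: Optional[str] = None
    branch_length: Optional[float] = None
    clades: List["Clade"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.clades

    def find_clades(self) -> Iterator["Clade"]:
        """Walk this subtree depth-first, parents before children."""
        stack = [self]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.clades))  # keep left-to-right order

    def get_terminals(self) -> List["Clade"]:
        return [c for c in self.find_clades() if c.is_terminal()]


@dataclass
class Tree:
    """Hypothetical whole-tree wrapper holding tree-level attributes."""
    root: Clade
    rooted: bool = False
    name: Optional[str] = None

    def get_terminals(self) -> List[Clade]:
        return self.root.get_terminals()

    def total_branch_length(self) -> float:
        # Missing branch lengths count as zero.
        return sum(c.branch_length or 0.0 for c in self.root.find_clades())


@dataclass
class PhyloXMLClade(Clade):
    """Format-specific subclass: inherits traversal, adds extra data."""
    confidences: List[float] = field(default_factory=list)


if __name__ == "__main__":
    tree = Tree(Clade(clades=[Clade("A", 1.0),
                              PhyloXMLClade("B", 2.0, confidences=[0.95])]))
    print([c.name for c in tree.get_terminals()])  # prints ['A', 'B']
```

Because the subclass only adds fields, a storage backend (a database, say) could subclass the same base and still share the traversal code.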

>> I agree that tree drawing would be a nice addition to Bio.Graphics.
>> ...
>
> Maybe it will be worth another shot after the Tree module settles down. If
> networkx export comes easily this week, that may also take care of
> visualization for some uses.

Good point. In fact from memory, my tree PDF code was probably using
Thomas Mailund's Newick parser (not Bio.Nexus which didn't exist when
I first started work on trees).
http://www.birc.au.dk/~mailund/newick.html
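For the networkx route, here is a sketch that assumes a small hand-written Newick parser. The parser, `parse_newick`, handles names and branch lengths only, with no quoted labels or [comments], and is neither Thomas Mailund's parser nor Bio.Nexus. An exporter, `to_edge_list`, then produces (parent, child, attributes) triples. Both names are hypothetical. Unnamed internal nodes get generated ids such as "inner0", and leaves with duplicate names would collide.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: Optional[str] = None
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Minimal recursive-descent Newick parser (names and lengths only)."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_label(node: Node) -> None:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos].strip() or None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1
            node.children.append(parse_subtree())
            while pos < len(text) and text[pos] == ",":
                pos += 1
                node.children.append(parse_subtree())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"Expected ')' at position {pos}")
            pos += 1
        parse_label(node)  # a label and length may follow the ')'
        return node

    root = parse_subtree()
    if pos != len(text):
        raise ValueError(f"Unexpected text at position {pos}")
    return root


Edge = Tuple[str, str, Dict[str, float]]


def to_edge_list(root: Node) -> List[Edge]:
    """Return (parent, child, attrs) triples, one per branch."""
    counter = itertools.count()
    ids: Dict[int, str] = {}

    def node_id(node: Node) -> str:
        # Unnamed (usually internal) nodes get generated ids.
        if id(node) not in ids:
            ids[id(node)] = node.name or f"inner{next(counter)}"
        return ids[id(node)]

    edges: List[Edge] = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            attrs = {} if child.length is None else {"weight": child.length}
            edges.append((node_id(parent), node_id(child), attrs))
            stack.append(child)
    return edges


if __name__ == "__main__":
    for edge in to_edge_list(parse_newick("((A:1,B:2):0.5,C:3);")):
        print(edge)
```

networkx's `DiGraph.add_edges_from()` accepts (u, v, attr_dict) triples, so `networkx.DiGraph(edges)` should give a graph ready for its drawing functions. That step is left out here so the sketch needs only the standard library.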

Peter
