[Biopython] Bio.Phylo: writing newick trees with internal node names

Eric Talevich eric.talevich at gmail.com
Thu Mar 22 19:29:58 EDT 2012


On Wed, Mar 21, 2012 at 6:12 AM, Tanya Golubchik
<golubchi at stats.ox.ac.uk> wrote:
> There's a few other strange things in Phylo that I can't work out,
> though -- for instance, what happens to 'PhyloXML.Other' attributes -- I
> can write these on the tree, and save the tree, but it can't be
> re-opened because the parser rejects it as improperly formatted. The
> documentation is a bit vague on this; in particular, passing None to
> 'attributes' when creating Phylo.Other objects fails, while passing an
> empty dictionary works... what is meant to be in 'attributes' when
> creating an Other object?

The Other element is somewhat vaguely defined in the PhyloXML
specification, too; it's meant to allow defining new XML elements
without updating the official spec. The 'attributes' attribute
translates directly to the attributes of the new XML element you're
creating. It should be a dictionary mapping strings to strings
(somewhat like the 'annotations' attribute of SeqRecord). Something
like:

>>> from Bio.Phylo import PhyloXML
>>> other = PhyloXML.Other("img", attributes={"src": "foo.png"})
>>> mytree.other.append(other)
>>> print mytree.format("phyloxml")
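
To illustrate what "translates directly to the attributes of the new
XML element" means, here is a rough sketch using only the standard
library's ElementTree. It is not Bio.Phylo's actual serializer; the
'attributes' and 'elem' names are just placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the dict passed as 'attributes' to PhyloXML.Other.
# Keys and values are both plain strings.
attributes = {"src": "foo.png"}

# Each key/value pair becomes one attribute on the new XML element.
elem = ET.Element("img", attrib=attributes)

# Serialize the element to text to see the resulting markup.
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

The printed element should carry src="foo.png" as an XML attribute,
which is roughly what I'd expect to see on the <img> element in the
PhyloXML output as well.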

I see there was a bug here: the PhyloXML.Other constructor should
initialize 'attributes' to an empty dictionary if none is provided.
Fixed in the trunk:
https://github.com/biopython/biopython/commit/9e3fec461b189fe77b10db6de0c88df5b77e5bb0
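
For anyone curious, the general idiom for this kind of fix looks
roughly like the sketch below. This is not the code from the commit;
'OtherSketch' is a made-up class that only mimics the 'tag' and
'attributes' arguments discussed above.

```python
class OtherSketch(object):
    """Hypothetical stand-in for illustration, not the real PhyloXML.Other."""

    def __init__(self, tag, attributes=None):
        self.tag = tag
        # Fall back to a fresh empty dict when nothing (or None) is passed,
        # rather than using a shared mutable default argument.
        self.attributes = attributes if attributes is not None else {}


# Omitting 'attributes' and passing None behave the same way.
a = OtherSketch("img")
b = OtherSketch("img", attributes=None)

# Each instance gets its own dict, so changing one does not affect the other.
a.attributes["src"] = "foo.png"
print(a.attributes, b.attributes)
```

With a default like this, passing None and passing an empty dictionary
should give the same result.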


> Also, the 'is_aligned' sequence property disappears when a tree is saved
> in phyloxml format and then read back using Phylo.read:
>
>>>> print tree
> Phylogeny(rooted=True, branch_length_unit='SNV')
>    Clade(branch_length=0.0, name='N1')
>        Clade(branch_length=0.0, name='C00000761')
>            BranchColor(blue=0, green=128, red=0)
>            Sequence(type='dna')
>                MolSeq(value='CCTTTCTATGTTCTGGACTGACGTTAAACGA',
> is_aligned=True)
>        Clade(branch_length=0.0, name='C00000763')
>            BranchColor(blue=0, green=0, red=255)
>            Sequence(type='dna')
>                MolSeq(value='CCTTTcTATGTtCTGGACTGACGTTAAACGA',
> is_aligned=True)
>
>>>> Phylo.write(tree, myfile, 'phyloxml')
> 1
>>>> tree2 = Phylo.read(myfile, 'phyloxml')
>>>> print tree2
> Phylogeny(rooted=True, branch_length_unit='SNV')
>    Clade(branch_length=0.0, name='N1')
>        Clade(branch_length=0.0, name='C00000761')
>            BranchColor(blue=0, green=128, red=0)
>            Sequence(type='dna')
>                MolSeq(value='CCTTTCTATGTTCTGGACTGACGTTAAACGA')
>        Clade(branch_length=0.0, name='C00000763')
>            BranchColor(blue=0, green=0, red=255)
>            Sequence(type='dna')
>                MolSeq(value='CCTTTcTATGTtCTGGACTGACGTTAAACGA')
>

This looks like a bug, too. (Thanks for finding these!) I don't
immediately see the cause of the problem, but I'll try to take a
crack at it soon.


