[Biopython-dev] String representation of trees in Bio.Phylo

Eric Talevich eric.talevich at gmail.com
Sun Apr 4 14:50:21 UTC 2010


Hi all,

The new phylogenetics module Bio.Phylo supports a few new ways of displaying
trees. I'm trying to decide which of these should be used as the informal
string representation for whole trees, i.e. what happens when you type
"print tree" for some newly parsed tree object.

A Tree consists of some global information (e.g. rooted or not) plus nested
lists of Subtrees; the Clade objects in PhyloXML inherit from Subtree.
Currently, the Subtree __str__ method is treated as a label for a clade: it
returns the clade's name if available, and the class name in the absence of
any other identifier. Similarly, str(Tree) just returns the tree's 'name'
attribute, or "Tree". This probably isn't what the user expects, though.
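
To make the current label-style behaviour concrete, here is a minimal sketch.
The classes below are hypothetical stand-ins written for this example, not the
actual Bio.Phylo code; only the "name, else class name" fallback comes from the
description above.

```python
# Hypothetical stand-ins for the Bio.Phylo classes; the constructors and
# attribute names are assumptions made for illustration only.
class Subtree(object):
    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def __str__(self):
        # Label behaviour: the name if there is one, else the class name.
        if self.name:
            return self.name
        return self.__class__.__name__


class Clade(Subtree):
    pass


class Tree(object):
    def __init__(self, root, name=None, rooted=False):
        self.root = root
        self.name = name
        self.rooted = rooted

    def __str__(self):
        # Same fallback for the whole tree: its name, or just "Tree".
        return self.name or self.__class__.__name__


label_demo = [str(Clade(name='A')), str(Clade()), str(Tree(Clade()))]
```

With these stand-ins, an unnamed tree prints as the bare word "Tree", which
says nothing about its contents.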

Here are the options. To start the example, here's a tree parsed from
phyloXML and displayed as a Newick tree:

>>> from Bio import Phylo
>>> tree = Phylo.parse('ex/phyloxml_examples.xml', 'phyloxml').next()
>>> print tree.format('newick')
((A:0.10200,B:0.23000)0.00000:0.06000,C:0.40000)0.00:0.00000;


The pretty_print function, with the show_all option, uses 'repr' recursively
to display the tree's nodes. I think this is probably the best choice for
Tree.__str__, but it can be a bit cluttered if a lot of information is
attached to each node/subtree/clade.

>>> Phylo.pretty_print(tree, show_all=True)
Phylogeny(rooted='True', description='phyloXML allows to use either a
"branch_length" attribute...', name='example from Prof. Joe Felsenstein's
book "Inferring Phyl...')
    Clade()
        Clade(branch_length='0.06')
            Clade(branch_length='0.102', name='A')
            Clade(branch_length='0.23', name='B')
        Clade(branch_length='0.4', name='C')
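
For anyone who wants to see the idea without reading the module, the recursive
display could be sketched roughly as follows. This is an illustration under
assumptions, not the pretty_print implementation: Node and format_tree are
made-up names, and the show_all flag simply switches between repr and the
shorter "class: label" form.

```python
class Node(object):
    """Hypothetical node type, not the real Bio.Phylo Clade."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []

    def __str__(self):
        # Short label: the name, or the class name if there is none.
        return self.name or self.__class__.__name__

    def __repr__(self):
        # List only the attributes that are actually set.
        pairs = [(key, getattr(self, key)) for key in ('branch_length', 'name')]
        args = ', '.join('%s=%r' % (key, val)
                         for key, val in pairs if val is not None)
        return '%s(%s)' % (self.__class__.__name__, args)


def format_tree(node, show_all=False, indent=0):
    """Return a list of lines, one per node, indented four spaces per level."""
    if show_all:
        text = repr(node)
    else:
        text = '%s: %s' % (node.__class__.__name__, node)
    lines = [' ' * indent + text]
    for child in node.clades:
        lines.extend(format_tree(child, show_all, indent + 4))
    return lines


# The same topology as the example tree: ((A, B), C).
root = Node(clades=[
    Node(branch_length=0.06, clades=[
        Node(branch_length=0.102, name='A'),
        Node(branch_length=0.23, name='B'),
    ]),
    Node(branch_length=0.4, name='C'),
])
print('\n'.join(format_tree(root, show_all=True)))
```

A Tree.__str__ built this way would just join the lines from such a function,
so the clutter concern comes down to how much each node's repr includes.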


By default, pretty_print uses 'str' instead of 'repr', showing only class
names and string representations (labels) to reduce the clutter:

>>> Phylo.pretty_print(tree)
Phylogeny: example from Prof. Joe Felsenstein's ...
    Clade: Clade
        Clade: Clade
            Clade: A
            Clade: B
        Clade: C

Is this useful to anyone? If not, then I could drop this part of the
pretty_print function entirely.

As an alternative, we could print the tree as ASCII art, as some other
toolkits do. However, the draw_ascii function is very limited -- it doesn't
print internal node labels, and trees of more than a couple hundred nodes
will look strange, since the drawing is compressed into a fixed number of
character columns (default 80).

>>> Phylo.draw_ascii(tree)
             __________________ A
  __________|
_|          |___________________________________________ B
 |
 |___________________________________________________________________________ C
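
To illustrate why a fixed width becomes a problem, here is a rough sketch of
scaling root-to-tip distances onto character columns. It is not the
draw_ascii code; tip_columns and its label_room parameter are names invented
for this example, and the distances are just the branch lengths of the example
tree summed by hand.

```python
def tip_columns(depths, width=80, label_room=2):
    """Map each tip's root-to-tip distance onto a character column.

    depths: dict of tip name -> summed branch lengths from the root.
    label_room: columns held back for the tip label (an assumption).
    """
    usable = width - label_room
    # The deepest tip lands on the last usable column; the rest scale down.
    scale = usable / float(max(depths.values()))
    return dict((name, int(round(depth * scale)))
                for name, depth in depths.items())


# Root-to-tip distances from the example tree, e.g. A = 0.06 + 0.102.
depths = {'A': 0.06 + 0.102, 'B': 0.06 + 0.23, 'C': 0.4}
columns = tip_columns(depths)
```

Whatever the tree's size, every distance has to fit into the same 78 or so
columns, so branches much shorter than 1/78 of the tree's depth collapse to
zero characters and become indistinguishable.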



For reference, here's the raw phyloXML:

>>> import sys
>>> Phylo.write(tree, sys.stdout, 'phyloxml', indent=True)
<phy:phyloxml xmlns:phy="http://www.phyloxml.org">
  <phy:phylogeny rooted="true">
    <phy:name>example from Prof. Joe Felsenstein's book "Inferring
Phylogenies"</phy:name>
    <phy:description>phyloXML allows to use either a "branch_length"
attribute or element to indicate branch lengths.</phy:description>
    <phy:clade>
      <phy:clade>
        <phy:branch_length>0.06</phy:branch_length>
        <phy:clade>
          <phy:name>A</phy:name>
          <phy:branch_length>0.102</phy:branch_length>
        </phy:clade>
        <phy:clade>
          <phy:name>B</phy:name>
          <phy:branch_length>0.23</phy:branch_length>
        </phy:clade>
      </phy:clade>
      <phy:clade>
        <phy:name>C</phy:name>
        <phy:branch_length>0.4</phy:branch_length>
      </phy:clade>
    </phy:clade>
  </phy:phylogeny>
</phy:phyloxml>


What do you think?

Thanks,
Eric


