[BioRuby] Drawing a phylogeny ASCII tree
Pjotr Prins
pjotr.public14 at thebird.nl
Sun Mar 18 10:09:05 UTC 2012
On Fri, Mar 16, 2012 at 02:51:53PM +0100, Pjotr Prins wrote:
> Anyone interested in a little coding challenge? I wrote a feature
> for drawing a phylogeny ASCII tree:
>
> (snip)
>
> Then draw MSA with the short tree
> """
> +----------------- seq7 ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
> | ,----- seq1 ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
> | ,--| ,-- seq2 SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
> | ,--+ `--+-- seq3 SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
> | | |--+----- seq5 ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
> | ,--| `----- seq8 --------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
> |--| `----------- seq4 ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
> `-------------- seq6 ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
> 1 2 3 4 5
> """
I have been thinking about this 'ASCII cladogram' algorithm, as it
would be very useful for testing code. Unfortunately I have found no
example that really appeals to me in the other Bio* projects (we are
talking text here, not graphics; the graphics ones are actually
nicer).
The examples I have are
* BioPerl tabtree
* Ruby challenge http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/149701
* Biopython's draw_ascii http://biopython.org/DIST/docs/api/Bio.Phylo._utils-pysrc.html#draw_ascii
I have mailed T. Mike Keesey, of
http://www.palaeos.org/Ascii_phylogenetic_tree, as he has an interesting
generator, but he has not responded yet.
So, assuming I am on my own, I was thinking that the tree needs to be
drawn in a matrix of characters. The pseudocode I came up with is
based on 'matrix expansion' - every time a sequence gets added, we
add a line. If a tree goes left (up), e.g. the last node before seq2,
we expand the matrix by injecting a new line and extending the
verticals. We extend the horizontals in the final step.
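To make the 'matrix of characters' idea concrete, here is a minimal
sketch of such a canvas in Ruby. The class name CharMatrix and its
methods are hypothetical names made up for illustration (nothing in
BioRuby is called this); it only covers the two primitive operations
described above, injecting a line and extending a vertical.

```ruby
# A tiny character canvas: an array of row strings that grows as leaves
# are added. All names here are illustrative assumptions.
class CharMatrix
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Inject a blank row at index `at` (by default append at the bottom).
  def insert_row(at = @rows.length)
    @rows.insert(at, String.new)
  end

  # Write `text` into row `r` starting at column `c`, padding with spaces.
  def put(r, c, text)
    row = @rows.fetch(r)
    missing = c + text.length - row.length
    row << ' ' * missing if missing > 0
    row[c, text.length] = text
  end

  # Draw '|' in column `c` for every row in `range` that is blank there.
  def extend_vertical(c, range)
    range.each do |r|
      put(r, c, '|') if (@rows[r][c] || ' ') == ' '
    end
  end

  def to_s
    @rows.join("\n")
  end
end

m = CharMatrix.new
m.insert_row
m.put(0, 0, 'R--L seq7')
m.insert_row
m.put(1, 0, '`--N--L seq2')
m.insert_row(1)              # inject a line above the newest leaf
m.put(1, 3, ',--L seq1')     # hang the older leaf above the node
m.extend_vertical(0, 1..1)   # carry the root's vertical through
puts m
```

Keeping the rows as plain strings keeps the sketch short; a real
implementation would probably also track which column each node sits in.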
Assuming we know the tree with its leaves and nodes:
step 1, start with the first leaf seq7 (R=root, L=Leaf, o=node, N=new node)
R--L seq7
step 2, add seq1 (4 nodes)
R--L seq7
`--o--o--o--o--L seq1
step 3, add seq2 (5 nodes)
R--L seq7
|           ,--L seq1
`--o--o--o--N--o--L seq2
Now what you see is that seq1 is on the branch of new seq2. So after
adding seq2, we wipe seq1 to the left of N and copy the verticals on
the left side of the new node (N).
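One possible way to find where the new node N falls is to compare the
root-to-leaf path of the new leaf with that of the previous one. The
sketch below assumes each leaf is described by a list of internal node
ids; that representation and the name shared_depth are assumptions for
illustration, not an existing BioRuby API.

```ruby
# Number of leading internal nodes two root-to-leaf paths have in common.
# The last shared node is where the previous leaf splits off, i.e. the
# position of "N" on the line of the newly added leaf.
def shared_depth(path_a, path_b)
  path_a.zip(path_b).take_while { |a, b| a == b }.length
end

seq1 = %i[n1 n2 n3 n4]        # 4 nodes, as in step 2
seq2 = %i[n1 n2 n3 n4 n5]     # 5 nodes, as in step 3
puts shared_depth(seq1, seq2) # prints 4: N is the 4th node on seq2's line
```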
step 4, add seq3
R--L seq7
|           ,--L seq1
|           |  ,--L seq2
`--o--o--o--|--N--L seq3
Same principle. After adding seq3, we find that seq2 is connected, so we
split seq2: on the left side we extend the verticals, on the right side
we connect.
step 5, add seq5 (4 nodes)
R--L seq7
|           ,--L seq1
|           |  ,--L seq2
|        ,--|--N--L seq3
`--o--o--N--o--L seq5
Again, seq5 connects at the 3rd node, so we split the line above.
step 6, add seq8 (4 nodes)
R--L seq7
|           ,--L seq1
|           |  ,--L seq2
|        ,--|--N--L seq3
|        |  ,--L seq5
`--o--o--|--N--L seq8
Split on node 4. Expand verticals to the left of N.
Here the algorithm changed the original drawing a little.
step 7, add seq4 (2 nodes)
R--L seq7
|           ,--L seq1
|           |  ,--L seq2
|        ,--|--N--L seq3
|        |  ,--L seq5
|     ,--|--N--L seq8
`--o--N--L seq4
step 8, add seq6 (1 node)
R--L seq7
|           ,--L seq1
|           |  ,--L seq2
|        ,--|--N--L seq3
|        |  ,--L seq5
|     ,--|--N--L seq8
|  ,--N--L seq4
`--N--L seq6
step 9, we can expand horizontally
R----------------- seq7
|           ,----- seq1
|           |  ,-- seq2
|        ,--+--+-- seq3
|        |  ,----- seq5
|     ,--+--+----- seq8
|  ,--+----------- seq4
`--+-------------- seq6
   1  2  3  4  5
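The horizontal expansion of step 9 could be sketched roughly as follows.
This assumes the step 8 rows are available as strings using the markers
from this mail (R, o, N, L), that each row contains exactly one leaf
marker "L" followed by a space and the label, and that the tree part
itself never contains the letter "L". The function name
expand_horizontally is made up for this example.

```ruby
# Turn step-8 style rows into the final drawing: nodes become '+' or '-',
# a vertical crossed by a horizontal becomes '+', and every branch is
# stretched with '-' so that all labels start in the same column.
def expand_horizontally(rows)
  parts = rows.map do |row|
    cut = row.index('L')                 # leaf marker ends the tree part
    [row[0...cut], row[(cut + 2)..-1]]   # [tree part, label]
  end
  width = parts.map { |tree, _label| tree.length }.max

  parts.map do |tree, label|
    line = tree.gsub(/[No]/, 'N' => '+', 'o' => '-')
    line = line.gsub(/(?<=-)\|(?=-)/, '+')
    "#{line.ljust(width, '-')} #{label}"
  end
end

rows = [
  'R--L a',
  '|  ,--L b',
  '`--N--L c'
]
puts expand_horizontally(rows)
```

The column ruler (1 2 3 4 5) is not generated here; it would be one more
line printed underneath.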
I think this is pretty much what Mike Keesey does on
http://namesonnodes.org/texttree/
Anyone see a flaw in my reasoning?
Pj.