[Bioperl-l] Re: Comparative genomics

Daniel Barker db2@sanger.ac.uk
Fri, 28 Sep 2001 14:43:16 +0100 (BST)


> There are several matrix formats. The one I used/prefer was #NEXUS
> format - used by several of the best phylogenetic (i.e. cladistic)
> reconstruction programs like Paup. It has the facility to embed a
> complete analysis configuration, data and the output in nested tree
> descriptions.

I prefer the more programmer-friendly PHYLIP formats, which are
well-established and a sort of "lowest common denominator" in that most
phylogeny programs can at least import and export them. Also, I slightly
disapprove of the way Nexus lets you mix alignments, analysis and
trees in the same file.

This is just personal preference though: one could argue it either way.
(And PHYLIP too lets you put some options in its tree and data files,
though I think this is going to be disallowed in a later version.)

P.S. In my phylogeny program LVB, written in C, I chose to represent a
tree as an array of structures:

#define UNSET (-1)                      /* value of integral vars when unset */

typedef int Branchno;           /* branch no. (array offset, count or UNSET) */
typedef int Objno;              /* object no. (array offset, count or UNSET) */

/* branch of tree */
typedef struct
{
        Branchno parent;        /* parent branch number, UNSET in root */
        Branchno left;          /* child 1 number */
        Branchno right;         /* child 2 number */
        Objno object;           /* object number if leaf, otherwise UNSET */
} Branch;

And the index of the root node ("root branch") was stored separately in an
integer.
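
As a rough sketch of how such an array might be filled in and walked, here
is the tree ((0,1),2) built in a five-element array. The helper names
(set_leaf, set_internal, count_leaves, build_example) are made up for
illustration and are not taken from LVB; only the Branch layout follows the
declaration above.

```c
#include <assert.h>

#define UNSET (-1)                      /* value of integral vars when unset */

typedef int Branchno;           /* branch no. (array offset, count or UNSET) */
typedef int Objno;              /* object no. (array offset, count or UNSET) */

/* branch of tree */
typedef struct
{
        Branchno parent;        /* parent branch number, UNSET in root */
        Branchno left;          /* child 1 number */
        Branchno right;         /* child 2 number */
        Objno object;           /* object number if leaf, otherwise UNSET */
} Branch;

/* Hypothetical helper: make branch i a leaf holding object obj. */
void set_leaf(Branch *tree, Branchno i, Objno obj)
{
        tree[i].parent = UNSET;
        tree[i].left = UNSET;
        tree[i].right = UNSET;
        tree[i].object = obj;
}

/* Hypothetical helper: make branch i an internal branch with two children,
 * and point both children back at it. */
void set_internal(Branch *tree, Branchno i, Branchno left, Branchno right)
{
        tree[i].parent = UNSET;
        tree[i].left = left;
        tree[i].right = right;
        tree[i].object = UNSET;
        tree[left].parent = i;
        tree[right].parent = i;
}

/* Count the leaves at or below branch i. */
int count_leaves(const Branch *tree, Branchno i)
{
        assert(i != UNSET);
        if (tree[i].object != UNSET)
                return 1;
        return count_leaves(tree, tree[i].left)
             + count_leaves(tree, tree[i].right);
}

/* Build ((0,1),2) in a caller-supplied array of at least 5 branches.
 * Returns the index of the root, which is kept outside the array. */
Branchno build_example(Branch *tree)
{
        set_leaf(tree, 0, 0);
        set_leaf(tree, 1, 1);
        set_leaf(tree, 2, 2);
        set_internal(tree, 3, 0, 1);    /* (0,1) */
        set_internal(tree, 4, 3, 2);    /* ((0,1),2) */
        return 4;
}
```

A caller would declare "Branch tree[5];", keep the value returned by
build_example() as the root index, and pass both to count_leaves().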

I'm not sure if this is any use, and actually I would omit the "object"
field now: one can ensure objects (i.e., sequences) 0..n-1 are permanently
associated with branches 0..n-1 in the array. I think some other programs
do that.
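
A minimal sketch of that variant, assuming a rooted binary tree of n
objects stored in an array of 2n - 1 branches, with the leaves in elements
0..n-1. The names LeanBranch, is_leaf, object_of and branches_needed are
invented for this example.

```c
#include <assert.h>

#define UNSET (-1)

typedef int Branchno;
typedef int Objno;

/* Branch without the "object" field (hypothetical name). */
typedef struct
{
        Branchno parent;        /* parent branch number, UNSET in root */
        Branchno left;          /* child 1 number, UNSET in a leaf */
        Branchno right;         /* child 2 number, UNSET in a leaf */
} LeanBranch;

/* With n objects, branches 0..n-1 are the leaves. */
int is_leaf(Branchno i, int n)
{
        assert(i >= 0 && n > 0);
        return i < n;
}

/* The object held by leaf i is simply i; internal branches hold none. */
Objno object_of(Branchno i, int n)
{
        return is_leaf(i, n) ? (Objno) i : UNSET;
}

/* Array elements needed for a rooted binary tree of n leaves:
 * n leaves plus n - 1 internal branches. */
int branches_needed(int n)
{
        assert(n > 0);
        return 2 * n - 1;
}
```

The leaf test becomes a comparison against n rather than a field lookup,
at the cost of fixing where leaves may live in the array.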

PHYLIP: http://evolution.genetics.washington.edu/phylip.html

LVB: http://www.icmb.ed.ac.uk/lvb/sokal.html

"LVB version 1.0A 18 August 1997, with Extension 1 written by Daniel
Barker, May 1998" can parse trees from file, but if I were writing this
now, I would definitely re-use PHYLIP's source code. This is probably
worthwhile wherever one is working in C. I don't know if it would make
sense for SQL. (Also, check PHYLIP's re-use conditions. They're fairly
unrestrictive, but not GNU-like.) Could you just "wrap" the relevant bits
of PHYLIP?

-- 
Daniel Barker.