[Biojava-l] Rooted trees in nexus files
Richard Holland
holland at eaglegenomics.com
Tue Nov 3 20:20:31 UTC 2009
A getRoot() function sounds good. It would return the String label of
the root node, the same label that identifies the corresponding vertex
in the JGraphT model. An equivalent setRoot() would be nice.
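As a rough illustration of the getRoot()/setRoot() idea, here is a minimal sketch. The class name RootedTreeSketch and the addVertex() helper are hypothetical and are not part of BioJava; the sketch only assumes that the root is stored as the String label of an existing vertex, with null meaning that no root has been recorded.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical holder class for illustration only; not the BioJava class.
class RootedTreeSketch {
    // Vertex labels, standing in for the vertices of the graph model.
    private final Set<String> vertices = new LinkedHashSet<>();
    // Label of the root vertex; null means no root is recorded.
    private String root;

    void addVertex(String label) {
        vertices.add(label);
    }

    /** Returns the label of the root vertex, or null if none is set. */
    String getRoot() {
        return root;
    }

    /** Sets the root; the label must name a known vertex (or be null). */
    void setRoot(String label) {
        if (label != null && !vertices.contains(label)) {
            throw new IllegalArgumentException("Unknown vertex: " + label);
        }
        this.root = label;
    }
}
```

Rejecting labels that do not match a known vertex is a design choice of this sketch; it keeps the returned root usable as a lookup key into the graph.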
The prefix for the parser is currently hardcoded as 'p'. Two new methods,
setDefaultPrefix() and getDefaultPrefix(), should be provided. The
setter accepts a String and should check that it is valid, i.e. all
alphanumeric with no spaces or other Newick-sensitive characters. The
parser should be changed to use the output from getDefaultPrefix()
instead of the hardcoded 'p'. By default it should behave exactly as it
does at present, unless the user explicitly says otherwise by
calling the setDefaultPrefix() method.
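The prefix handling described above could look roughly like this. It is only a sketch: the class name PrefixSketch and the labelFor() helper are made up for illustration, and the validity rule is assumed to be "one or more ASCII letters or digits".

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration only; not the BioJava parser.
class PrefixSketch {
    // Assumed validity rule: ASCII letters and digits only, at least one.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9]+");

    // 'p' stays the default, so behaviour is unchanged unless the
    // caller explicitly sets another prefix.
    private String defaultPrefix = "p";

    String getDefaultPrefix() {
        return defaultPrefix;
    }

    void setDefaultPrefix(String prefix) {
        if (prefix == null || !VALID.matcher(prefix).matches()) {
            throw new IllegalArgumentException("Invalid prefix: " + prefix);
        }
        this.defaultPrefix = prefix;
    }

    // Hypothetical helper: builds a generated node label from the
    // getter rather than from a hardcoded 'p'.
    String labelFor(int n) {
        return getDefaultPrefix() + n;
    }
}
```

With the default in place, labelFor(3) gives "p3"; after setDefaultPrefix("node") it gives "node3", and a value such as "a b" or "x(" is rejected.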
Personally I would also alter the methods that return JGraphT graphs so
that they return their directed equivalents if possible. I believe that
these can still represent unrooted trees; you'd have to check the
JGraphT documentation to make sure.
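To show what a directed result with a known root might contain, here is a small self-contained sketch in plain Java (no JGraphT). It is not the existing parser: the class DirectedNewickSketch is hypothetical, the numbering of internal nodes (prefix plus a counter starting at 0) is an assumption, and branch lengths and labels on internal nodes are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: reads a simple Newick string into directed
// parent -> child edges and remembers the outermost node as the root.
class DirectedNewickSketch {
    final List<String[]> edges = new ArrayList<>(); // each entry: {parent, child}
    String root; // label of the outermost internal node, null if none

    static DirectedNewickSketch parse(String newick, String prefix) {
        DirectedNewickSketch t = new DirectedNewickSketch();
        Deque<String> open = new ArrayDeque<>(); // internal nodes not yet closed
        StringBuilder leaf = new StringBuilder();
        int counter = 0;
        for (char c : newick.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '(') {
                // Each opening bracket starts a new internal node.
                String node = prefix + counter++;
                if (open.isEmpty()) {
                    t.root = node; // outermost bracket pair is the root
                } else {
                    t.edges.add(new String[] {open.peek(), node});
                }
                open.push(node);
            } else if (c == ',' || c == ')' || c == ';') {
                // A separator ends the current leaf name, if any.
                if (leaf.length() > 0 && !open.isEmpty()) {
                    t.edges.add(new String[] {open.peek(), leaf.toString()});
                }
                leaf.setLength(0);
                if (c == ')') {
                    open.pop(); // throws on unbalanced brackets
                }
            } else {
                leaf.append(c);
            }
        }
        return t;
    }
}
```

For the string "(A, (B,C));" with prefix "p", this records p0 as the root and the edges p0 -> A, p0 -> p1, p1 -> B and p1 -> C. An unrooted view can then be obtained by ignoring the edge directions and the root field.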
Richard.
On 3 Nov 2009, at 18:55, Tiago Antão wrote:
> But the point is that the class interface changes to the outside user:
> 1. How does one report back the root to the user?
> 2. Regarding the prefix stuff, should the user be allowed to specify a
> preferred prefix?
>
> Both of these things imply interface changes visible to users.
> If you still need volunteers to do the change, I can do it. But I need
> to know what changes to the user interface are to be done.
> For 1, maybe a method getRoot, returning a string with the name of the
> root node?
> For 2, maybe an extended version of the parse function with a prefix
> as input parameter?
>
> 2009/11/3 Richard Holland <holland at eaglegenomics.com>:
>>> 1. Lack of knowledge of root node
>>
>> The Newick tree string is read as-is and is not parsed. It only gets
>> parsed at the point of conversion to an Undirected or WeightedGraph
>> inside the TreeBlocks.java source code (inside the two types of
>> get-As-JGraphT methods). It's at this point the string is parsed and
>> it's here that root node determination should take place. It's already
>> known whether &R or &U have been specified here, which should help the
>> code work out what to do.
>>
>>> 2. The p* stuff.
>>
>> Exactly the same part of the code as described above. Wherever it
>> pushes values to the stack but prepends them with 'p' first, you'll
>> need to change the 'p' to some instance variable and provide a
>> getter/setter to change it, with 'p' being the default setting.
>>
>> cheers,
>> Richard
>>
>>>
>>> Tiago
>>> 2009/11/3 Richard Holland <holland at eaglegenomics.com>:
>>>>
>>>> Agreed that there is a bug. Now all we need is someone to go in and
>>>> fix it! :)
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> On 3 Nov 2009, at 18:16, Tiago Antão wrote:
>>>>
>>>>> 2009/11/3 Thasso Griebel <thasso.griebel at uni-jena.de>:
>>>>>>
>>>>>> There is a way to uniquely get a root from a Newick string.
>>>>>> Usually a rooted Newick string is surrounded by brackets, which
>>>>>> indicates the root as the highest node in the tree. For example:
>>>>>>
>>>>>> (A, (B,C))
>>>>>>
>>>>>
>>>>> Agreed, it is quite easy to get the root of the tree from the Newick
>>>>> representation. But it should be done on parsing and returned in
>>>>> some way by the parsing system. If the user has to do it again, it
>>>>> means that the user has to parse it again just to know the root node.
>>>>>
>>>>>> I would also suggest generally parsing trees as rooted trees (maybe
>>>>>> just for the initial internal model). Creating an unrooted tree
>>>>>> from a rooted one is easy: remove the root and forget about
>>>>>> directions. The other way might be hard and ambiguous.
>>>>>
>>>>> 100% agree.
>>>>> The Newick _representation_ always has a root by virtue of the way
>>>>> it is written. Whether that root has meaning depends on the tree.
>>>>> Doing as you suggest seems the most reasonable idea.
>>>>> I would add that even if it is an unrooted tree, the topology might
>>>>> be of interest. In my case I am doing a comparative visualizer and it
>>>>> might be nice for the user to be able to visualize the topology as
>>>>> specified. It has no biological meaning, but in practice, for many
>>>>> users, it helps.
>>>>> I note that PhyloXML (even by virtue of being an XML format) always
>>>>> represents the phylogenies as trees (not weighted DAGs). There is an
>>>>> attribute, rooted, which can be true or false.
>>>>>
>>>>> But, anyway. Even assuming a very conservative view on this, the
>>>>> current parser, for rooted trees, does not allow one to determine
>>>>> where the root is. I think there would be a consensus that that is
>>>>> a bug?
>>>>>
>>>>> Tiago
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>> http://www.eaglegenomics.com/
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> "The hottest places in hell are reserved for those who, in times of
>>> moral crisis, maintain a neutrality." - Dante
>>
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/