[BioRuby] BioRuby: newick parser

Pjotr Prins pjotr.public14 at thebird.nl
Mon Mar 12 10:08:41 UTC 2012


Hi Naohisa,

Thanks for the quick reply. I think the Newick parser and PhyloXML
parser are the way to go. I guess it is partly due to me that the
latter is somewhat complex, because it avoids reading the whole DOM into RAM ;).

I'll start using the string-based Newick parser for MSA, as it is so
common, and see if I can make it flow naturally :)

Pj.

On Mon, Mar 12, 2012 at 03:18:48PM +0900, Naohisa GOTO wrote:
> Hi Pjotr,
> 
> They can be divided into several parts.
> 
> 1. Newick/NHX parser and writer: 
> 
> 1-1. Implementation: I think it is of sufficient quality. The implementation
> complexity is due to the Newick specification (e.g. escaping of special
> characters) and some undocumented conventions (e.g. bootstrap values).
> For refactoring, using Racc (parser generator for Ruby) seems good,
> but low priority.
> 
> 1-2. Parser API: Parsing a string is simple. Reading from files
> depends on the Bio::FlatFile system, which is sufficient for most cases.
> 
> 1-3. Writer API: depends on the Bio::Tree API.
> 
> 2. Nexus parser and writer:
> 
> 2-1: Implementation: I don't know the details of its current status,
> but for trees it only passes the data to the Bio::Newick class.
> Please ask Christian for details.
> 
> 2-2: API: Nexus Parser API is complicated because the Nexus
> specification is very complex.
> It seems that a Nexus writer is missing.
> 
> 3: PhyloXML parser and writer:
> 
> 3-1: Parser implementation and API: Sufficient quality. Its
> complexity is mainly due to the on-demand partial reading of
> XML files, which saves memory for a large tree file.
> 
> 3-2: Writer implementation and API: Not sufficient. It can only
> write PhyloXML data, and it is very hard to output a Bio::Tree
> in PhyloXML format.
> 
> 3-3: Other topics: It uses libxml-ruby, but the de facto
> standard XML parser for Ruby now seems to be Nokogiri, so
> I think it may be rewritten using Nokogiri some day.
> 
> 4. Bio::Tree data structure:
> 
> 4-1. Implementation: It is based on BioRuby's internal graph
> library. It could be changed to use another graph library.
> 
> 4-2. API: The API design is based on the tree APIs of other
> open-bio projects and on generic graph library APIs.
> 
> While writing a HOWTO based on the BioPerl HOWTO:Trees
> (http://bioruby.open-bio.org/wiki/HOWTO:Trees, still incomplete),
> I am thinking of adding or modifying some APIs for specifying nodes/edges.
> 
> 
> > I see Christian has done a lot of work in this area (mostly in Java),
> > even to the point of taking standards forward. Maybe I should ask him?
> 
> I'd like to hear his advice, too.
> 
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> 
> 
> On Sun, 11 Mar 2012 13:53:08 +0100
> Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> 
> > Hi Naohisa and others,
> > 
> > I am looking at the Newick/Nexus/PhyloXML parsers at the moment. The
> > BioRuby ones look rather complete, if not a tad overcomplicated. 
> > 
> > Are you happy with the state of affairs, or do you think it could be
> > improved/simplified?  Also, for walking the tree, is the interface now
> > the one you would choose to implement?
> > 
> > I am asking, because I am looking for the most intuitive way of
> > parsing and traversing tree information. I see Christian has done a
> > lot of work in this area (mostly in Java), even to the point of
> > taking standards forward. Maybe I should ask him? It appears to me we
> > have solid parsers and data structures. Walking the trees, however,
> > is less straightforward, and the documentation is somewhat lacking.
> > 
> > Anyone happy to correct me?
> > 
> > Pj.
> 