[BioRuby] [Wg-phyloinformatics] Update on phyloXML support for BioRuby project

Mon May 25 00:29:57 UTC 2009

Hi all,

Since PhyloXML has many more elements than Bio::Tree, I propose
adding a class PhyloXMLNode that inherits from Bio::Tree::Node.

PhyloXMLNode:
# attributes from Bio::Tree::Node
* bootstrap
* bootstrap_string
* ec_number
* name
* scientific_name
* taxonomy_id

# new attributes
* id_source
* confidence [] ([] means array of elements)
* color
* node_id
* taxonomy []
* sequence [] (Bio::Sequence object)
* events
* binary_characters
* distribution []
* date
* reference []
* property []
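
A rough Ruby sketch of the node class listed above could look like this.
It assumes the elements marked [] are kept as plain arrays and everything
else as simple accessors. The small Bio::Tree::Node stand-in is an
assumption made only so the sketch runs without the bio gem; it is not
the real class.

```ruby
# Stand-in for Bio::Tree::Node, defined only when BioRuby is not loaded.
# It carries just the attributes named in the list above.
unless defined?(Bio::Tree::Node)
  module Bio
    class Tree
      class Node
        attr_accessor :name, :bootstrap, :bootstrap_string,
                      :ec_number, :scientific_name, :taxonomy_id

        def initialize(name = nil)
          @name = name
        end
      end
    end
  end
end

# Proposed subclass: inherits the existing attributes and adds the
# PhyloXML-specific ones.
class PhyloXMLNode < Bio::Tree::Node
  # Single-valued elements.
  attr_accessor :id_source, :color, :node_id, :events,
                :binary_characters, :date

  # Elements that may occur more than once, kept as arrays.
  attr_reader :confidence, :taxonomy, :sequence,
              :distribution, :reference, :property

  def initialize(name = nil)
    super(name)
    @confidence   = []
    @taxonomy     = []
    @sequence     = []  # intended to hold Bio::Sequence objects
    @distribution = []
    @reference    = []
    @property     = []
  end
end
```

Code that only knows about Bio::Tree::Node can still treat a PhyloXMLNode
as an ordinary node, which is the point of inheriting.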

Also, since a <phylogeny> element contains other elements besides <clade>,
the Bio::Tree class should be extended as well.

PhyloXMLTree:
# inherited from Bio::Tree
* options
* root

# new attributes
* rooted (boolean)
* rerootable (boolean)
* branch_length_unit
* type
* name
* id
* description
* date
* confidence []
* clade_relation []
* sequence_relation []
* property []
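
The tree class could be sketched the same way. Again this is only an
illustration: the Bio::Tree stand-in exists so the example runs without
the bio gem, and no default values are assumed for the new attributes.

```ruby
# Stand-in for Bio::Tree, defined only when BioRuby is not loaded.
unless defined?(Bio::Tree)
  module Bio
    class Tree
      attr_accessor :options, :root

      def initialize
        @options = {}
        @root = nil
      end
    end
  end
end

# Proposed subclass holding the <phylogeny>-level elements.
class PhyloXMLTree < Bio::Tree
  # Single-valued attributes and elements.
  attr_accessor :rooted, :rerootable, :branch_length_unit, :type,
                :name, :id, :description, :date

  # Elements that may occur more than once, kept as arrays.
  attr_reader :confidence, :clade_relation, :sequence_relation, :property

  def initialize
    super()
    @confidence        = []
    @clade_relation    = []
    @sequence_relation = []
    @property          = []
  end
end
```

Because PhyloXMLTree is still a Bio::Tree, anything defined on Bio::Tree
remains callable on it.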

I think inheritance is better than creating a separate class, because
users will be able to keep using Bio::Tree as before while also being able
to read PhyloXML data files. Conversion from PhyloXML to other formats will
also be easy, since the Bio::Tree class has the output_newick, output_nhx,
and output_phylip_distance_matrix methods.
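
On Hilmar's point in the thread below, that the parser should get its
tree and node objects from a factory instead of hard-coding the class, a
minimal sketch might look like this. All names here (InMemoryNode,
InMemoryTree, InMemoryFactory, SketchParser) are hypothetical, and the
"parsing" is reduced to a list of clade names so the example stays short.

```ruby
# Hypothetical in-memory implementations. An on-disk variant would
# expose the same two methods on its factory.
class InMemoryNode
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class InMemoryTree
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  def add_node(node)
    @nodes << node
    node
  end
end

class InMemoryFactory
  def create_tree
    InMemoryTree.new
  end

  def create_node(name)
    InMemoryNode.new(name)
  end
end

# Hypothetical parser: it only talks to the factory and never names a
# concrete tree or node class.
class SketchParser
  def initialize(factory = InMemoryFactory.new)
    @factory = factory
  end

  # Stands in for real XML parsing: takes clade names already extracted.
  def build(clade_names)
    tree = @factory.create_tree
    clade_names.each { |n| tree.add_node(@factory.create_node(n)) }
    tree
  end
end
```

Switching to a different tree implementation later would then mean
passing another factory to the parser, with no change to the parser
itself.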

Diana

Project Page:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:PhyloXML_support_in_BioRuby

On Thu, May 21, 2009 at 9:03 AM, Chris Fields <cjfields at illinois.edu> wrote:

> Actually, as Perl's XML::LibXML::Reader is described it almost sounds
> perfect, though I'm unsure of backtracking to a specific node in the
> tree (and thus post/pre-order of nodes). Saying that, I would be
> surprised if it weren't possible, though.
>
> chris
>
> On May 20, 2009, at 11:02 PM, Christian M Zmasek wrote:
>
> > Hi:
> >
> > Thanks for the detailed replies by Hilmar and Chris!
> > I think it is a very good idea to keep such very large trees in
> > mind, and possibly implement a solution which only loads requested
> > nodes into memory (as described by Hilmar and Chris) if there is
> > enough time left at the end of the project.
> >
> > Re "It's tricky with re: to a number of aspects, but it can be
> > done. For instance, if one wanted to modify the created nodes
> > (i.e. if the nodes are mutable), or creating a generic Lazy set of
> > classes capable of dealing with multiple formats."
> >
> > How would you do post-order or pre-order iteration of nodes?
> > Wouldn't you have to go back and forth in the file?
> >
> > CZ
> >
> > Chris Fields wrote:
> >> On May 20, 2009, at 8:22 AM, Hilmar Lapp wrote:
> >>
> >>
> >>> On May 19, 2009, at 5:54 PM, Christian M Zmasek wrote:
> >>>
> >>>
> >>>> I think it is perfectly acceptable to expect to have enough memory
> >>>> to keep at least one tree in memory
> >>>>
> >>> Sounds like a good and perfectly reasonable starting point to me too.
> >>> It's also the way other toolkits (such as BioPerl) work.
> >>>
> >>> Having said that, I don't find it inconceivable that we may be working
> >>> with trees in the near future that don't fit into memory for a 1GB RAM
> >>> machine if they are richly decorated (which is something that phyloXML
> >>> wants to enable, isn't it?). Solving that to me though seems to be a
> >>> question of writing an appropriate Tree implementation that happens to
> >>> store most of the data on disk rather than in memory, and not an issue
> >>> for how to write a parser. Ideally though, the parser uses a factory
> >>> for creating the (tree and/or node) objects, so that later it can be
> >>> made to use an on-disk Tree implementation simply by passing it
> >>> another factory. I.e., ideally the parser would not assume and
> >>> hard-code the Tree implementation class.
> >>>
> >>> Just my $0.02.
> >>>
> >>>     -hilmar
> >>>
> >>
> >> This could be implemented in a lazy way or using lightweight
> >> objects. The Tree object itself contains the XML parser or a
> >> reference thereof (probably LibXML Reader-based) and creates the
> >> relevant nodes as needed. The only thing needed would be some
> >> light parsing to indicate start-end file points.
> >>
> >> It's tricky with re: to a number of aspects, but it can be done.
> >> For instance, if one wanted to modify the created nodes (i.e. if
> >> the nodes are mutable), or creating a generic Lazy set of classes
> >> capable of dealing with multiple formats.
> >>
> >> Just in case anyone's wondering, I have been thinking along these
> >> lines for a while re: BioPerl, Bio::Seq, and very large files... ;>
> >>
> >> chris
> >>
> >