[BioRuby] [GSoC][NeXML and RDF API] Code Review.
Pjotr Prins
pjotr.public14 at thebird.nl
Sun Jun 27 02:47:31 EDT 2010
Thanks Rutger and Hilmar,
Anurag, let's not load everything in memory.
Pj.
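A minimal sketch of the "stream one, cache the other" approach discussed below. It assumes REXML's pull parser, which ships with Ruby, as a stand-in for the XML Reader API mentioned in the thread. The input is a heavily trimmed, hypothetical NeXML-like document, and the method name each_tree_streaming is illustrative only, not part of Bio::NeXML.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical, trimmed NeXML-like input; real NeXML files carry more
# namespaces, xsi:type attributes and many more elements.
NEXML = <<~XML
  <nex:nexml xmlns:nex="http://www.nexml.org/2009" version="0.9">
    <otus id="taxa1">
      <otu id="t1" label="Homo sapiens"/>
      <otu id="t2" label="Pan troglodytes"/>
    </otus>
    <trees id="trees1" otus="taxa1">
      <tree id="tree1"/>
      <tree id="tree2"/>
    </trees>
  </nex:nexml>
XML

# Hypothetical helper: one forward-only pass over the document.
# OTUs are cached in a hash as they are seen; each tree is yielded as
# soon as its start tag is reached, together with the OTUs cached so far.
# Nothing behind the cursor can be revisited without a second pass.
def each_tree_streaming(xml)
  otus = {}
  parser = REXML::Parsers::PullParser.new(xml)
  while parser.has_next?
    event = parser.pull
    next unless event.start_element?

    name = event[0]  # element name
    attrs = event[1] # attribute hash
    case name
    when 'otu'
      otus[attrs['id']] = attrs['label']
    when 'tree'
      yield attrs['id'], otus
    end
  end
end

each_tree_streaming(NEXML) do |id, otus|
  puts "#{id}: #{otus.size} OTUs cached"
end
```

The single pass works here only because the otus block comes before the trees block. Asking for tree1 after the parser has moved past it would need a second pass or a cache, which is the trade-off raised in the thread.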
On Sat, Jun 26, 2010 at 05:30:19PM -0700, Hilmar Lapp wrote:
> Our ability to reconstruct trees of hundreds, thousands, and even tens
> of thousands of characters has improved dramatically over the past
> couple of years, and is increasingly often the goal of an analysis.
> Genome-scale alignments also aren't so rare anymore.
>
> Aside from analysis, NeXML files can be produced by a database, and
> hence could hold large taxonomies, or the tree of life.
>
> NeXML is an emerging standard. If implementations can't cope with the
> large-scale data that are becoming increasingly common, it'll have a
> hard time getting uptake.
>
> -hilmar
>
> On Jun 25, 2010, at 12:42 AM, Pjotr Prins wrote:
>
>> I think this needs to be answered by Rutger. Are we going to face
>> NeXML files in the future that can easily outrun memory?
>>
>> Pj.
>>
>> On Fri, Jun 25, 2010 at 01:04:21PM +0530, Anurag Priyam wrote:
>>>> How much time would it cost you to stream the data, and what would
>>>> it mean with regard to changing the API? I guess, in general, NeXML
>>>> files won't be that large, so it may not be that important (Rutger)?
>>>>
>>>> Pj.
>>>>
>>>>
>>> I mean switching the parsing implementation from "parsing at the
>>> start" to streaming, not changing the API. It's just that using the
>>> Reader API rather than the DOM API would make that switch easier.
>>> Even if we do not switch, the Reader API is more memory efficient
>>> than the DOM API.
>>>
>>> Btw, I am not in favour of the switch. You cannot move backwards in
>>> the document that way: I cannot fetch a tree by id if the cursor is
>>> already past that tree. Doing nexml.each_characters and
>>> nexml.each_trees is impossible with pure streaming; I would have to
>>> stream one while caching the other. Otus and otu provide a one-to-many
>>> relation with trees and characters, and with rows. An API call of the
>>> type otus.trees, otus.characters or otu.sequences would be impossible
>>> (not that I have added those API calls yet). IMO, NeXML is non-linear
>>> and not meant to be streamed. Besides, other NeXML implementations
>>> also parse the file at the start.
>>>
>>> --
>>> Anurag Priyam,
>>> 2nd Year Undergraduate,
>>> Department of Mechanical Engineering,
>>> IIT Kharagpur.
>>> +91-9775550642
>> _______________________________________________
>> BioRuby Project - http://www.bioruby.org/
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================