[BioRuby] [GSoC][NeXML and RDF API] Code Review.

Rutger Vos rutgeraldo at gmail.com
Fri Jun 25 08:14:44 UTC 2010


This is very possible (and it's why Anurag has been focusing on
stream-based parsing) but I am personally of the opinion that worrying
too much about that right now would be a premature optimization. It
seems to me that we want to get a nice interface that captures what
NeXML can express first, and worry about performance and memory
footprint later - but that's just my own opinion and certainly open
for discussion.

On Fri, Jun 25, 2010 at 8:42 AM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> I think this needs to be answered by Rutger. Are we going to face
> NeXML files in the future that can easily outrun memory?
>
> Pj.
>
> On Fri, Jun 25, 2010 at 01:04:21PM +0530, Anurag Priyam wrote:
>> > How much time would it cost you to stream the data - and what does it
>> > mean with regard to changing the API? I guess, in general, NeXML
>> > files won't be that large, so it may not be that important (Rutger)?
>> >
>> > Pj.
>> >
>> >
>> I mean switching the parsing implementation from "parsing at the start" to
>> streaming, not changing the API. Using the Reader API instead of the DOM
>> API would simply make that switch easier. Even if we do not switch, the
>> Reader API is more memory-efficient than the DOM API.
>>
>> Btw, I am not in favour of the switch. You cannot move backwards in the
>> document that way: I cannot fetch a tree by id if the cursor is already
>> past that tree. Doing nexml.each_characters and nexml.each_trees is
>> impossible with pure streaming; I would have to stream one while caching
>> the other. Otus and otu have a one-to-many relation with trees, characters,
>> and rows, so an API call such as otus.trees, otus.characters, or
>> otu.sequences would be impossible (not that I have already added the API
>> call). Imo, NeXML is non-linear and not meant to be streamed. Besides, other
>> NeXML implementations also parse the file at the start.
>>
>> --
>> Anurag Priyam,
>> 2nd Year Undergraduate,
>> Department of Mechanical Engineering,
>> IIT Kharagpur.
>> +91-9775550642
>



-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com


