[BioRuby] [GSoC][NeXML and RDF API] Code Review.

Hilmar Lapp hlapp at drycafe.net
Sat Jun 26 20:30:19 EDT 2010


Our ability to reconstruct trees of hundreds, thousands, and even tens  
of thousands of characters has improved dramatically over the past  
couple of years, and is increasingly often the goal of an analysis.  
Genome-scale alignments also aren't so rare anymore.

Aside from analysis, NeXML files can be produced by a database, and  
hence could hold large taxonomies, or the tree of life.

NeXML is an emerging standard. If implementations can't cope with the  
large-scale data that are becoming increasingly popular, it will have a  
hard time gaining uptake.
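To make the scale concern concrete, here is a minimal sketch of the streaming approach discussed in the quoted thread. It is written with Python's standard-library xml.etree.ElementTree.iterparse purely for illustration; it is not BioRuby code and not the libxml Reader API. Each <tree> is handled when its end tag is seen and is then cleared, so memory is dominated by the tree currently being read rather than by the whole file. The emptied <tree> shells are still kept, but they are small. The helper name stream_trees and the stripped-down sample document are assumptions; real NeXML trees also carry xsi:type, edges and more.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def stream_trees(source):
    """Yield (tree_id, node_count) for each <tree>, clearing it afterwards.

    Hypothetical helper: a sketch of pull-style parsing, not a BioRuby API.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) == "tree":
            nodes = sum(1 for child in elem if local(child.tag) == "node")
            yield elem.get("id"), nodes
            # Drop the tree's children; only an empty <tree> shell remains.
            elem.clear()


# A stripped-down, NeXML-like document (hypothetical sample data).
SAMPLE = b"""<?xml version="1.0"?>
<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1"/><otu id="t2"/><otu id="t3"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1"><node id="n1"/><node id="n2"/><node id="n3"/></tree>
    <tree id="tree2"><node id="n1"/><node id="n2"/></tree>
  </trees>
</nexml>"""

if __name__ == "__main__":
    for tree_id, count in stream_trees(io.BytesIO(SAMPLE)):
        print(tree_id, count)  # prints "tree1 3", then "tree2 2"
```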

	-hilmar

On Jun 25, 2010, at 12:42 AM, Pjotr Prins wrote:

> I think this needs to be answered by Rutger. Are we going to face
> NeXML files in the future that can easily outrun memory?
>
> Pj.
>
> On Fri, Jun 25, 2010 at 01:04:21PM +0530, Anurag Priyam wrote:
>>> How much time would it cost you to stream the data, and what does it
>>> mean with regard to changing the API? I guess, in general, NeXML
>>> files won't be that large, so it may not be that important (Rutger)?
>>>
>>> Pj.
>>>
>>>
>> I mean switching the parsing implementation from "parsing at the
>> start" to streaming, not changing the API. Using the Reader API
>> instead of the DOM API would just help with that switch. Even if we do
>> not switch, the Reader API offers a more memory-efficient solution than
>> the DOM API.
>>
>> Btw, I am not in favour of the switch. You cannot move backwards in the
>> document that way: I cannot fetch a tree by id if the cursor is already
>> past that tree. Doing nexml.each_characters and nexml.each_trees is
>> impossible with pure streaming; I would have to stream one while
>> caching the other. Otus has a one-to-many relation with trees and
>> characters, and otu with rows, so an API call like otus.trees,
>> otus.characters or otu.sequences would be impossible (not that I have
>> added such calls yet). IMO, NeXML is non-linear and not meant to be
>> streamed. Besides, other NeXML implementations also parse the file at
>> the start.
>>
>> -- 
>> Anurag Priyam,
>> 2nd Year Undergraduate,
>> Department of Mechanical Engineering,
>> IIT Kharagpur.
>> +91-9775550642
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
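The "stream one while caching the other" compromise described above can be sketched as a single forward pass: OTU labels are cached in a dict as they stream past, while trees are streamed and cleared one at a time. This again uses Python's xml.etree.ElementTree.iterparse for illustration only, not BioRuby code. The helper name trees_with_taxa and the sample data are hypothetical, and the sketch assumes each <otus> block precedes the <trees> block that references it, so every otu id is already cached when a tree needs it. It does not give random access: fetching a tree by id after the cursor has passed it would still need a re-parse, which is the limitation raised in the thread.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def trees_with_taxa(source):
    """Yield (otus_id, tree_id, tip_labels) for each <tree> in one pass.

    Hypothetical helper. OTU labels are cached (assumed small); trees are
    streamed and cleared. Assumes <otus> blocks come before the <trees>
    blocks that reference them.
    """
    labels = {}          # cached: otu id -> label
    current_otus = None  # 'otus' attribute of the enclosing <trees>
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            # Attributes are available at 'start'; children are not yet.
            if name == "trees":
                current_otus = elem.get("otus")
        elif name == "otu":
            labels[elem.get("id")] = elem.get("label", elem.get("id"))
        elif name == "tree":
            tips = [labels[node.get("otu")] for node in elem
                    if local(node.tag) == "node" and node.get("otu") is not None]
            yield current_otus, elem.get("id"), tips
            elem.clear()  # free the streamed tree's children


# Hypothetical NeXML-like sample; real trees also need edges and xsi:type.
SAMPLE = b"""<?xml version="1.0"?>
<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
    <otu id="t3" label="Gorilla gorilla"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n1" root="true"/><node id="n2" otu="t1"/><node id="n3" otu="t2"/>
    </tree>
    <tree id="tree2">
      <node id="n1" root="true"/><node id="n2" otu="t1"/><node id="n3" otu="t3"/>
    </tree>
  </trees>
</nexml>"""

if __name__ == "__main__":
    # Build an otus -> [tree ids] index, similar to a hypothetical otus.trees.
    index = defaultdict(list)
    for otus_id, tree_id, tips in trees_with_taxa(io.BytesIO(SAMPLE)):
        index[otus_id].append(tree_id)
        print(tree_id, tips)
    print(dict(index))  # {'taxa1': ['tree1', 'tree2']}
```

Only tree ids go into the index, so an otus-to-trees lookup is possible without holding the trees themselves in memory; retrieving a tree's contents later would mean scanning the file again.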

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================