[Bioperl-l] SearchIO speed up

Sendu Bala bix at sendu.me.uk
Thu Aug 10 19:32:43 UTC 2006


Sendu Bala wrote:
> aaron.j.mackey at gsk.com wrote:
>>> ...Except I need to know if the community considers the speed problem 
>>> solved or not. More radical changes will make SearchIO even faster, 
>>> eg. Chris Fields and Jason (if I interpret the Project priority list 
>>> item correctly) have suggested an end to individual Hit and HSP 
>>> objects, which become just data members of a Result-like object. 
>>> Ideally I don't want to go down that route because we lose quite a 
>>> bit of OO power;
>>
>> As already mentioned, a lazy-evaluation approach would also work.
>>
>> Jason and I did once talk about an entirely new 
>> parsing/object-building framework, based on nested grammars; in 
>> essence, the "top-level" parser, simply "chunks" the input into blobs 
>> of (minimally parsed) text that correspond to the top level result 
>> object.

[...]

> Or are you suggesting something that would be even better than that? If 
> so, please elucidate! :)

Oh, I guess the difference is the 'minimally parsed' bit, i.e. the HSP 
chunks could essentially be the raw lines from the input file? I don't 
think parsing those lines into data stored in hashes is a significant 
burden, but it's certainly worth investigating if we're really hungry 
for speed. Remember that we still have to do a significant amount of 
parsing anyway just to discover where each chunk starts and ends.
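Just to make that concrete, here's a very rough sketch of chunk-level 
laziness in plain Perl (purely illustrative, not SearchIO code: the 
blank-line chunking, the filename and the parse_chunk() helper are all 
made up for the example):

  use strict;
  use warnings;

  open my $fh, '<', 'report.blast' or die "open: $!";

  # Minimal 'top-level' pass: keep each blank-line-delimited block as a
  # raw string, without parsing anything inside it yet.
  my @raw_chunks;
  {
      local $/ = '';                 # paragraph mode
      while (my $chunk = <$fh>) {
          push @raw_chunks, $chunk;  # store the raw text only
      }
  }
  close $fh;

  # The expensive regex work happens only when a chunk's data is wanted.
  sub parse_chunk {
      my ($text) = @_;
      my %data;
      $data{score}  = $1 if $text =~ /Score\s*=\s*(\S+)/;
      $data{evalue} = $1 if $text =~ /Expect\s*=\s*(\S+)/;
      return \%data;
  }

  my $first = parse_chunk($raw_chunks[0]);
  print 'score = ', (defined $first->{score} ? $first->{score} : 'n/a'), "\n";

The point being that the full parse is deferred until (and unless) 
somebody actually asks for the Hit/HSP data.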

... Though, with that approach we might also get a memory saving: 
assuming we can rely on the input file sticking around, we could store 
just the position and length of each 'chunk' of lines, and re-read a 
chunk from the file when it's actually needed, instead of holding the 
line data itself.

(I don't think that's a serious suggestion, just throwing ideas out.)
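A similarly hand-wavy sketch of the pointer idea, recording only a byte 
offset and length per chunk and seeking back on demand (again 
illustrative only; it assumes a plain, seekable file that doesn't change 
underneath us):

  use strict;
  use warnings;

  open my $fh, '<', 'report.blast' or die "open: $!";

  # Index pass: note where each blank-line-delimited chunk starts and how
  # long it is, but keep none of the text in memory.
  my @chunk_index;                   # each entry: [ byte offset, byte length ]
  {
      local $/ = '';
      my $start = tell($fh);
      while (<$fh>) {
          my $end = tell($fh);
          push @chunk_index, [ $start, $end - $start ];
          $start = $end;
      }
  }

  # Re-read a chunk's text from disk only when it is requested.
  sub fetch_chunk {
      my ($i) = @_;
      my ($offset, $length) = @{ $chunk_index[$i] };
      seek($fh, $offset, 0) or die "seek: $!";
      read($fh, my $text, $length);
      return $text;
  }

  print fetch_chunk(0);

Whether the extra seeks would actually end up cheaper than just holding 
the lines in memory is something we'd have to benchmark.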


