[Bioperl-l] GenBankParser comparison to bioperl parser

Aaron J. Mackey <amackey@virginia.edu>
Thu, 12 Sep 2002 13:41:14 -0400 (EDT)


[ trimmed the reply-to lines a bit ... ]

On Thu, 12 Sep 2002, Hilmar Lapp wrote:

> I'm sure that some of the parsing logic can be substantially improved
> both in readability and speed, but honestly I'd be very surprised if
> even the ultimately best regexp combined with the ultimately best
> parsing logic can speed up the whole thing by a factor of more than
> 2-3 fold. It's the object tree construction that costs you the order
> of magnitude.

Yes (see the pICalculator thread for some simple benchmarking of
SeqIO::fasta vs. pure-Perl raw parsing; summary: 24 seconds vs. 0.5
seconds to read a 25000-sequence protein database).
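
For readers who have not seen the other thread, "pure-Perl raw parsing"
means something along the lines of the sketch below. This is not the code
from the pICalculator thread; read_fasta_raw is a hypothetical helper name,
and the in-memory input stands in for a real database file. It builds plain
[id, sequence] pairs and no object per record.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): read FASTA records from a
# filehandle into plain [id, sequence] pairs, with no object per record.
sub read_fasta_raw {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";                  # read one FASTA record at a time
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # drop the trailing "\n>" separator
        $chunk =~ s/^>//;              # the first record still has its '>'
        my ($header, @lines) = split /\n/, $chunk;
        next unless defined $header && length $header;
        my ($id) = split ' ', $header; # id is the first word of the header
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;              # strip any stray whitespace
        push @records, [ $id, $seq ];
    }
    return \@records;
}

# Small in-memory example instead of a real database file.
my $fasta = ">seq1 first protein\nMKV\nLAA\n>seq2\nMTT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my $records = read_fasta_raw($fh);
close $fh;

printf "%s\t%d\n", $_->[0], length($_->[1]) for @$records;
```

Timing this against a SeqIO loop over the same file is how you would
reproduce the kind of comparison summarized above; the exact ratio will
depend on the data and the Perl and BioPerl versions.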

I don't believe it's object *construction* (i.e. malloc-ing new memory) so
much as all the function calls that are happening.  Having a pool of
objects is not going to help this at all (in fact, Perl is already keeping
pools of SVs around for you to use, so you're just duplicating the effort
if you go that route).  I repeat: look at the function calls, and all the
@ISA tree-walking ...
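
A rough way to look at the call overhead in isolation is sketched below,
using the core Benchmark module. The Level0..Level4 classes and plain_id
are made up for illustration and have nothing to do with the real BioPerl
class tree. Perl caches method lookups, so how much of any difference is
due to the @ISA walk rather than the call itself will vary with the Perl
version and the depth of the hierarchy; treat the numbers as indicative
only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical five-level hierarchy; only the root class defines id().
{ package Level0;
  sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
  sub id  { return $_[0]{id} }
}
{ package Level1; our @ISA = ('Level0'); }
{ package Level2; our @ISA = ('Level1'); }
{ package Level3; our @ISA = ('Level2'); }
{ package Level4; our @ISA = ('Level3'); }

my $obj  = Level4->new('seq1');   # id() is found four levels up @ISA
my %hash = (id => 'seq1');

# Plain function doing the same work as the id() method.
sub plain_id { return $_[0]{id} }

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    hash_lookup   => sub { my $id = $hash{id} },
    function_call => sub { my $id = plain_id(\%hash) },
    method_call   => sub { my $id = $obj->id },
});
```

All three variants fetch the same value, so any gap between them is the
cost of the sub call and the method dispatch rather than of allocating
anything.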

-Aaron


-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu