[Bioperl-l] GenBankParser comparison to bioperl parser

Lincoln Stein lstein@cshl.org
Fri, 13 Sep 2002 12:30:38 -0400


I second Aaron's opinion.  It's the *method* calls that hurt.  If you are 
willing to throw away subclassability and do the inner loop parts
using direct function calls, then it's a performance win.
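
A minimal sketch of the comparison being discussed, not code from BioPerl
itself. The package names (Base, Middle, Leaf) and the add_len and
add_len_direct routines are hypothetical, made up for illustration. It times
a method call resolved through a small @ISA chain against a plain function
call doing the same work. Absolute numbers will vary with the Perl version
and the machine, and Perl caches method lookups, so treat the output as a
rough indication only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical three-level hierarchy: the method is defined only in Base,
# so a call on a Leaf object has to be resolved through @ISA.
package Base;
sub new     { my $class = shift; return bless { len => 0 }, $class }
sub add_len { my ($self, $n) = @_; $self->{len} += $n; return $self->{len} }

package Middle;
our @ISA = ('Base');

package Leaf;
our @ISA = ('Middle');

package main;

# Same work as Base::add_len, but called as a plain function on a plain
# hash ref: no method dispatch, and no way to override it in a subclass.
sub add_len_direct { my ($rec, $n) = @_; $rec->{len} += $n; return $rec->{len} }

my $obj = Leaf->new;
my $rec = { len => 0 };

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method => sub { $obj->add_len(1)        for 1 .. 1000 },
    direct => sub { add_len_direct($rec, 1) for 1 .. 1000 },
});
```

The trade-off is the one described above: add_len_direct cannot be
overridden by a subclass, so this only makes sense for inner-loop code
where subclassability is not needed.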

Lincoln

On Thursday 12 September 2002 01:41 pm, Aaron J Mackey wrote:
> [ trimmed the reply-to lines a bit ... ]
>
> On Thu, 12 Sep 2002, Hilmar Lapp wrote:
> > I'm sure that some of the parsing logic can be substantially improved
> > both in readability and speed, but honestly I'd be very surprised if
> > even the ultimately best regexp combined with the ultimately best
> > parsing logic can speed up the whole thing by a factor of more than
> > 2-3 fold. It's the object tree construction that costs you the order
> > of magnitude.
>
> Yes (see pICalculator thread to see a little simple benchmarking on
> SeqIO::fasta vs. pure-perl raw parsing - summary: 24 seconds vs. 0.5
> seconds to read a 25000 sequence protein database).
>
> I don't believe it's object *construction* (i.e. malloc-ing new memory) so
> much as all the function calls that are happening.  Having a pool of
> objects is not going to help this at all (in fact, Perl is already keeping
> pools of SV's around for you to use, so you're just duplicating the effort
> if you go that route).  I repeat: look at the function calls, and all the
> @ISA tree-walking ...
>
> -Aaron

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                            Cold Spring Harbor, NY
========================================================================