[Bioperl-l] Query Unigene title from input a ACC number / BioPerl Object Creation

Jason Stajich jason at cgt.mc.duke.edu
Tue Mar 25 11:43:21 EST 2003


On Tue, 25 Mar 2003, Jamie Hatfield (AGCoL) wrote:

> Maybe it's just me, but I've never been too pleased with BioPerl's
> ability to handle large amounts of data like these unigene clusters.
> You all might remember I recently proposed a FPC module for reading in
> FPC data files.  Well, that is still in progress, but it is DOG slow,
> and the only reason I can seem to make out of it is that object creation
> is a bear.
>
> I would really like some input myself, from the BioPerl experts about
> what I can do to speed up the creation of say . . . 100k objects?  :-)
>
You have to take a different approach then.  We've gone back and forth on
this a lot wrt speed, flexibility, and a solid object model.
Apparently Perl doesn't make it easy to have all three.

You can get around some of the problems if, instead of building things with
new(), you bless a hash directly and then call some methods to push the
data in.  This avoids the walk up the inheritance tree that happens on every
new() call, which is the main bottleneck.  We do this with features and
locations in the GenBank parser right now to get a modest performance
gain.  It is still an area that we are trying to rethink and improve.
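
A minimal sketch of the bless-a-hash idea, to make the difference concrete.
The class names (My::Root, My::Feature) and the -start/-end arguments are
made up for illustration; this is not the actual BioPerl parser code.

```perl
use strict;
use warnings;

# Hypothetical base class; stands in for a deeper inheritance chain.
package My::Root;
sub new {
    my ($class, %args) = @_;
    return bless {}, $class;
}

# Hypothetical feature class whose constructor chains up to the parent.
package My::Feature;
our @ISA = ('My::Root');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);    # goes up to My::Root::new
    $self->start($args{-start}) if defined $args{-start};
    $self->end($args{-end})     if defined $args{-end};
    return $self;
}
# Combined getter/setter accessors.
sub start { my $self = shift; $self->{start} = shift if @_; return $self->{start} }
sub end   { my $self = shift; $self->{end}   = shift if @_; return $self->{end} }

package main;

# Regular path: every object goes through the full constructor chain.
my $slow = My::Feature->new(-start => 10, -end => 20);

# Shortcut: bless an empty hash into the class, then push the data in.
my $fast = bless {}, 'My::Feature';
$fast->start(10);
$fast->end(20);
```

Both objects answer the same method calls; the second one simply never runs
any new().  Whether this pays off for a given class is something to measure,
for example with cmpthese from the core Benchmark module.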

I think we also want to move more toward event-based parsing, which
would allow you to attach a listener that catches only certain
events and perhaps wouldn't need to actually create objects for certain
quick and dirty tasks.  But the framework for this needs to be laid out
pretty explicitly to make it really work.
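
A rough sketch of what a listener-style parser could look like.  Everything
here is an assumption for illustration: parse_events, the event names, the
simplified "KEY value" record format terminated by "//", and the sample
data are all made up, and none of it is an existing BioPerl interface.

```perl
use strict;
use warnings;

# Hypothetical event-based parser: reads "KEY value" lines and calls the
# handler registered for that key, if any.  A "//" line ends a record.
sub parse_events {
    my ($fh, %handlers) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '//') {
            $handlers{end_record}->() if $handlers{end_record};
            next;
        }
        my ($key, $value) = split /\s+/, $line, 2;
        next unless defined $key && length $key;
        # Events nobody listens for are skipped without creating anything.
        my $cb = $handlers{lc $key} or next;
        $cb->($value);
    }
    return;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = join "\n",
    'ID Hs.1', 'TITLE example cluster one', 'ACC X00001', '//',
    'ID Hs.2', 'TITLE example cluster two', 'ACC X00002', '//', '';
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

# Listener that only cares about IDs and titles; no objects are built.
my (%title_for, $current_id);
parse_events(
    $fh,
    id    => sub { $current_id = shift },
    title => sub { $title_for{$current_id} = shift },
);
close $fh;
```

The ACC lines are parsed but dropped because no handler is attached for
them, which is the point: a quick title lookup pays only for the events it
listens to.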

I believe Ensembl hit this perf problem and went with a simpler object
initialization scheme to buy them the performance they needed.  It means
that you have to code up more things when you inherit from an object
(and have to remember to update all child classes whenever a parent
class changes) but you get some performance increase.
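
To illustrate the tradeoff in general terms, here is a sketch of a flat
initialization scheme.  It is not Ensembl's actual code; the class names
and fields (My::Simple::Feature, My::Simple::Exon, start/end/phase) are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parent class: the constructor fills the hash in one step.
package My::Simple::Feature;
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, end => $args{end} }, $class;
}
sub start { return $_[0]->{start} }
sub end   { return $_[0]->{end} }

# Hypothetical child class: it does NOT call SUPER::new.  It repeats the
# parent's field setup, so it must be updated by hand if the parent
# ever gains or renames a field.
package My::Simple::Exon;
our @ISA = ('My::Simple::Feature');
sub new {
    my ($class, %args) = @_;
    return bless {
        start => $args{start},    # duplicated from the parent
        end   => $args{end},      # duplicated from the parent
        phase => $args{phase},    # child-specific field
    }, $class;
}
sub phase { return $_[0]->{phase} }

package main;

my $exon = My::Simple::Exon->new(start => 100, end => 250, phase => 0);
```

Accessors are still inherited as usual; only construction is flattened,
which is where the extra maintenance burden comes from.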

-jason

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

