[Bioperl-l] bioperl-db performance: load_seqdatabase.pl throughput speed

Henry R Bigelow hrb46 at columbia.edu
Tue May 11 13:27:56 EDT 2004


Hi,
	My name is Henry Bigelow, and I recently installed bioperl-1.4,
bioperl-db, DBI and DBD-mysql, mysql-4.0 (with InnoDB enabled), and
biosql-schema, and instantiated biosqldb-mysql.sql.  I've successfully
loaded some sequences from release43.dat, the Swiss-Prot flat file, but the
throughput is roughly 1 sequence every 5 to 10 seconds, on an (admittedly
slow) dual-CPU 400 MHz Pentium III with 256 MB of memory.  I ran the command:

perl load_seqdatabase.pl --host localhost --dbname bioseqdb \
    --namespace swissprot --dbuser bigelow --dbpass XXX \
    --driver mysql --format swiss /data/swissprot/release43.dat


I also ran it (on a set of 15 Swiss-Prot entries) under a profiler:

perl -d:DProf load_seqdatabase.pl ...

and then, with

dprofpp -u

I got this:

%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 9.62   0.800  0.985  15282   0.0001 0.0001  Bio::DB::Persistent::PersistentObject::isa
 9.54   0.793  1.403  11909   0.0001 0.0001  Bio::DB::Persistent::PersistentObject::AUTOLOAD
 9.25   0.769  3.152   8888   0.0001 0.0004  Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
 4.69   0.390  2.922   7733   0.0001 0.0004  Bio::DB::BioSQL::BasePersistentAdaptor::_process_child
 4.59   0.382  0.382  26865   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::obj
 3.84   0.319  0.319  32822   0.0000 0.0000  UNIVERSAL::isa
 3.69   0.307  0.372     86   0.0036 0.0043  Bio::DB::BioSQL::ReferenceAdaptor::_crc64
 3.28   0.273  1.195    258   0.0011 0.0046  Bio::Root::Root::_load_module
 2.80   0.233  3.545   5465   0.0000 0.0006  Bio::DB::BioSQL::BasePersistenceAdaptor::create_persistent
 2.74   0.228  0.228    291   0.0008 0.0008  Bio::Root::RootI::stack_trace
 1.92   0.160  0.160   1794   0.0001 0.0001  DBI::st::execute
 1.84   0.153  0.534   1608   0.0001 0.0003  Bio::DB::Persistent::PersistentObject::new
 1.80   0.150  0.150   7215   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::primary_key
 1.74   0.145  0.185   2640   0.0001 0.0001  Bio::Root::Root::new
 1.71   0.142  1.078    474   0.0003 0.0023  Bio::DB::BioSQL::BaseDriver::insert_object

I do realize that these Perl objects are large, but loading still seems
quite slow.  (I'm not even sure whether the profile shows that most of the
time goes to instantiating Perl objects rather than to running MySQL
commands.)
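
A rough cross-check on that question, offered as a sketch rather than a
diagnosis: a row's ExclSec divided by its %Time gives the total time the
profile covers, and that can be compared with the wall-clock time implied by
the 5 to 10 seconds per sequence reported above.  The helper names
profiled_total and wall_clock are made up for illustration, and applying the
5 to 10 second rate to the 15-entry run is an assumption.  As far as the
dprofpp documentation goes, -u reports user CPU time only, so time the perl
process spends waiting on mysqld would not appear in the table; dprofpp -r
reports elapsed real time instead.

```shell
# Total time covered by the profile, estimated from a single row:
# ExclSec / (%Time / 100).
profiled_total() {  # usage: profiled_total EXCLSEC PERCENT
  LC_ALL=C awk -v s="$1" -v p="$2" 'BEGIN { printf "%.1f\n", s / (p / 100) }'
}

# Wall-clock seconds implied by a per-entry loading rate (assumed rate).
wall_clock() {      # usage: wall_clock ENTRIES SECONDS_PER_ENTRY
  LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%d\n", n * t }'
}

profiled_total 0.800 9.62   # first row of the table: about 8.3 s of user CPU
wall_clock 15 5             # 75 s at the fast end of the reported rate
wall_clock 15 10            # 150 s at the slow end
```

If the 15-entry run really took a minute or more while the profile covers
only about 8 seconds of user CPU time, most of the wait would lie outside
the Perl code itself, for example in the database.  DBI::st::execute shows
only 0.160 s of user time in the table.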

The vast majority of the bioperl-db, bioperl, DBI and DBD-mysql tests
passed.

Incidentally, it took me a week of errors from load_seqdatabase.pl before
I discovered the true cause: a perl executable built with threading enabled
does NOT work with this setup.  (The author of DBD-mysql or DBI warns about
this, but I didn't heed the warning at first.)
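
A quick way to see whether a given perl was built with thread support is to
ask its build configuration.  This is a generic check, not something taken
from the DBI or DBD-mysql documentation:

```shell
# Prints usethreads='define'; for a perl built with threads,
# and usethreads='undef'; for one built without.
perl -V:usethreads
```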


If anyone has any ideas about what might be making it slow, please let me
know!  I'd greatly appreciate it.

Sincerely,

Henry Bigelow


