[Bioperl-l] tuning load_seqdatabase.db script in bioperl-db

Nicolas Rueff rueff at mediagen.fr
Mon May 26 11:17:48 EDT 2003


I'm using bioperl-db/script/biosql/load_seqdatabase.pl to fill the
biosql schema. The big issue with this script is that the time it takes
per sequence keeps growing as the database fills up, since for every new
sequence it has to search the database to check whether the entry
already exists. That is useful for updates, but not for a first-time
fill.

For example, I used it with the latest full Swiss-Prot release
(sprot41.dat) to populate a fresh database: the computer handles 100
inserts/sec at the start, but this drops to 2/sec near the end of the
file.

I think it could be a good idea to add an option like "--forceinsert" to
avoid this problem.
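
To illustrate the idea, here is a minimal sketch in Python with sqlite3.
It is not the Perl code of load_seqdatabase.pl and does not use the
BioSQL schema: the "entry" table, the load() function and the
force_insert parameter are all hypothetical. The table deliberately has
no index on accession, so each existence check scans the whole table,
which is only one possible reason for the slowdown described above.

```python
import sqlite3


def load(records, force_insert=False):
    """Load (accession, seq) pairs into a fresh in-memory table.

    Hypothetical stand-in for a sequence loader, not the BioSQL schema.
    Returns (rows_stored, lookups_performed).
    """
    conn = sqlite3.connect(":memory:")
    # No index on accession: every lookup below is a full table scan.
    conn.execute("CREATE TABLE entry (accession TEXT, seq TEXT)")
    lookups = 0
    for accession, seq in records:
        if not force_insert:
            # Update-safe path: check whether the entry already exists.
            lookups += 1
            row = conn.execute(
                "SELECT 1 FROM entry WHERE accession = ?", (accession,)
            ).fetchone()
            if row is not None:
                continue  # already present, skip it
        # With force_insert the check is skipped and the row is always added.
        conn.execute(
            "INSERT INTO entry (accession, seq) VALUES (?, ?)",
            (accession, seq),
        )
    conn.commit()
    rows = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
    conn.close()
    return rows, lookups
```

Skipping the check is only safe when the input is known to contain no
entries that are already in the database, which is the case for a
first-time fill of an empty database from a single release file.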

-- 
Nicolas Rueff <rueff at mediagen.fr>

Mediagen SAS
Institut Pasteur de Lille
1, rue du Professeur Calmette
Bâtiment Guérin, 3eme étage, BP 245
59019 LILLE Cedex
Tel +33 3 20 87 72 76
Fax +33 3 20 87 72 82


