[Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl / swissprot

Hilmar Lapp hlapp at gmx.net
Sat Mar 22 20:01:51 UTC 2008


Forgot to respond to this:

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> It took two hours to load 26504 records (7%) of uniprot_sprot.dat  
> (is it expected to be so slow?)


The last time I loaded those regularly it was a bit faster (~5  
seqs/s), but your rate is in a ballpark that wouldn't raise a red  
flag for me.
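
As a rough sanity check on the numbers quoted above, here is a small
Python sketch. It uses only the figures from the thread; the 7% is
approximate, so the projected total and load time are estimates too,
and the variable names are just for illustration.

```python
# Figures quoted in the thread: 26504 records in two hours, about 7% of the file.
records_loaded = 26504
elapsed_seconds = 2 * 60 * 60
fraction_done = 0.07  # approximate, as quoted in the original message

rate = records_loaded / elapsed_seconds            # records per second
estimated_total = records_loaded / fraction_done   # rough record count of the full file
estimated_hours = estimated_total / rate / 3600    # full load at the observed rate

print(f"observed rate: {rate:.1f} seqs/s")                   # 3.7 seqs/s
print(f"estimated total records: {estimated_total:,.0f}")
print(f"estimated full load: {estimated_hours:.1f} hours")   # 28.6 hours
```

So the observed rate is about 3.7 seqs/s against the ~5 seqs/s
mentioned above, and a full load at that pace would take on the order
of a day.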

BTW you can make it print statistics using the --logchunk N option,  
where N is the number of seqs after which you want the current count  
and the #recs/s printed.

You may get it to be faster if you tune the database (e.g., make sure  
there is enough memory for index reorganization, and that the  
transaction log and tablespace datafiles are on separate disks, etc.;  
fiddling with the query optimizer probably has little effect, as  
almost all queries are simple lookups or inserts).

That all said, the strength of load_seqdatabase.pl isn't speed. It  
doesn't make use of any bulk upload optimizations, and therefore the  
initial load of a very large database will take its time. The power  
is more in subsequent updates where you can configure what you want  
to happen, and during which the database is never in an inconsistent  
state, so it can run in the background.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





