[Bioperl-l] RE: load_seqdatabase.pl running SLOW!

Hilmar Lapp hlapp at gnf.org
Thu Jan 27 20:10:43 EST 2005


Thanks for the update, Barry. Great to hear it's working for you now. What you're looking at may really be a Postgres version issue. 7.2 has known problems, and once 7.3 came out everybody was strongly urged to migrate to it.
 
That's all I can say. BTW, if there's no package with a higher version, don't be scared of compiling from scratch. I've built Pg on different platforms, from Linux to MacOSX, and it compiled like a charm on all of them.
 
-hilmar
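
A minimal sketch of the statistics refresh discussed in the quoted messages below, assuming PostgreSQL's vacuumdb client is on the PATH. The database name "biosql" and both helper names are hypothetical, and whether it is safe to run this while a load is in progress depends on the Pg version, as noted in the thread. The example only builds and prints the command; call vacuum_analyze() to actually run it.

```python
import subprocess


def vacuum_analyze_command(dbname):
    """Build the argv for vacuumdb with a planner-statistics refresh."""
    # --analyze makes vacuumdb also recompute the statistics the planner uses.
    return ["vacuumdb", "--analyze", "--dbname", dbname]


def vacuum_analyze(dbname):
    """Run vacuumdb; raises CalledProcessError if it exits non-zero."""
    subprocess.run(vacuum_analyze_command(dbname), check=True)


# "biosql" is a placeholder database name, not taken from the thread.
command = vacuum_analyze_command("biosql")
print(" ".join(command))
```

Running something like this every few thousand sequences (by hand or from cron) would be one way to keep the lookup queries from degrading on a freshly created database.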

	-----Original Message----- 
	From: Barry Moore [mailto:barry.moore at genetics.utah.edu] 
	Sent: Thu 1/27/2005 3:22 PM 
	To: Hilmar Lapp 
	Cc: Bioperl list 
	Subject: Re: load_seqdatabase.pl running SLOW!
	
	

	Hilmar-
	
	Thanks for the suggestions.  Things are working smoothly now, but I'm
	not entirely sure why.  I stopped the slow-running load_seqdatabase.pl
	process on the fast machine, built an identical biosql database under a
	different name, and began loading the same file into it.  This screamed
	along at 8-10 seq/sec.  I re-ran the load script into the old db - still
	slow.  I vacuumed the old db - still slow.  I dropped the old db and
	rebuilt it - now both load very fast.  I dropped both dbs and rebuilt
	just one, and it is now loading fine.  Go figure.  I send this to the
	list simply for the record in case it provides a clue to someone in the
	future with similar trouble.  I haven't got a clue.
	
	Barry
	
	Hilmar Lapp wrote:
	
	>To be honest I've never loaded a large file into a Pg installation. The problem I'd expect you to run into is that, if you started with a fresh database, the lookup queries will become slower and slower unless the stats are recomputed frequently through vacuum (which the load script won't do).
	>
	>I believe in more recent releases you can actually vacuum the database concurrently with write access; not sure whether 7.2.x will allow this already. You should strongly consider upgrading to at least 7.3 if not 7.4 or even 8.x. The Pg developers may not even answer questions about 7.2 anymore ...
	>
	>Your observation that the slower machine with the later kernel is faster leaves me puzzled. If blind-tested I would have suggested that the machine appearing faster has had the database vacuumed.
	>
	>Not sure this is very helpful ...
	>
	> -hilmar
	>
	>       -----Original Message-----
	>       From: Barry Moore [mailto:barry.moore at genetics.utah.edu]
	>       Sent: Tue 1/25/2005 3:15 PM
	>       To: Bioperl list; Hilmar Lapp
	>       Cc:
	>       Subject: load_seqdatabase.pl running SLOW!
	>      
	>      
	>
	>       Hilmar (or others)-
	>      
	>       I've set up a biosql based database using PostgreSQL 7.2 on a PC with an
	>       Intel Pentium 4 3.0 GHz processor, 800 MHz system bus, 1 GB of RAM, and
	>       Linux (2.2 kernel - Debian woody distro).  Onto that I am loading
	>       ~352,000 sequences from RefSeq complete rna collection using
	>       load_seqdatabase.pl.  It's running kind of slow - loading on average
	>       about 1 sequence every 2-5 seconds.  In the archives I've read your
	>       comments to a previous question like this suggesting two fast
	>       processors, a couple gigs of memory and 2-3 drives to really make things
	>       fly and while my system isn't that good, it seems like I should be doing
	>       better.  I got to experimenting on another (slower) system while waiting
	>       for things to load, and found that running the same script to load the
	>       same file goes about 3X faster on a 266 MHz Intel processor with 192 MB
	>       RAM.  Same installation of PostgreSQL (both installed from deb package
	>       with defaults), and same installation of Debian Linux (except that the
	>       kernel on the older slow machine has been updated to 2.4).  Another
	>       difference I noticed between the two is that the old 266 MHz machine is
	>       using about 75% CPU resources for perl and about 25% for postmaster
	>       whereas the faster 3 GHz machine (but slower running
	>       load_seqdatabase.pl) is using 95% of its CPU resources for postmaster
	>       and about 3% for perl.  Both systems are using up most of their memory,
	>       but little to no swap.  Could the kernel upgrade really be making the
	>       difference?  Any thoughts?  As it's going now I can wait over a week for
	>       all these sequences to load, or build the database on our dinosaur
	>       server in a couple of days and dump it across to our sexy new 3 GHz
	>       server.  Talk about bass ackwards!
	>      
	>       Barry
	>      
	>       --
	>       Barry Moore
	>       Dept. of Human Genetics
	>       University of Utah
	>       Salt Lake City, UT
	>      
	>      
	>      
	>      
	>
	> 
	>
	
	--
	Barry Moore
	Dept. of Human Genetics
	University of Utah
	Salt Lake City, UT
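
For the record, the loading rates quoted in this thread can be turned into rough total load times. This is only a back-of-the-envelope sketch: it assumes a constant rate over the whole run, which the thread itself suggests may not hold, and the function name is made up for illustration.

```python
# Approximate sequence count from the original post (~352,000 RefSeq RNAs).
SEQUENCES = 352_000


def load_time_hours(n_sequences, seqs_per_second):
    """Return the wall-clock hours needed at a constant loading rate."""
    return n_sequences / seqs_per_second / 3600


# Slow case: one sequence every 2-5 seconds, i.e. 0.5 down to 0.2 seq/sec.
slow_best_days = load_time_hours(SEQUENCES, 1 / 2) / 24
slow_worst_days = load_time_hours(SEQUENCES, 1 / 5) / 24

# Fast case after the database was rebuilt: 8-10 seq/sec.
fast_best_hours = load_time_hours(SEQUENCES, 10)
fast_worst_hours = load_time_hours(SEQUENCES, 8)

print(f"slow: {slow_best_days:.1f} to {slow_worst_days:.1f} days")
print(f"fast: {fast_best_hours:.1f} to {fast_worst_hours:.1f} hours")
# prints:
# slow: 8.1 to 20.4 days
# fast: 9.8 to 12.2 hours
```

The slow figure agrees with the "over a week" estimate in the original post, while at 8-10 seq/sec the same file would load in roughly half a day.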