[BioSQL-l] load_ncbi_taxonomy.pl
Peter
biopython at maubp.freeserve.co.uk
Sat Aug 2 08:30:46 EDT 2008
On Sat, Aug 2, 2008 at 1:15 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> These sound like reasonable times, depending on your machine configuration.
> I suspect that PostgreSQL might even be a bit faster, as that's a similar
> time to what I'm observing on my laptop.
>
> BTW if you provide --verbose=2 on the command line you'll get rows/time
> statistics. The slowest steps (recomputing nested set values, and inserting
> taxon names) average between 900-1800 rows/s on my laptop, depending on what
> else is going on (I suspect the spotlight indexer to contend for the disk
> drive on occasion). The faster steps (e.g. inserting taxon nodes) I observe
> at up to 2500-4000 rows/s.
I'm seeing about 900 rows/s when recomputing the nested set
values, which means my 2-year-old desktop is slower than your laptop.
This is an AMD Athlon 64 X2 4600+ Socket 939 dual core machine, with a
Seagate Barracuda hard drive (7200rpm, 200GB, 8MB Cache, IDE Ultra
ATA100), running Ubuntu Dapper Drake (due for an upgrade soon!).
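For readers unfamiliar with the term: a nested set encoding gives every
node in a tree a left and a right number from one depth-first walk, so
that a descendant's pair always falls inside its ancestor's pair. Below
is a minimal Python sketch of that numbering. It is only an illustration,
not the Perl code in load_ncbi_taxonomy.pl; the function names and the
toy tree are made up for the example.

```python
def assign_nested_set(children, root):
    """Return {node: (left, right)} from one depth-first walk.

    `children` maps a node to a list of its child nodes (hypothetical
    input format, chosen for this sketch).
    """
    bounds = {}
    left = {}
    counter = 0
    # Iterative walk, so very deep trees do not hit the recursion limit.
    stack = [(root, False)]
    while stack:
        node, leaving = stack.pop()
        counter += 1
        if leaving:
            # All descendants have been numbered; close the interval.
            bounds[node] = (left[node], counter)
        else:
            left[node] = counter
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return bounds


def is_descendant(bounds, node, ancestor):
    """True if node's interval lies strictly inside ancestor's."""
    n_left, n_right = bounds[node]
    a_left, a_right = bounds[ancestor]
    return a_left < n_left and n_right < a_right


# Toy tree with made-up IDs (not real NCBI taxon IDs): 1 -> 2, 3; 2 -> 4
toy_tree = {1: [2, 3], 2: [4]}
nested = assign_nested_set(toy_tree, 1)
```

With this numbering, "all descendants of X" becomes a simple range test
on the left/right values instead of a recursive walk through parent IDs.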
$ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --verbose=2
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
20000/448630 done (in 0 secs, 20000.0 rows/s)
40000/448630 done (in 1 secs, 20000.0 rows/s)
60000/448630 done (in 0 secs, 20000.0 rows/s)
80000/448630 done (in 0 secs, 20000.0 rows/s)
100000/448630 done (in 0 secs, 20000.0 rows/s)
120000/448630 done (in 0 secs, 20000.0 rows/s)
140000/448630 done (in 1 secs, 20000.0 rows/s)
160000/448630 done (in 0 secs, 20000.0 rows/s)
180000/448630 done (in 0 secs, 20000.0 rows/s)
200000/448630 done (in 0 secs, 20000.0 rows/s)
220000/448630 done (in 0 secs, 20000.0 rows/s)
240000/448630 done (in 1 secs, 20000.0 rows/s)
260000/448630 done (in 0 secs, 20000.0 rows/s)
280000/448630 done (in 0 secs, 20000.0 rows/s)
300000/448630 done (in 0 secs, 20000.0 rows/s)
320000/448630 done (in 0 secs, 20000.0 rows/s)
340000/448630 done (in 1 secs, 20000.0 rows/s)
360000/448630 done (in 0 secs, 20000.0 rows/s)
380000/448630 done (in 0 secs, 20000.0 rows/s)
400000/448630 done (in 0 secs, 20000.0 rows/s)
420000/448630 done (in 0 secs, 20000.0 rows/s)
440000/448630 done (in 1 secs, 20000.0 rows/s)
... updating new parent IDs
... (committing nodes)
... rebuilding nested set left/right values
20000 done (in 22 secs, 909.1 rows/s)
40000 done (in 22 secs, 909.1 rows/s)
60000 done (in 23 secs, 869.6 rows/s)
80000 done (in 22 secs, 909.1 rows/s)
100000 done (in 22 secs, 909.1 rows/s)
120000 done (in 22 secs, 909.1 rows/s)
140000 done (in 22 secs, 909.1 rows/s)
160000 done (in 22 secs, 909.1 rows/s)
180000 done (in 22 secs, 909.1 rows/s)
200000 done (in 21 secs, 952.4 rows/s)
220000 done (in 21 secs, 952.4 rows/s)
240000 done (in 22 secs, 909.1 rows/s)
260000 done (in 22 secs, 909.1 rows/s)
280000 done (in 21 secs, 952.4 rows/s)
300000 done (in 22 secs, 909.1 rows/s)
320000 done (in 21 secs, 952.4 rows/s)
340000 done (in 22 secs, 909.1 rows/s)
360001 done (in 22 secs, 909.1 rows/s)
380001 done (in 22 secs, 909.1 rows/s)
400001 done (in 21 secs, 952.4 rows/s)
420001 done (in 22 secs, 909.1 rows/s)
440001 done (in 21 secs, 952.4 rows/s)
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
20000 done (in 3 secs, 6666.7 rows/s)
40000 done (in 2 secs, 10000.0 rows/s)
60000 done (in 4 secs, 5000.0 rows/s)
80000 done (in 3 secs, 6666.7 rows/s)
100000 done (in 5 secs, 4000.0 rows/s)
120000 done (in 6 secs, 3333.3 rows/s)
140000 done (in 7 secs, 2857.1 rows/s)
160000 done (in 7 secs, 2857.1 rows/s)
180000 done (in 8 secs, 2500.0 rows/s)
200000 done (in 8 secs, 2500.0 rows/s)
220000 done (in 8 secs, 2500.0 rows/s)
240000 done (in 9 secs, 2222.2 rows/s)
260000 done (in 9 secs, 2222.2 rows/s)
280000 done (in 10 secs, 2000.0 rows/s)
300000 done (in 10 secs, 2000.0 rows/s)
320000 done (in 10 secs, 2000.0 rows/s)
340000 done (in 10 secs, 2000.0 rows/s)
360000 done (in 10 secs, 2000.0 rows/s)
380000 done (in 10 secs, 2000.0 rows/s)
400000 done (in 11 secs, 1818.2 rows/s)
420000 done (in 11 secs, 1818.2 rows/s)
440000 done (in 11 secs, 1818.2 rows/s)
460000 done (in 10 secs, 2000.0 rows/s)
480000 done (in 10 secs, 2000.0 rows/s)
500000 done (in 11 secs, 1818.2 rows/s)
520000 done (in 11 secs, 1818.2 rows/s)
540000 done (in 12 secs, 1666.7 rows/s)
560000 done (in 10 secs, 2000.0 rows/s)
580000 done (in 12 secs, 1666.7 rows/s)
600000 done (in 12 secs, 1666.7 rows/s)
620000 done (in 11 secs, 1818.2 rows/s)
... cleaning up
Done.
real 13m13.805s
user 2m3.548s
sys 0m13.781s
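To get a per-step average instead of eyeballing the individual batches,
the --verbose=2 output above can be summarised with a few lines of
Python. This is a rough sketch that assumes the format shown in this log:
step headings start with "...", the row count on each progress line is
cumulative, and the seconds are per batch (which matches 20000 rows in
22 secs being reported as 909.1 rows/s). The function name and the sample
text are made up for the example.

```python
import re

# Matches e.g. "20000 done (in 22 secs, ..." and "20000/448630 done (in 0 secs, ..."
PROGRESS = re.compile(r"(\d+)(?:/\d+)? done \(in (\d+) secs")


def summarise(log_text):
    """Return {step name: (rows done, total seconds)} for each step."""
    totals = {}
    step = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("..."):
            # A new step heading such as "... inserting new taxon names"
            step = line[3:].strip()
            continue
        match = PROGRESS.match(line)
        if match and step is not None:
            rows = int(match.group(1))   # cumulative row count so far
            secs = int(match.group(2))   # seconds for this batch only
            previous_secs = totals.get(step, (0, 0))[1]
            totals[step] = (rows, previous_secs + secs)
    return totals


# Three lines copied from the log above, used as sample input.
sample = """\
... rebuilding nested set left/right values
20000 done (in 22 secs, 909.1 rows/s)
40000 done (in 22 secs, 909.1 rows/s)
60000 done (in 23 secs, 869.6 rows/s)
"""

summary = summarise(sample)
for name, (rows, secs) in summary.items():
    if secs:  # steps that finish in under a second report 0 secs
        print(f"{name}: {rows / secs:.1f} rows/s over {secs} secs")
```

Saving the script's output to a file and passing its contents to
summarise() gives one line per step, which makes runs on different
machines or database back ends easier to compare.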
>
> Thanks for all the testing, it's much appreciated!
>
This is only very cursory, confirming the script runs without showing
any error messages, but it's better than no testing ;)
Peter