[BioSQL-l] Re: load_ncbi_taxonomy
Aaron J Mackey
ajm6q at virginia.edu
Fri May 30 16:33:10 EDT 2003
I've had the same response from Postgres; dropping the constraint checking
seemed to help, but it was still slow (on a beefy database server, it
finishes in around 20 hours, with no other load). I suspect that
maintaining all the indices is probably slow (but to get them back will
still require the time, even if you drop them while loading the data).
Postgres is not a speedy database. It has other nice features, though.
-Aaron
On Fri, 30 May 2003, Hilmar Lapp wrote:
> I'm adding a bit documentation, and among the more important things,
> I'm adding support for Oracle and MySQL and some more verbosity about
> what's going on. I also fixed the problem with the ncbi_taxon_id being
> mysteriously missing in the uploaded data (I suspect the root cause is
> a perl 5.6.0-related bug).
>
> The snag I've hit is with Postgres. It's unbelievably slow on a fresh
> database, when really all it should be doing in the first pass is
> inserting nodes. It starts with ~370 rows/s (this is *after* dropping
> the insert rule on taxon, otherwise it's very slow right away) and
> drops down to ~40 rows/s and 20 rows/s after 60k and 130k rows,
> respectively. Updating the nested set values takes place at 3-5 rows/s
> (!). So, it wouldn't even finish over night.
>
> This is data from my PowerBook G4, 800 Mhz, single disk, 4096 shared
> buffers in Pg, Postgres 7.3.2. Certainly not the most suitable hardware
> configuration, but the rest should be OK. Note that in comparison MySQL
> does ~2000 rows/s during the insert phase and ~500 rows/s during the
> update phase, on the same hardware.
>
> Is anyone else running this script under Postgres? What's the
> performance? If it does finish in a reasonable time, what is your Pg
> configuration, and do you take special measures, like dropping an index
> before or something?
>
> There are still various options left to speed up the whole thing, but
> that must be entirely driver specific then. In particular,
>
> - vacuum analyze after the insert phase
> - drop all indexes on taxon except the primary key during the insert
> phase (many of them are needed though during the update phase)
>
> I'm just wondering whether it's only me and I'm doing something wrong.
> I doubt that I screwed up the code to this effect as I haven't really
> touched the meat of it (SQL statements, code flow, algorithm).
>
> Unless Aaron wants me to, I'm not particularly happy committing the
> updated version with the fixes before I can't get this to the point of
> making at least one full test run on Pg... (apart from MySQL and
> Oracle).
>
> -hilmar
>
--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey at virginia.edu
More information about the BioSQL-l
mailing list