[BioSQL-l] Re: load_ncbi_taxonomy.pl
Peter
biopython at maubp.freeserve.co.uk
Fri Aug 1 19:24:49 EDT 2008
On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> Sounds like I at least managed to silence all the complaining of the script
> ;-) How long did it run? Was it similar to what you've seen earlier or
> outrageously longer?
>
I just ran it again (so updating an already complete database):
$ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql \
    --dbuser root --download true
Downloading NCBI taxon database to taxdata
Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
... updating new parent IDs
... (committing nodes)
... rebuilding nested set left/right values
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
... cleaning up
Done.
real 18m29.409s
user 2m28.149s
sys 0m18.025s
Some of that is of course the download time, so without that:
$ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql \
    --dbuser root
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
... updating new parent IDs
... (committing nodes)
... rebuilding nested set left/right values
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
... cleaning up
Done.
real 13m18.777s
user 2m17.285s
sys 0m14.821s
This is slow, with plenty of disk activity during the taxon names steps.
However, I haven't got the equivalent numbers from the previous
script to hand (and it's after midnight here, so I won't re-run it now).
I'd have guessed it used to take about 10 minutes on this machine,
i.e. it is probably taking longer now, but it was already slower
than I liked.
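As a rough sketch of the arithmetic behind the two runs above, the
following Python snippet converts the `time` figures to seconds and
compares them. The helper names (parse_time, summarise) are made up for
illustration. Treating the gap between "real" and "user + sys" as time
spent waiting (on MySQL, disk or network) is an assumption, since `time`
only counts CPU used by the perl process itself.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '18m29.409s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs quoted in this message.
with_download = {"real": "18m29.409s", "user": "2m28.149s", "sys": "0m18.025s"}
without_download = {"real": "13m18.777s", "user": "2m17.285s", "sys": "0m14.821s"}


def summarise(run):
    """Return (wall-clock, CPU, non-CPU) seconds for one run."""
    real = parse_time(run["real"])
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    # Assumption: whatever is not CPU time in the perl process is waiting.
    return real, cpu, real - cpu


# Extra wall-clock time of the run that also downloaded the taxonomy dump.
download_cost = parse_time(with_download["real"]) - parse_time(without_download["real"])

real, cpu, waiting = summarise(without_download)
print(f"extra time with download: {download_cost:.1f} s")
print(f"no-download run: {real:.1f} s wall clock, {cpu:.1f} s CPU, {waiting:.1f} s waiting")
print(f"CPU share of wall clock: {cpu / real:.0%}")
```

On these numbers the download run took a little over five minutes
longer, and in the no-download run the perl process used CPU for only
about a fifth of the wall-clock time. That would be consistent with the
load being bound by the database and disk rather than by the script's
own processing, though a single pair of runs is not much to go on.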
I don't know if that helped, but as I said, I hope to do a more
thorough job later on.
Peter