[BioSQL-l] Re: load_ncbi_taxonomy.pl

Peter biopython at maubp.freeserve.co.uk
Fri Aug 1 19:24:49 EDT 2008


On Fri, Aug 1, 2008 at 10:04 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> Sounds like I at least managed to silence all the complaining of the script
> ;-) How long did it run? Was it similar to what you've seen earlier or
> outrageously longer?
>

I just ran it again (so updating an already complete database):

$ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql
--dbuser root --download true
Downloading NCBI taxon database to taxdata
Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
        ... updating new parent IDs
        ... (committing nodes)
        ... rebuilding nested set left/right values
        ... reading in taxon names from names.dmp
        ... deleting old taxon names
        ... inserting new taxon names
        ... cleaning up
Done.

real    18m29.409s
user    2m28.149s
sys     0m18.025s

Some of that is of course the download time, so without that:

$ time perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql
--dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
        ... updating new parent IDs
        ... (committing nodes)
        ... rebuilding nested set left/right values
        ... reading in taxon names from names.dmp
        ... deleting old taxon names
        ... inserting new taxon names
        ... cleaning up
Done.

real    13m18.777s
user    2m17.285s
sys     0m14.821s

This is slow, with plenty of disk activity during the taxon names
steps. However, I don't have the equivalent numbers from the previous
version of the script to hand (and it's after midnight here, so I won't
re-run it now). My guess is that it used to take about 10 minutes on
this machine, i.e. it is probably taking longer now, but it was already
slower than I liked.
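
For what it's worth, here is a rough sketch of the arithmetic behind those
two timings, using only the figures printed above. The helper names
(parse_time, summarise) are made up for illustration. Note that user and
sys only count CPU time in the perl process itself, so the "waiting"
figure lumps together the MySQL server's own work and disk I/O, and the
download estimate assumes the two runs were otherwise comparable.

```python
def parse_time(text):
    """Convert a bash `time` value such as '13m18.777s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def summarise(run):
    """Return (wall clock, client CPU, time spent waiting) in seconds."""
    real = parse_time(run["real"])
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    return real, cpu, real - cpu


# Figures copied from the two runs reported in this message.
with_download = {"real": "18m29.409s", "user": "2m28.149s", "sys": "0m18.025s"}
without_download = {"real": "13m18.777s", "user": "2m17.285s", "sys": "0m14.821s"}

# Rough cost of the download step: difference in wall-clock time.
download_cost = parse_time(with_download["real"]) - parse_time(without_download["real"])

real, cpu, waiting = summarise(without_download)

print(f"approx. download time : {download_cost:7.1f} s")
print(f"wall clock (no d/l)   : {real:7.1f} s")
print(f"perl CPU (user + sys) : {cpu:7.1f} s ({100 * cpu / real:.0f}% of wall clock)")
print(f"waiting on DB / disk  : {waiting:7.1f} s")
```

If I have this right, the download accounts for roughly five minutes, and
in the second run the perl process is only busy for about a fifth of the
wall-clock time, which fits with the disk activity I saw.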

I don't know if that helped, but as I said, I hope to do a more
thorough job later on.

Peter

