[BioSQL-l] load_ncbi_taxonomy.pl

Peter biopython at maubp.freeserve.co.uk
Fri Aug 1 16:58:14 EDT 2008


>> By testing I meant primarily if people use other platforms than I do
>> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this
>> a whirl as in, load the NCBI taxonomy into a scratch database (using the
>> script), then load it again (simulating an update), and see whether there
>> are any error or warning messages that'd be great.
>
> OK, as a very cursory check I did a quick test on a Linux machine
> using MySQL.  I just grabbed the latest script via the SVN webpage,
> then using an existing (partly populated) database:
>
> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true
> Downloading NCBI taxon database to taxdata
> Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
>
> This may be a network issue... the taxdata/taxdump.tar.gz file had
> downloaded OK, so I manually unzipped it, and then:
>
> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root
> Loading NCBI taxon database in taxdata:
>        ... retrieving all taxon nodes in the database
>        ... reading in taxon nodes from nodes.dmp
>        ... insert / update / delete taxon nodes
>        ... updating new parent IDs
>        ... (committing nodes)
>        ... rebuilding nested set left/right values
>        ... reading in taxon names from names.dmp
>        ... deleting old taxon names
>        ... inserting new taxon names
>        ... cleaning up
> Done.
>
> So no further error messages - however, I have not actually checked to
> see what exactly this did to my database ;)
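For anyone repeating the manual step quoted above (unpacking taxdata/taxdump.tar.gz by hand after the download reported "Unable to close datastream"), here is a minimal Python sketch using the standard tarfile module. It is not part of load_ncbi_taxonomy.pl, and the function name extract_taxdump is hypothetical. It copies out only nodes.dmp and names.dmp, because those are the two files the loader's log mentions reading; whether the script also needs other files from the archive is an assumption this sketch does not check.

```python
import shutil
import tarfile
from pathlib import Path

# Assumption: only the two files named in the loader's log output are needed.
WANTED = ("nodes.dmp", "names.dmp")


def extract_taxdump(archive: str, dest: str, wanted=WANTED) -> list:
    """Copy the named members of a .tar.gz archive into dest; return their paths."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, "r:gz") as tar:
        for name in wanted:
            member = tar.getmember(name)  # raises KeyError if the member is absent
            src = tar.extractfile(member)
            if src is None:  # extractfile returns None for non-regular members
                raise ValueError(f"{name} is not a regular file in {archive}")
            target = dest_dir / name
            # Stream the member to disk rather than reading it all into memory.
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

After something like `extract_taxdump("taxdata/taxdump.tar.gz", "taxdata")`, the loader can be rerun without `--download true`, as in the second command quoted above.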

I then simulated an update by deleting the downloaded taxdata directory and
rerunning the script:

$ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true
Downloading NCBI taxon database to taxdata
Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
        ... updating new parent IDs
        ... (committing nodes)
        ... rebuilding nested set left/right values
        ... reading in taxon names from names.dmp
        ... deleting old taxon names
        ... inserting new taxon names
        ... cleaning up
Done.

[Note that after the "Unable to close datastream" message I just left the script
running this time, and it continued fine.]

Again, I haven't checked the database.
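One possible sanity check, given the "rebuilding nested set left/right values" step in the log, is to verify that each taxon's left/right interval sits strictly inside its parent's. The sketch below works on rows already fetched from the database, for example with `SELECT taxon_id, parent_taxon_id, left_value, right_value FROM taxon`; those column names assume the standard BioSQL taxon table, and the function nested_set_problems is hypothetical, not part of BioSQL or the loader script.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed row layout: (taxon_id, parent_taxon_id, left_value, right_value)
Row = Tuple[int, Optional[int], int, int]


def nested_set_problems(rows: Iterable[Row]) -> List[str]:
    """Return human-readable problems found in nested-set left/right values."""
    nodes = {tid: (parent, left, right) for tid, parent, left, right in rows}
    problems = []
    seen = {}  # left/right value -> taxon that first used it
    for tid, (parent, left, right) in nodes.items():
        if left >= right:
            problems.append(f"taxon {tid}: left {left} >= right {right}")
        # Every left/right value should be used exactly once across the tree.
        for value in (left, right):
            if value in seen:
                problems.append(f"taxon {tid}: value {value} also used by taxon {seen[value]}")
            else:
                seen[value] = tid
        # Treat a missing parent or a self-parent (as NCBI's root node has) as a root.
        if parent is None or parent == tid:
            continue
        if parent not in nodes:
            problems.append(f"taxon {tid}: parent {parent} missing")
            continue
        _, p_left, p_right = nodes[parent]
        if not (p_left < left and right < p_right):
            problems.append(
                f"taxon {tid}: [{left}, {right}] not inside parent {parent} [{p_left}, {p_right}]"
            )
    return problems


if __name__ == "__main__":
    sample = [(1, None, 1, 6), (2, 1, 2, 3), (3, 1, 4, 5)]
    print(nested_set_problems(sample))  # prints []
```

This does not check that sibling intervals are disjoint, so an empty result is a necessary rather than sufficient sign that the nested set is intact.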

Peter

