[BioSQL-l] BioSQL conflict with swissprot and NCBI taxonomy

Raphael Bauer rbauer at informatik.hu-berlin.de
Mon Nov 10 11:30:17 EST 2003


> I've got some trouble parsing the NCBI taxonomy into an existing BioSQL
> schema populated with Swissprot.
...cut out....
>
> ...in my opinion it is due to the fact that Swissprot has some kind of
> taxonomy in its OC lines that is a part of the NCBI taxonomy (already
> parsed into table term).
>
> So my question is if there is a way to integrate swissprot and ncbi in
> one biosql schema.
> Or if it is better to keep NCBI and Swissprot separate in their own BioSQL
> schemas and map them together later on to get a mapping between NCBI and
> Swissprot...
>
> > So my question is if there is a way to integrate swissprot and ncbi in
> > one biosql schema.
>
> Absolutely, but in the opposite order than you did. The problem with loading
> swissprot first is that then you get about 6000-7000 taxa with unreliable
> (against the NCBI taxonomy as the standard) and/or incomplete lineages.
>
> First load the NCBI taxonomy database, only then a sequence database. Which
> BTW should also rid you of some errors you will have seen when you loaded
> swissprot.
>

Hi Hilmar,

thanks for the fast reply.
I just tried it the other way round (first NCBI, then Swissprot), but the
problem remains. I also tried loading Swissprot with load_seqdatabase.pl
both with --lookup and without it, but it makes no difference (which, up to
a point, makes sense to me).

My command lines and the error message:
NCBI:
----
perl load_ncbi_taxonomy.pl --dbname NCBIdannSprot --driver Pg --host
localhost --dbuser biosql --download --directory ~/wbi/.
...works fine

Swissprot with lookup:
----------------------
perl load_seqdatabase.pl --lookup --host localhost --dbuser biosql
--dbname NCBIdannSprot_mitlookup --namespace swissprot --driver Pg
--format swiss /local/sprot_weekly.dat
Loading /local/sprot_weekly.dat ...
DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key into
unique index taxon_pkey at
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Pg/SpeciesAdaptorDriver.pm
line 356, <GEN0> line 385883.
Could not store O18759:
------------- EXCEPTION  -------------
MSG: create: object (Bio::Species) failed to insert or to be found by
unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::Persistent::PersistentObject::create
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:243
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) load_seqdatabase.pl:446
STACK toplevel load_seqdatabase.pl:429

--------------------------------------
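
For readers unfamiliar with this class of error, here is a minimal sketch of how a "duplicate key" failure on a primary key arises, and how a lookup-before-insert avoids it. It uses Python's sqlite3 with a toy two-column taxon table; the table layout and the helper names (make_db, blind_insert, lookup_or_insert) are assumptions made for illustration only. This is not the BioSQL schema definition, not the bioperl-db implementation, and not a diagnosis of the failure above.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a toy taxon table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " taxon_id INTEGER PRIMARY KEY,"
        " ncbi_taxon_id INTEGER UNIQUE)"
    )
    return conn


def blind_insert(conn, taxon_id, ncbi_taxon_id):
    """Insert with an explicit key and no prior check; fails if the key is taken."""
    conn.execute(
        "INSERT INTO taxon (taxon_id, ncbi_taxon_id) VALUES (?, ?)",
        (taxon_id, ncbi_taxon_id),
    )


def lookup_or_insert(conn, ncbi_taxon_id):
    """Return the taxon_id for an NCBI taxon ID, inserting a row only if absent."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO taxon (ncbi_taxon_id) VALUES (?)",
        (ncbi_taxon_id,),
    )
    return cur.lastrowid


conn = make_db()

# Stand-in for loading the taxonomy first: the row for NCBI taxon 9606 exists.
blind_insert(conn, 9606, 9606)

# Stand-in for a loader that inserts the same taxon again without looking it up.
try:
    blind_insert(conn, 9606, 9606)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking up by NCBI taxon ID first reuses the existing row instead of failing.
existing_id = lookup_or_insert(conn, 9606)

# A taxon that is not present yet gets a fresh row.
new_id = lookup_or_insert(conn, 10090)
```

In this toy setup the second blind insert is rejected by the database, while the lookup-first variant returns the existing row. Whether the real loader behaves like either function depends on the installed bioperl-db version and schema, which this sketch does not model.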

Perhaps there is something wrong with my command line options, but I
can't see it.

Thanks for your help,

Raphael Bauer



