[BioSQL-l] BioSQL conflict with swissprot and NCBI

Hilmar Lapp hlapp at gnf.org
Mon Nov 10 12:38:02 EST 2003


Some swissprot records still won't parse properly because of species
parsing problems. Try running load_seqdatabase.pl with --safe (that's
always a good idea anyway, unless you want to get thrown out immediately
at the first troublemaker), then check the accession numbers of the
records that fail. A complete parse of swissprot and trembl should give
you a failure count in the low double digits (out of a total of more
than 1 million records).
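
To collect the accession numbers from a --safe run, a small filter along
these lines may help. This is only a sketch: it assumes that each failed
record is reported on a line of the form "Could not store <accession>:",
as in the error output quoted below, and failed_accessions is a made-up
helper name, not part of the BioSQL scripts.

```shell
# Print the unique accession numbers of records that failed to load.
# Assumption: every failure shows up in the log as a line starting with
# "Could not store <accession>:" (the format seen in the quoted error).
# Reads the named log file(s), or standard input if none are given.
failed_accessions() {
    sed -n 's/^Could not store \([^:]*\):.*$/\1/p' "$@" | sort -u
}

# Possible usage (load.log is an arbitrary file name; "..." stands for
# the connection and format options you already use):
#   perl load_seqdatabase.pl --safe ... > load.log 2>&1
#   failed_accessions load.log
```

Capturing both stdout and stderr in the same file avoids having to know
which stream the loader writes its failure messages to.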

	-hilmar

On Monday, November 10, 2003, at 08:30  AM, Raphael Bauer wrote:

> 	taxonomy
>
>> I've got some trouble parsing the NCBI taxonomy into an existing
>> biosql schema populated with swissprot.
> ...cut out....
>>
>> ...in my opinion it is due to the fact that swissprot has some kind of
>> taxonomy in its OC lines that is a part of the NCBI taxonomy (already
>> parsed into table term).
>>
>> So my question is if there is a way to integrate swissprot and ncbi in
>> one biosql schema.
>> Or if it is better to keep NCBI and swissprot separated in their own
>> biosql schemas and map them together later on to get a mapping between
>> NCBI and swissprot...
>>
>>> So my question is if there is a way to integrate swissprot and ncbi
>>> in one biosql schema.
>>
>> Absolutely, but in the opposite order from the one you used. The
>> problem with loading swissprot first is that you then get about
>> 6000-7000 taxa with unreliable (measured against the NCBI taxonomy as
>> the standard) and/or incomplete lineages.
>>
>> First load the NCBI taxonomy database, and only then a sequence
>> database. That, BTW, should also rid you of some of the errors you
>> will have seen when you loaded swissprot.
>>
>
> Hi Hilmar,
> thanks for the fast reply.
> I just tried it the other way round (first NCBI, then Swissprot), but
> the problem still remains...
> ... I also tried parsing Swissprot with load_seqdatabase.pl with
> --lookup and without --lookup, but it makes no difference... (which
> is, to some extent, clear to me as well)..
> ...
> My command lines and the error message:
> NCBI:
> ----
> perl load_ncbi_taxonomy.pl --dbname  NCBIdannSprot --driver Pg --host
> localhost --dbuser biosql --download --directory ~/wbi/.
> ...works fine
>
> Swissprot with lookup:
> ----------------------
> perl load_seqdatabase.pl --lookup --host localhost --dbuser biosql
> --dbname NCBIdannSprot_mitlookup --namespace swissprot --driver Pg
> --format swiss /local/sprot_weekly.dat
> Loading /local/sprot_weekly.dat ...
> DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key into
> unique index taxon_pkey at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Pg/SpeciesAdaptorDriver.pm
> line 356, <GEN0> line 385883.
> Could not store O18759:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Species) failed to insert or to be found by
> unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
> STACK Bio::DB::Persistent::PersistentObject::create
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:243
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) load_seqdatabase.pl:446
> STACK toplevel load_seqdatabase.pl:429
>
> --------------------------------------
>
> ...perhaps there is something wrong in my command line options, but i
> can't see it...
>
> Thanks for your help,
>
> Raphael Bauer
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



