[Bioperl-l] Re: bioperl-db

Hilmar Lapp hlapp at gnf.org
Tue Jul 20 19:56:50 EDT 2004


If you're using a version of bioperl-db updated since roughly May (and it
seems you are; shame on me for not tagging a release), then the namespace,
if provided, will always be included in the lookup.

This is to support the inclusion of namespace in the two alternative  
key definitions on bioentry, i.e., (accession,version,namespace) and  
(identifier,namespace). The schema DDL definition as in CVS defines  
identifier as unique by itself, but given earlier feedback this is  
likely to be changed. At any rate, you may choose either definition of  
the unique key.
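
To make the failure mode concrete, here is a minimal sketch of how a
namespace-scoped lookup interacts with an identifier that is unique by
itself. It is not the bioperl-db code and not the real BioSQL DDL: it uses
Python's sqlite3 with an in-memory database as a stand-in for MySQL, a
heavily simplified bioentry table, and hypothetical helper names (lookup,
store). The biodatabase_id values 4 and 5 are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table is a simplified,
# hypothetical subset of bioentry, not the actual BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bioentry (
        bioentry_id    INTEGER PRIMARY KEY,
        biodatabase_id INTEGER NOT NULL,   -- the namespace
        accession      TEXT NOT NULL,
        identifier     TEXT,               -- e.g. the GI number
        version        INTEGER NOT NULL,
        UNIQUE (accession, biodatabase_id, version),
        UNIQUE (identifier)                -- unique by itself
    )
""")

def lookup(conn, identifier, biodatabase_id):
    """Look up by (identifier, namespace); return the id or None."""
    row = conn.execute(
        "SELECT bioentry_id FROM bioentry "
        "WHERE identifier = ? AND biodatabase_id = ?",
        (identifier, biodatabase_id),
    ).fetchone()
    return row[0] if row else None

def store(conn, biodatabase_id, accession, identifier, version):
    """Mimic lookup-then-insert: skip rows that the lookup finds."""
    if lookup(conn, identifier, biodatabase_id) is not None:
        return "found"
    try:
        conn.execute(
            "INSERT INTO bioentry "
            "(biodatabase_id, accession, identifier, version) "
            "VALUES (?, ?, ?, ?)",
            (biodatabase_id, accession, identifier, version),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # The lookup missed (different namespace), but the insert
        # still collides on the identifier-only unique key.
        return "failed"

first = store(conn, 4, "BC029727", "20987556", 1)
second = store(conn, 5, "BC029727", "20987556", 1)
print(first, second)
```

In this sketch the second call is not found by the lookup, because the
namespace differs, yet it cannot be inserted, because the identifier
already exists under another namespace.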

So, most likely what you want is to change the unique key definition of  
identifier to include biodatabase_id. If you don't know how to do that  
I can send you the SQL code.
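
As a rough illustration of the effect of that change, the sketch below
declares the key as (identifier, biodatabase_id) and inserts the same
identifier under two namespaces. It again uses Python's sqlite3 with an
in-memory database rather than MySQL, and a simplified, hypothetical
bioentry table; the helper try_insert and the biodatabase_id values 4 and 5
are assumptions for the example. On an existing MySQL instance the
equivalent change would be an ALTER TABLE that drops the unique index on
identifier and adds one on both columns; the index name depends on your
schema, so it is not spelled out here.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; simplified, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bioentry (
        bioentry_id    INTEGER PRIMARY KEY,
        biodatabase_id INTEGER NOT NULL,   -- the namespace
        accession      TEXT NOT NULL,
        identifier     TEXT,               -- e.g. the GI number
        version        INTEGER NOT NULL,
        UNIQUE (accession, biodatabase_id, version),
        UNIQUE (identifier, biodatabase_id)  -- key now includes namespace
    )
""")

INSERT_SQL = (
    "INSERT INTO bioentry "
    "(biodatabase_id, accession, identifier, version) "
    "VALUES (?, ?, ?, ?)"
)

def try_insert(conn, row):
    """Return True if the row was inserted, False on a key violation."""
    try:
        conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False

# Same GI number under two different namespaces: both are accepted.
other_ns = try_insert(conn, (4, "BC029727", "20987556", 1))
clones_ns = try_insert(conn, (5, "BC029727", "20987556", 1))

# Same GI number again within one namespace: still rejected.
duplicate = try_insert(conn, (5, "BC029727", "20987556", 1))

print(other_ns, clones_ns, duplicate)
```

With the composite key, uniqueness is enforced per namespace, which matches
what the lookup assumes.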

	-hilmar

On Jul 20, 2004, at 1:07 PM, Mike Muratet wrote:

> Hilmar
>
> While you're on the subject of load_seqdatabase, I am experiencing a weird
> problem I need some help solving.
>
> I am loading subsets of GenBank into the database I created with the
> script create_mysql_db, using the script load_seqdatabase.pl, all of which
> I downloaded from the links at bioperl.org. (Having these records in MySQL
> saves me a _ton_ of Perl writing, and thank you to the folks who did the
> development.) Sometimes there is overlap in the sets. For example:
>
> /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl \
>   --dbname bioseqdb --format GenBank \
>   --namespace clones --lookup --noupdate accessions.gb
> Loading accessions.gb ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
> ("BC029727","20987556","BC029727","Mus musculus zeta-chain (TCR)
> associated protein kinase, mRNA (cDNA clone MGC:36162 IMAGE:4925739),
> complete cds.","1","ROD") FKs (5,10090)
> Duplicate entry '20987556' for key 3
> ---------------------------------------------------
> Could not store BC029727:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Seq::RichSeq) failed to insert or to be found by unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store /usr/local/lib/perl5/site_perl/5.8.4/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl:517
> STACK toplevel /usr/local/lib/perl5/site_perl/5.8.4/bioperl-db/scripts/biosql/load_seqdatabase.pl:500
>
> The offending key is the GI number, which gets stored (apparently) in the
> identifier column of bioentry. I would expect that the namespace would
> confer uniqueness. Further, I might expect that an entry under a new
> namespace might supersede the previous reference. However, I am baffled by
> the exception when I have set --lookup and --noupdate. I have looked at
> the code, and I don't see anything simple. Could it be that the lookup
> includes the namespace in the key but the store does not? Am I using it
> improperly?
>
> Thanks
>
> Mike
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


