From rEyez at web.de Wed Nov 5 06:16:32 2003
From: rEyez at web.de (Raphael A. Bauer)
Date: Wed Nov 5 05:13:01 2003
Subject: [BioSQL-l] BioSQL conflict with swissprot and NCBI taxonomy
Message-ID: <3FA8DC10.4080408@web.de>

Hello,

I've got some trouble loading the NCBI taxonomy into an existing BioSQL schema that is already populated with swissprot.

To load the NCBI taxonomy I used the script from /scripts/ in the biosql schema distribution:

  perl load_ncbi_taxonomy.pl --dbname columbaTax --driver Pg --host localhost --dbuser biosql --download --directory ~/wbi/.
  Loading NCBI taxon database in /vol/fob-vol3/mi02/rbauer/wbi/.:

and to load swissprot I used the script from bioperl-db:

  perl load_seqdatabase.pl --host localhost --dbuser rbauer --dbname bioseqdb --namespace swissprot --driver Pg --format swiss /SwissProtDB/sprot41.dat

The error message is the following:

  Loading /local/sprot_weekly.dat ...
  DBD::Pg::st execute failed: ERROR: Cannot insert a duplicate key into unique index taxon_pkey at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Pg/SpeciesAdaptorDriver.pm line 356, line 385883.
  Could not store O18759:
  ------------- EXCEPTION -------------
  MSG: create: object (Bio::Species) failed to insert or to be found by unique key
  STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
  STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:243
  STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
  STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
  STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
  STACK (eval) load_seqdatabase.pl:446
  STACK toplevel load_seqdatabase.pl:429

In my opinion the cause is that swissprot carries some kind of taxonomy in its OC lines, and that taxonomy is also part of the NCBI taxonomy (already parsed into table term).

So my question is whether there is a way to integrate swissprot and NCBI in one BioSQL schema. Alternatively, is it better to keep NCBI and swissprot separated in their own BioSQL schemas and map them together later on to get a mapping between NCBI and swissprot?

Many thanks in advance,
Raphael Bauer

From hlapp at gnf.org Wed Nov 5 14:00:06 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Nov 5 13:56:44 2003
Subject: [BioSQL-l] BioSQL conflict with swissprot and NCBI taxonomy
In-Reply-To: <3FA8DC10.4080408@web.de>
Message-ID:

On 11/5/03 3:16 AM, "Raphael A. Bauer" wrote:

> So my question is if there is a way to integrate swissprot and ncbi in
> one biosql schema.

Absolutely, but in the opposite order from the one you used. The problem with loading swissprot first is that you then get about 6000-7000 taxa whose lineages are unreliable (measured against the NCBI taxonomy as the standard) and/or incomplete.

First load the NCBI taxonomy database, and only then a sequence database. By the way, this should also rid you of some of the errors you will have seen when you loaded swissprot.

-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From m_conte at hotmail.com Thu Nov 6 04:26:28 2003
From: m_conte at hotmail.com (matthieu CONTE)
Date: Thu Nov 6 04:23:08 2003
Subject: [BioSQL-l] (no subject)
Message-ID:

Hello,

We just transferred the Arabidopsis thaliana EMBL sequences via SwissProt into BioSQL format.
We are able to retrieve sequences and annotations from a BioentryID (using a Bio::DB::PersistenceAdaptorI and the 'find_by_unique_key' method).
But from an entry, we are unable to retrieve the Pfam, Prosite and other accession numbers from the cross-references table (BioentryDbxref).

Could you please tell us which adaptor we can use and how to do it?

In short, how can we use the Bio::DB::...adaptor to access all the tables in a BioSQL format?!!

Many thanks in advance,

M

Matthieu CONTE
23 route d'EUS
66500 CATLLAR
Tel 0468962854
m_conte@hotmail.com

From hlapp at gnf.org Thu Nov 6 17:48:44 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Nov 6 17:45:24 2003
Subject: [BioSQL-l] (no subject)
In-Reply-To:
Message-ID:

On 11/6/03 1:26 AM, "matthieu CONTE" wrote:

> Hello,
>
> We just transfered Arabidospsis Thaliania's EMBL sequences via SwissProt in
> BioSQL format.
> We are able to take out sequences and annotations (using a
> Bio::DB::PersitenceAdaptorI and the 'find_by_unique_key' method) from a
> BioentryID.
> But from an entry, we are unable to take out the access number Pfam, Prosite
> and others... from the crossreferences table (BioentryDbxref).
>
> Could you please tell us wich adaptor can we use and how to do it ?
>
> In short, how can we use the Bio ::DB:: ...adaptor to access to all the
> tables in a BioSQLformat?!!
>

Generally speaking, you use adaptors to pull objects out of the database or to put objects into it. The idea is that you don't have to care much about which tables are involved.

Since you don't explain exactly what you're trying to do, there are multiple answers. I'll give you three, hoping that at least one of them matches.
A) If all you know is the database and accession of the dbxref, you can pull it out with a unique key query:

    $dbxref = Bio::Annotation::DBLink->new(-database   => 'Genbank',
                                           -primary_id => 'BC6256426');
    $adp    = $db->get_persistence_adaptor($dbxref);
    $found  = $adp->find_by_unique_key($dbxref);

B) If you want to pull out all dbxrefs of a sequence entry and you have the seq object in hand, all of its annotation will already have been loaded. Hence

    $seq = < ... e.g., find by unique key ... >
    @dblinks = $seq->annotation->get_Annotations('dblink');

will do the job.

C) Same as B), but you don't have the seq object and you don't want it either:

    $query = Bio::DB::Query::BioQuery->new(
        -datacollections => ["Bio::SeqI s",
                             "Bio::Annotation::DBLink dbx",
                             "Bio::SeqI<=>Bio::Annotation::DBLink"],
        -where           => ["s.accession_number = 'BC236452'"]);
    $adp    = $db->get_persistence_adaptor("Bio::Annotation::DBLink");
    $result = $adp->find_by_query($query);
    while (my $dbx = $result->next_object()) {
        # do something with $dbx
    }

I may have forgotten a parameter or so, especially for C). Check the documentation in Bio::DB::PersistenceAdaptorI (and possibly also Bio::DB::BioSQL::BasePersistenceAdaptor, the base class for all implementors).

Hth,
-hilmar

> Many thanks in advance,
>
> M
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l

From Marc.Logghe at devgen.com Sat Nov 8 18:13:12 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Nov 8 18:10:02 2003
Subject: [BioSQL-l] query problem
Message-ID:

Hi,

I'm having some problems creating a correct query. I am trying to fetch the persistent feature object with primary_tag 'source' of the seq object with accession 'ABC' belonging to namespace 'test'. I have tried this using the following code snippet:

    my $adp = $db->get_object_adaptor('Bio::SeqFeatureI');
    $adp->verbose(1);
    my $query = Bio::DB::Query::BioQuery->new();
    my $mapper = Bio::DB::BioSQL::Oracle::BasePersistenceAdaptorDriver->new();

    $query->datacollections(
        [
         "Bio::SeqFeatureI f",
         "Bio::SeqI=>Bio::SeqFeatureI s",
         # "Bio::Ontology::TermI=>Bio::SeqFeatureI t",
         "BioNamespace=>Bio::PrimarySeqI db",
        ] );

    $query->where( [ "db.namespace = 'test'",
                     "f.primary_tag = 'source'",
                     "s.accession_number = 'ABC'" ] );

However, this did not work as I expected. It seems that f.primary_tag is mapped to type_trm_id, so I had to pass it the value 11 instead.

So I guessed that "Bio::Ontology::TermI=>Bio::SeqFeatureI t" had to be added to the datacollections in order to get at the value 'source' with the where clause:

    $query->where( [ "db.namespace = 'test'",
                     "t.name = 'source'",
                     "s.accession_number = 'ABC'" ] );

This did not work either, because the generated SQL was not OK:

    SELECT f.oid, f.display_name, f.rank, f.ent_oid, f.type_trm_oid, f.source_trm_oid
    FROM seqfeature f, bioentry s, term t, biodatabase db
    WHERE f.ent_oid = s.oid AND f.trm_oid = t.oid AND s.db_oid = db.oid
    AND (db.name = 'test' AND t.name = 'source' AND s.accession = 'ABC')'''])

As far as I can see, 'f.trm_oid = t.oid' should read 'f.type_trm_oid = t.oid'?
I am pretty new to the API and schema, so I guess it is more probable that I am missing something here.
Regards,
Marc

From Marc.Logghe at devgen.com Sat Nov 8 19:18:06 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Nov 8 19:14:44 2003
Subject: [BioSQL-l] query problem
Message-ID:

I have another question related to my previous post. The returned persistent feature objects seem not to know about their parent; I mean, the display_name is undef. What should I change in the query in order to fill that slot?

Many thanks in advance,
Marc

> -----Original Message-----
> From: Marc Logghe
> Sent: Sunday, November 09, 2003 12:13 AM
> To: OBDA BioSQL (E-mail)
> Subject: [BioSQL-l] query problem

From hlapp at gnf.org Sun Nov 9 19:00:57 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sun Nov 9 18:57:44 2003
Subject: [BioSQL-l] query problem
In-Reply-To:
Message-ID:

On Saturday, November 8, 2003, at 03:13 PM, Marc Logghe wrote:

> Hi,
> I'm having some problems in creating a correct query. What I am trying
> to do is fetching a persistent feature object with primary_tag
> 'source' of a seq object with accesssion 'ABC' belonging to namespace
> 'test'.
> I have tried this using the following code snippet:
> my $adp = $db->get_object_adaptor('Bio::SeqFeatureI');
> $adp->verbose(1);
> my $query = Bio::DB::Query::BioQuery->new();
> my $mapper =
> Bio::DB::BioSQL::Oracle::BasePersistenceAdaptorDriver->new();
>
> $query->datacollections(
>       [
>        "Bio::SeqFeatureI f",
>        "Bio::SeqI=>Bio::SeqFeatureI s",
> #      "Bio::Ontology::TermI=>Bio::SeqFeatureI t",
>        "BioNamespace=>Bio::PrimarySeqI db",
>       ]
>       );
>
> $query->where( [ "db.namespace = 'test'", "f.primary_tag = 'source'",
> "s.accession_number = 'ABC'"] );
>
> This did not work as I expected however. Seems like f.primary_tag is
> mapped to type_trm_id, so I had to pass it the value 11 instead.

Right. Source tag and primary tag are ontology terms.

> So, I guessed "Bio::Ontology::TermI=>Bio::SeqFeatureI t" had to be
> added to the datacollections in order to get to the value 'source'
> with the where clause:
> $query->where( [ "db.namespace = 'test'", "t.name = 'source'",
> "s.accession_number = 'ABC'"] );
> This did not work neither

seqfeature has two foreign keys to term: one for the type ($feature->primary_tag) and one for the source ($feature->source_tag). Without context, the OR mapper cannot know which one you mean in a join. You supply context by appending the contextual keyword to the alias of the ambiguous parent, delimited by '::'. For seqfeature's foreign keys to term, the context keywords are primary_tag and source_tag. I.e., you need to write

    ...,
    "Bio::Ontology::TermI=>Bio::SeqFeatureI t",
    ...

as

    ...,
    "Bio::Ontology::TermI=>Bio::SeqFeatureI t::primary_tag",
    ...

Let me know if it still produces the wrong SQL after making this change.

-hilmar

BTW, the other places where you need to supply context are all the relationships (seqfeature, bioentry, and term relationships). The keywords are subject and object for the subject and object of the relationship, respectively, and predicate for the predicate in a term relationship.
No, this is not documented (yet; anyone willing to help?) ...

> , because the generated sql was not OK:
> SELECT f.oid, f.display_name, f.rank, f.ent_oid, f.type_trm_oid,
> f.source_trm_oid FROM seqfeature f, bioentry s, term t, biodatabase db
> WHERE f.ent_oid = s.oid AND f.trm_oid = t.oid AND s.db_oid = db.oid
> AND (db.name = 'test' AND t.name = 'source' AND s.accession =
> 'ABC')''])
> As far as I can see 'f.trm_oid = t.oid' should read 'f.type_trm_oid =
> t.oid' ?
> I am pretty new to the API and schema, so I guess it is more probable
> that I am missing something here.
>
> Regards,
> Marc

From hlapp at gnf.org Sun Nov 9 19:12:08 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sun Nov 9 19:08:54 2003
Subject: [BioSQL-l] Re: SeqFeatureI::display_name
In-Reply-To:
Message-ID: <86084C29-1312-11D8-A309-000A959EB4C4@gnf.org>

On Saturday, November 8, 2003, at 04:18 PM, Marc Logghe wrote:

> The returned persistent feature objects seem not to know about their
> parent; I mean, the display_name is undef. What should I change to the
> query in order to fill that slot ?
>

display_name is not populated from some property of the parent bioentry; it is a property of the seqfeature itself (see also Bio::SeqFeatureI::display_name). BioPerl itself doesn't use the display_name property when you create features from databank files via the SeqIO path. Only Bio::Graphics/Bio::DB::GFF uses it, I think. Maybe the GFF parser populates it too, possibly only when reading GFF3? Does anybody know offhand?

So for all features that were loaded into the database using a SeqIO parser, the display_name property will be undefined.
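Because display_name stays undefined for features loaded through a SeqIO parser, code that consumes these features is safer identifying them by primary tag. Here is a minimal sketch. It assumes a sequence object that provides get_SeqFeatures() and features that provide primary_tag(), as Bio::SeqI and Bio::SeqFeatureI implementations do. The helper name features_with_tag is hypothetical and not part of BioPerl or bioperl-db.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or bioperl-db): return the
# features of a Bio::SeqI-like object whose primary tag equals $tag.
# Only get_SeqFeatures() and primary_tag() are called, so display_name
# being undef does not matter here.
sub features_with_tag {
    my ($seq, $tag) = @_;
    return grep { defined($_->primary_tag) && $_->primary_tag eq $tag }
           $seq->get_SeqFeatures;
}
```

For example, `my @sources = features_with_tag($seq, 'source');` applied to a sequence retrieved from the database would select its 'source' features. Called in scalar context, the helper returns the number of matches.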
The other thing you're noticing is that the seqfeature adaptor does not automatically load the corresponding sequence and attach it to the feature. This keeps sequence object serialization from entering a circular loop, because features are serialized too whenever a sequence is stored. You have two ways out. You can retrieve sequences rather than features from the query and then filter out the features you don't want. Alternatively, the code needs to be changed so that it can optionally retrieve the sequence for each feature.

-hilmar

From billk at iinet.net.au Mon Nov 10 08:33:39 2003
From: billk at iinet.net.au (William Kenworthy)
Date: Mon Nov 10 08:32:44 2003
Subject: [BioSQL-l] Bio::Seq and sequence version numbers
Message-ID: <1068471219.5736.11.camel@rattus.localdomain>

Hi, I am using the following to retrieve a seq object:

    $seq = Bio::Seq->new(-accession_number => "AL022723",
                         -namespace        => "bioperl");

This fails to retrieve the sequence because the version defaults to "0", and this sequence is at version 4. It works if I set -version => "4", but usually you don't know which version you are trying to retrieve. Is there a way of forcing a wildcard for the version, or an alternative method that ignores it?

I loaded these sequences with my own script. Most of them, but not all, are at version 1 in BioSQL, and none are at zero. I do not know whether this is normal. The examples I have found ignore the version number, so they seem to default to zero.

a confused BillK

From Marc.Logghe at devgen.com Mon Nov 10 08:46:05 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Nov 10 08:42:51 2003
Subject: [BioSQL-l] query problem
Message-ID:

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gnf.org]
> Sent: Monday, November 10, 2003 1:01 AM
> To: Marc Logghe
> Cc: OBDA BioSQL (E-mail)
> Subject: Re: [BioSQL-l] query problem

> seqfeature has two foreign keys to term, one for type
> ($feature->primary_tag) and one for source ($feature->source_tag).
> Without context, the OR mapper cannot know which one you're referring
> to in a join. The mechanism for supplying context is appending the
> contextual keyword to the alias of the ambiguous parent, delimited by
> '::'. For seqfeature's foreign keys to term, the context keywords are
> primary_tag and source_tag. I.e., you need to write
>
> ...,
> "Bio::Ontology::TermI=>Bio::SeqFeatureI t",
> ...
>
> as
>
> ...,
> "Bio::Ontology::TermI=>Bio::SeqFeatureI t::primary_tag",
> ...
>

Pfuuu, I couldn't have found that out for myself. Thanks a lot.
Unfortunately, in my case it did not work. The resulting SQL looked like this:

    SELECT f.oid, f.display_name, f.rank, f.ent_oid, f.type_trm_oid, f.source_trm_oid
    FROM seqfeature f, bioentry s, term t, biodatabase db
    WHERE f.ent_oid = s.oid AND f. = t.oid AND s.db_oid = db.oid
    AND (db.name = 'test' AND t.name = 'source' AND s.accession = 'ABC')

Note the "f. = t.oid".
I tried this with a freshly installed bioperl-db from CVS (MAIN trunk) to rule out version problems.

Something else.
In another experiment I tried the following:

    $query->datacollections( [ "Bio::PrimarySeqI e",
                               "Bio::Annotation::Comment cmt",
                               "BioNamespace=>Bio::PrimarySeqI db",
                               "Bio::PrimarySeqI=>Bio::Annotation::Comment" ] );
    $query->where( [ "db.namespace = 'test'",
                     "cmt.text like 'Current target*'" ] );

    ------------- EXCEPTION -------------
    MSG: slot 'name' not mapped to column for table biodatabase
    STACK Bio::DB::Query::BioQuery::_map_slot_to_col /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Query/BioQuery.pm:487
    STACK Bio::DB::Query::BioQuery::_map_constraint_slots_to_columns /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Query/BioQuery.pm:369
    STACK Bio::DB::Query::BioQuery::_map_constraint_slots_to_columns /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Query/BioQuery.pm:355
    STACK Bio::DB::Query::BioQuery::translate_query /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Query/BioQuery.pm:305
    STACK Bio::DB::BioSQL::BaseDriver::translate_query /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm:1182
    STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_query /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1198
    STACK toplevel ./edit_bioentry.pl:56
    --------------------------------------

I don't know whether this is due to a bug or to API abuse on my side, but I managed to 'fix' it. In Bio::DB::BioSQL::BaseDriver I added the key/value pair "name" => "name" under the 'biodatabase' key of %slot_attribute_map. That fixed it, but a similar exception occurred afterwards (MSG: slot 'comment_text' not mapped to column for table anncomment). Everything was fine after adding the comment_text => 'comment_text' pair.

Thanks a lot for the help!!!!
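Both exceptions, and Marc's fix, come down to a per-table lookup from object slot names to column names. The pure-Perl sketch below is hypothetical. The hash only mimics the shape of the two entries Marc describes adding to %slot_attribute_map and is not the real bioperl-db map. map_slot_to_column is an illustrative stand-in for the internal lookup that raised "slot ... not mapped to column for table ...". It is not the library's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %slot_attribute_map: table => { slot => column }.
# Only the two pairs Marc reports adding are shown; the real map in
# Bio::DB::BioSQL::BaseDriver contains many more entries.
my %slot_attribute_map = (
    biodatabase => { name         => 'name' },
    anncomment  => { comment_text => 'comment_text' },
);

# Illustrative lookup: return the column for a slot, or die with a message
# shaped like the exception above when no mapping exists.
sub map_slot_to_column {
    my ($table, $slot) = @_;
    my $columns = $slot_attribute_map{$table} || {};   # avoid autovivification
    my $col     = $columns->{$slot};
    die "slot '$slot' not mapped to column for table $table\n"
        unless defined $col;
    return $col;
}
```

Without the pair for a given table and slot, the lookup fails in the way the stack trace shows. Adding the pair makes the slot in the where clause resolve to a column.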
Regards,
Marc

From Marc.Logghe at devgen.com Mon Nov 10 09:39:10 2003
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Nov 10 09:35:49 2003
Subject: [BioSQL-l] Bio::Seq and sequence version numbers
Message-ID:

Hi,

> -----Original Message-----
> From: William Kenworthy [mailto:billk@iinet.net.au]
> Sent: Monday, November 10, 2003 2:34 PM
> To: BioSQL List
> Subject: [BioSQL-l] Bio::Seq and sequence version numbers
>
>
> Hi I am using the following to retrieve a seq object
>
> $seq = Bio::Seq->new(-accession_number => "AL022723",
>                      -namespace => "bioperl");
>
> This fails to retrieve the sequence because it defaults to a
> version of
> "0", and this sequence is at version 4. It works if I set it to
> "-version=>"4", but usually you dont know what version you
> are trying to
> retrieve. Is there a way of forcing a wildcard for version, or an
> alternate method that ignores it?

I think that with find_by_unique_key you have to know the accession number AND the version. After all, you want to fetch a *unique* record, and if several versions are available the system does not know which one you want.
find_by_query might be a valid alternative:

    # set up the query template
    my $query = Bio::DB::Query::BioQuery->new(
        -datacollections => [ "Bio::PrimarySeqI e",
                              "BioNamespace=>Bio::PrimarySeqI db" ],
        -where           => [ "e.accession_number = ?",
                              "db.namespace = 'bioperl'" ],
        -order           => [ "e.version" ] );

    # perform the query with the desired value
    my $result = $adp->find_by_query($query,
                                     -name   => 'test',
                                     -values => ['AL022723']);
    my $seq = $result->next_object; # gives you the first sequence object from the ordered list

HTH,
Marc

From rbauer at informatik.hu-berlin.de Mon Nov 10 11:30:17 2003
From: rbauer at informatik.hu-berlin.de (Raphael Bauer)
Date: Mon Nov 10 11:26:55 2003
Subject: [BioSQL-l] BioSQL conflict with swissprot and NCBI taxonomy
In-Reply-To: <200311062248.hA6Mlscm007395@portal.open-bio.org>
Message-ID:

> i've got some trouble parsing the ncbi taxonomy into an existing biosql
> schmema populated with swissport.
...cut out....
>
> ...in my opinion it is due to the fact that swissprot has some kind of
> taxonomy in it's OC lines that are a part of the NCBI taxonomy. (parsed
> already in table term)
>
> So my question is if there is a way to integrate swissprot and ncbi in
> one biosql schema.
> Or if it is better to keep NCBI and swissprot seperated in own biosql
> schemas and map them together lateron to get a mapping from ncbi and
> swissprot...
>
> > So my question is if there is a way to integrate swissprot and ncbi in
> > one biosql schema.
>
> Absolutely, but in the opposite order than you did. The problem with loading
> swissprot first is that then you get about 6000-7000 taxa with unreliable
> (against the NCBI taxonomy as the standard) and/or incomplete lineages.
>
> First load the NCBI taxonomy database, only then a sequence database. Which
> BTW should also rid you of some errors you will have seen when you loaded
> swissprot.
>

Hi Hilmar,

thanks for the fast reply. I just tried it the other way round (first NCBI, then Swissprot), but the problem still remains...
...
I also tried loading Swissprot with load_seqdatabase.pl both with and without --lookup, but it makes no difference (which, to some extent, is clear to me as well)...
...
My command lines and the error message:

NCBI:
----
perl load_ncbi_taxonomy.pl --dbname NCBIdannSprot --driver Pg --host localhost --dbuser biosql --download --directory ~/wbi/.
...works fine

Swissprot with lookup:
----------------------
perl load_seqdatabase.pl --lookup --host localhost --dbuser biosql --dbname NCBIdannSprot_mitlookup --namespace swissprot --driver Pg --format swiss /local/sprot_weekly.dat
Loading /local/sprot_weekly.dat ...
DBD::Pg::st execute failed: ERROR: Cannot insert a duplicate key into unique index taxon_pkey at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Pg/SpeciesAdaptorDriver.pm line 356, line 385883.
Could not store O18759:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:243
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) load_seqdatabase.pl:446
STACK toplevel load_seqdatabase.pl:429
--------------------------------------

...perhaps there is something wrong in my command-line options, but I can't see it...
Thanks for your help,

Raphael Bauer

From hlapp at gnf.org Mon Nov 10 12:38:02 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Nov 10 12:34:35 2003
Subject: [BioSQL-l] BioSQL conflict with swissprot and NCBI
In-Reply-To:
Message-ID:

Some swissprot records still won't parse properly because of species parsing problems. Try running load_seqdatabase.pl with --safe, then look up the accession numbers of the failing records. Running with --safe is always a good idea anyway, unless you want to be thrown out immediately at the first troublemaker. A complete parse of swissprot and trembl should give you a failure count in the low double digits, out of a total of more than 1 million records.

-hilmar

From hlapp at gnf.org Mon Nov 10 12:51:39 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Nov 10 12:48:11 2003
Subject: [BioSQL-l] Bio::Seq and sequence version numbers
In-Reply-To:
Message-ID: <8974EEDA-13A6-11D8-B5E5-000A959EB4C4@gnf.org>

Cool - thanks for your help, Marc. See below for a few additions.

On Monday, November 10, 2003, at 06:39 AM, Marc Logghe wrote:

> Hi,
>
> I think when you use the find_by_unique_key, you have to know the
> accession number AND version (after all you want to fetch a *unique*
> record; in case more versions are available, the system does not know
> which one you want).
> find_by_query might be a valid alternative.

Right, that's the answer.

> # set up the query template
> my $query = Bio::DB::Query::BioQuery->new(
>       -datacollections => [ "Bio::PrimarySeqI e",
>                             "BioNamespace=>Bio::PrimarySeqI db" ],

BioNamespace may look like a type, but it's not.
It's a virtual object because bioperl does not have an equivalent class. > -where => [ "e.accession_number = ?", "db.namespace = > 'bioperl'" ] , > -order => ["e.version"] > ); > > # perform the query with the desired value > my $result = $adp->find_by_query($query, -name => 'test', -values => > ['AL022723']); Note that you don't have to name the query. If you do name it, it will become a prepared statement that the adaptor caches. > my $seq = $result->next_object; # gives you the first sequence object > from the ordered list > The class of the $seq object you can control in two ways. Either, specify the class when you obtain the adaptor: $adp = $db->get_persistence_adaptor("Bio::PrimarySeqI"); # or Bio::SeqI This will make you rely on the default object factory used by the adaptor you get returned. You can take full control by supplying the factory to find_by_query(): my $result = $adp->find_by_query($query, -name => 'test', -values => ['AL022723'], -obj_factory => Bio::Seq::SeqFactory->new(-type => "Bio::Seq::RichSeq")); Hth too, -hilmar > HTH, > Marc > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Marc.Logghe at devgen.com Mon Nov 10 15:34:06 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Mon Nov 10 15:30:53 2003 Subject: [BioSQL-l] query problem Message-ID: > > seqfeature has two foreign keys to term, one for type > > ($feature->primary_tag) and one for source ($feature->source_tag). > > Without context, the OR mapper cannot know which one you're > referring > > to in a join. The mechanism for supplying context is appending the > > contextual keyword to the alias of the ambiguous parent, > delimited by > > '::'. 
For seqfeature's foreign keys to term, the context > keywords are > > primary_tag and source_tag. I.e., you need to write > > > > ..., > > "Bio::Ontology::TermI=>Bio::SeqFeatureI t", > > ... > > > > as > > > > ..., > > "Bio::Ontology::TermI=>Bio::SeqFeatureI t::primary_tag", > > ... > > > Pfuuu, I couldn't have found that out for myself. Thanks a lot. > Unfortunately, in my case it did not work. The resulting SQL > looked like: > SELECT f.oid, f.display_name, f.rank, f.ent_oid, > f.type_trm_oid, f.source_trm_oid FROM seqfeature f, bioentry > s, term t, biodatabase db WHERE f.ent_oid = s.oid AND f. = > t.oid AND s.db_oid = db.oid AND (db.name = 'test' AND t.name > = 'source' AND s.accession = 'ABC') > Note the "f. = t.oid". > I tried this with a freshly installed bioperl-db from CVS (MAIN > trunk) to rule out version problems. I could only make "Bio::Ontology::TermI=>Bio::SeqFeatureI t::primary_tag" work when the "primary_tag" => "type_trm_oid" key/val pair was added to the 'term' key in the %slot_attribute_map of Bio::DB::BioSQL::Oracle::BasePersistenceAdaptorDriver. It works, but I still have the feeling it does not really belong there, does it ? Cheers, Marc

From hlapp at gnf.org Mon Nov 10 16:55:00 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Nov 10 16:52:23 2003 Subject: [BioSQL-l] query problem Message-ID: <833E32F61B9F8746878F2A1865BECE60530E3C@EXCHCLUSTER01.lj.gnf.org> > -----Original Message----- > From: Marc Logghe [mailto:Marc.Logghe@devgen.com] > Sent: Monday, November 10, 2003 12:34 PM > To: Hilmar Lapp > Cc: OBDA BioSQL (E-mail) > Subject: RE: [BioSQL-l] query problem > > > > I could only make "Bio::Ontology::TermI=>Bio::SeqFeatureI > t::primary_tag" work when the "primary_tag" => "type_trm_oid" > key/val pair was added to the 'term' key in the > %slot_attribute_map of > Bio::DB::BioSQL::Oracle::BasePersistenceAdaptorDriver. > It works, but I still have the feeling it does not really > belong there, does it ? Cheers, Marc > Hm.
I was actually asking myself the same question (i.e., whether I need to add that there) when I answered your email. I thought I had this working at some point, but maybe I didn't and my recollection is just playing tricks on me. I'm not surprised that the contextual mapping needs to be in the term table's slot map, since the same mechanism is used for the subject/object contextual mappings (they are all in the map of the table that is referenced by the foreign key, not in the table that has the FK). Great that you figured this out on your own. Of course this then also needs to be added to BaseDriver.pm to cover mysql and Pg. (The Oracle schema version has a different foreign key/primary key naming convention, which is why Oracle::BasePersistenceAdaptorDriver overrides a couple of those mappings.) -hilmar

From Marc.Logghe at devgen.com Thu Nov 13 03:48:05 2003 From: Marc.Logghe at devgen.com (Marc Logghe) Date: Thu Nov 13 03:44:46 2003 Subject: [BioSQL-l] errors (?) with store() Message-ID: Hi, We have noticed something strange when edited persistent objects are stored in biosql. The flow was as follows. A multi-record GenBank file was loaded into biosql using the load_seqdatabase.pl script. Afterwards, bioentries were fetched from the database (as persistent seq objects), a new feature was added, and the seq object was put back using store(). During that process we get loads of errors like these:

DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKBIOENTRY_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO bioentry_qualifier_value (ent_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)'' with params: :p4=1, :p1='2136287', :p2='1909583', :p3=undef]) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, <> line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKBIOENTRY_REF_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO bioentry_reference (ent_oid, ref_oid, end_pos, start_pos, rank) VALUES (?, ?, ?, ?, ?)'' with params: :p4='1', :p5=1, :p1='2136287', :p2='1910107', :p3='161']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, <> line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKSEQFEATURE_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)'' with params: :p4=1, :p1='2136291', :p2=1910113, :p3='W09C5.6a']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, <> line 1.
DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKSEQFEATURE_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)'' with params: :p4=2, :p1='2136291', :p2=1910113, :p3='W09C5.6b']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418, <> line 1.

However, the resulting record, when dumped as GenBank, looks fine. We are not 100% sure, but we have the impression that these kinds of errors do not occur when we work with entries that were added from a Bio::Seq::RichSeq object created from scratch, i.e., not via a GenBank record. Does this make sense? Regards, Marc *********************************************************** Marc Logghe, Ph.D.
Senior Scientist Scientific Computing Group deVGen Technologiepark 9 9052 Zwijnaarde Belgium tel: +32 (0) 9 324 24 88 fax: +32 (0) 9 324 24 25 ***********************************************************

From dnnm at novonordisk.com Thu Nov 13 03:54:40 2003 From: dnnm at novonordisk.com (DNNM (Dennis Madsen)) Date: Thu Nov 13 03:51:08 2003 Subject: [BioSQL-l] bioperl-db performance Message-ID: <36A25802686476479FBE08B99D1C111C1A0D03@exdkba023.novo.dk>

Hi, I have seen other mails about the relatively low throughput when storing sequences with load_seqdatabase.pl. I have the same problem with the latest bioperl-db, bioperl 1.2.3, perl 5.8.1 and RedHat 9 (with a newly compiled perl to avoid the UTF-8 problems in RH9). We have tested various RDBMS: MySQL 3.23.54a, MySQL 4.0.16 and Oracle 9.2.0.4, on different machines with one or two 2.5 GHz P4 CPUs, 1-2 GB of memory and lots of disk space. But no matter what, the throughput is about 5 sequences per second. If I understand the benchmarks correctly, the expected throughput is about 60 seqs per second on a computer half as fast. If I start several jobs on separate machines to upload sequences to a common db (MySQL 4.0.16), the throughput scales perfectly, so the RDBMS is not the bottleneck. I did the same test with biojava 1.3 + MySQL 3.23.54a (but with an older version of BioSQL that matches biojava), and there the throughput matches the benchmark (about 50 seqs per second). I profiled the loading of a 10-sequence GenBank file with perl -d:DProf load_seqdatabase.pl ...
The output from dprofpp tmon.out is:

    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     50.2   1.084  1.084 150388   0.0000 0.0000  overload::mycan
     12.2   0.264  1.411   4208   0.0001 0.0003  Carp::caller_info
     4.22   0.091  1.502    210   0.0004 0.0072  Carp::ret_backtrace
     3.85   0.083  1.624   3600   0.0000 0.0005  Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
     3.24   0.070  0.147   6174   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::AUTOLOAD
     3.10   0.067  1.134   5643   0.0000 0.0002  overload::StrVal
     3.10   0.067  0.067  11722   0.0000 0.0000  UNIVERSAL::isa
     2.41   0.052  0.086   1081   0.0000 0.0001  Bio::DB::Persistent::PersistentObject::can
     2.22   0.048  0.661    151   0.0003 0.0044  Bio::Root::Root::_load_module
     1.71   0.037  0.083    140   0.0003 0.0006  Bio::DB::BioSQL::SimpleValueAdaptor::add_association
     1.67   0.036  0.036   1845   0.0000 0.0000  Bio::Root::RootI::_rearrange
     1.25   0.027  1.609   3191   0.0000 0.0005  Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child
     1.20   0.026  1.772   1669   0.0000 0.0011  Bio::DB::BioSQL::BasePersistenceAdaptor::create_persistent
     1.20   0.026  0.026   2118   0.0000 0.0000  UNIVERSAL::can
     1.16   0.025  0.033   1422   0.0000 0.0000  Bio::Root::Root::new

It seems like a lot of time is spent on creating objects. Is my system wrongly configured, or am I doing something else wrong? Regards, Dennis ================================ Dennis Madsen, Ph.D. Scientific Computing, Bioinformatics Group Novo Nordisk Park, A2P 2760 Måløv Denmark ================================

From Gerben.Menschaert at devgen.com Fri Nov 14 11:36:00 2003 From: Gerben.Menschaert at devgen.com (Gerben Menschaert) Date: Fri Nov 14 11:32:38 2003 Subject: [BioSQL-l] Problem in the SpeciesAdaptorDriver.pm Message-ID: Hello, We're running Biosql on Oracle (bioperl-db main cvs branch // bioperl-1.2.3).
When loading a GenBank file we run into the following error:

DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined (DBD ERROR: OCIStmtExecute/Describe) [for statement ``SELECT taxon_name.tax_oid, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.oid = taxon_name.tax_oid AND name_class = ? AND ncbi_taxon_id = ?'']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm line 224, line 49.
Could not store AF438089: Can't call method "bind_param" on an undefined value at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 943, line 49.

This makes sense, since ncbi_taxon_id is in both the taxon and the taxon_name table. To solve the problem, I added the following code snippet (between the comment lines) to Bio::DB::BioSQL::Oracle::SpeciesAdaptorDriver (see bottom):

1) Does anybody else have the same problem?
2) The solution is not very nice, so I tried to look up the correct table name in the %object_entity_map hash in BaseDriver.pm ($self->table_name("Bio::Species"), commented out in the code below). It correctly replaced ncbi_taxon_id with taxon.ncbi_taxon_id, but it also resulted in the error below, further on in the code.
3) Why does that happen?

DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XAK1TAXON) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO taxon (tax_oid, ncbi_taxon_id, node_rank, left_value, right_value) VALUES (?, ?, ?, ?, ?)'' with params: :p4=367534, :p5=367535, :p1='3405092', :p2='177435', :p3='species']) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm line 378, line 49.
Could not store AF438089:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:243
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) /usr/local/bin/load_seqdatabase.pl:446
STACK toplevel /usr/local/bin/load_seqdatabase.pl:429
--------------------------------------

####Changes to SpeciesAdaptor.pm

    sub prepare_findbyuk_sth{
        my ($self,$adp,$ukval_h,$fkslots) = @_;

        # get the slot/attribute map
        my $table = $self->table_name($adp);
        my $node_table = $self->table_name("TaxonNode");
        my $pkname = $self->primary_key_name($node_table);
        my $fkname = $self->foreign_key_name("TaxonNode");
        my $slotmap = $self->slot_attribute_map($table);
        # SELECT columns
        my @attrs = $self->_build_select_list($adp,$fkslots);
        # WHERE clause constraints
        my @cattrs = ();
        foreach (keys %$ukval_h) {
            my $col;
            if(exists($slotmap->{$_})) {
                $col = $slotmap->{$_};
            }
            push(@cattrs, $col || "NULL");
            $self->warn("slot $_ is in unique key, but can't be mapped to ".
                        "an entity column: you won't find anything")
                unless $col;
        }
        ##########################################################################
        print STDERR "@cattrs \n";
        for(my $i = 0; $i < @cattrs; $i++) {
            if($cattrs[$i] =~ /ncbi_taxon_id/i){
                # my $name_table = $self->table_name("Bio::Species");
                # $cattrs[$i] ="$name_table.ncbi_taxon_id";
                $cattrs[$i] = "taxon.ncbi_taxon_id";
            }
        }
        #########################################################################
        # create the sql statement
        my $sql = "SELECT " . join(", ", @attrs) .
            " FROM $node_table, $table".
            " WHERE $node_table.$pkname = $table.$fkname AND ".
            join(" AND ", map { "$_ = ?"; } @cattrs);
        $adp->debug("preparing UK select statement: $sql\n");
        # prepare statement and return
        return $adp->dbh()->prepare($sql);
    }

Tanx, Gerben

From hlapp at gnf.org Mon Nov 24 09:52:08 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Nov 24 09:58:32 2003 Subject: [BioSQL-l] Problem in the SpeciesAdaptorDriver.pm In-Reply-To: Message-ID:

ncbi_taxon_id is only in the taxon table, *not* in the taxon_name table. I've seen someone else post this problem a while ago. I suspected a version-mix problem, which as far as I can recall turned out to be true, but my recollection may betray me. I'll check on this. It's possible I've messed up something in emulating the pure Biosql API. -hilmar

On Friday, November 14, 2003, at 05:36 PM, Gerben Menschaert wrote: > Hello, > > We're running Biosql on Oracle (bioperl-db main cvs branch // > bioperl-1.2.3). > > When loading a genbank file we're bumping into the following error: > > DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined > (DBD ERROR: OCIStmtExecute/Describe) [for statement ``SELECT > taxon_name.tax_oid, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, > NULL FROM taxon, taxon_name WHERE taxon.oid = taxon_name.tax_oid AND > name_class = ? AND ncbi_taxon_id = ?'']) at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/Oracle/ > SpeciesAdaptorDriver.pm line 224, line 49.
> Could not store AF438089: Can't call method "bind_param" on an > undefined value at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm > line 943, line 49. > > This is correct since the ncbi_taxon_id is both in the taxon and > taxon_name table. > > > To solve the problem I included the following code snippet (between > comment lines) in Bio::DB::BioSQL::Oracle::SpeciesAdaptorDriver (see > bottom) : > > 1) Anybody has the same problem? > 2) The solution is not very nice, so I tried to look for the correct > table_name in the %object_entity_map hash in the BaseDriver.pm > ($self->table_name("Bio::Species"), commented out in the code below). > It replaced the ncbi_taxon_id correctly to taxon.ncbi_taxon_id, but it > also resulted in the following error further on in the code: (3) Why?) > > DBD::Oracle::st execute failed: ORA-00001: unique constraint > (BIOSQL.XAK1TAXON) violated (DBD ERROR: OCIStmtExecute) [for statement > ``INSERT INTO taxon (tax_oid, ncbi_taxon_id, node_rank, left_value, > right_value) VALUES (?, ?, ?, ?, ?)'' with params: :p4=367534, > :p5=367535, :p1='3405092', :p2='177435', :p3='species']) at > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/Oracle/ > SpeciesAdaptorDriver.pm line 378, line 49. 
> Could not store AF438089: > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Species) failed to insert or to be found by > unique key > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:207 > STACK Bio::DB::Persistent::PersistentObject::create > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/ > PersistentObject.pm:243 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:170 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/ > BasePersistenceAdaptor.pm:253 > STACK Bio::DB::Persistent::PersistentObject::store > /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/ > PersistentObject.pm:270 > STACK (eval) /usr/local/bin/load_seqdatabase.pl:446 > STACK toplevel /usr/local/bin/load_seqdatabase.pl:429 > > -------------------------------------- > > > > > ####Changes to SpeciesAdaptor.pm > > sub prepare_findbyuk_sth{ > my ($self,$adp,$ukval_h,$fkslots) = @_; > > # get the slot/attribute map > my $table = $self->table_name($adp); > my $node_table = $self->table_name("TaxonNode"); > my $pkname = $self->primary_key_name($node_table); > my $fkname = $self->foreign_key_name("TaxonNode"); > my $slotmap = $self->slot_attribute_map($table); > # SELECT columns > my @attrs = $self->_build_select_list($adp,$fkslots); > # WHERE clause constraints > my @cattrs = (); > foreach (keys %$ukval_h) { > my $col; > if(exists($slotmap->{$_})) { > $col = $slotmap->{$_}; > } > push(@cattrs, $col || "NULL"); > $self->warn("slot $_ is in unique key, but can't be mapped to > ". 
> "an entity column: you won't find anything") > unless $col; > } > ####################################################################### > ### > print STDERR "@cattrs \n"; > for(my $i = 0; $i < @cattrs; $i++) { > if($cattrs[$i] =~ /ncbi_taxon_id/i){ > # my $name_table = $self->table_name("Bio::Species"); > # $cattrs[$i] ="$name_table.ncbi_taxon_id"; > $cattrs[$i] = "taxon.ncbi_taxon_id"; > } > } > ####################################################################### > ## > # create the sql statement > my $sql = "SELECT " . join(", ", @attrs) . > " FROM $node_table, $table". > " WHERE $node_table.$pkname = $table.$fkname AND ". > join(" AND ", map { "$_ = ?"; } @cattrs); > $adp->debug("preparing UK select statement: $sql\n"); > # prepare statement and return > return $adp->dbh()->prepare($sql); > } > > > > Tanx, > Gerben > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 -------------------------------------------------------------

From Frederic.Pecqueur at devgen.com Mon Nov 24 11:05:02 2003 From: Frederic.Pecqueur at devgen.com (Frederic Pecqueur) Date: Mon Nov 24 11:11:36 2003 Subject: [BioSQL-l] how do you update the value of a tag? Message-ID: Hi, I have a problem when I try to update the value of a tag in a BioSQL sequence.
I use this code:

    my $adp  = $db->get_object_adaptor('Bio::SeqI');
    my $lseq = $adp->find_by_unique_key(
        Bio::Seq::RichSeq->new(-accession_number => $id,
                               -namespace        => $namespace));
    my $source;
    foreach my $feat ($lseq->get_SeqFeatures) {
        if ($feat->primary_tag eq 'gene') {
            $source = $feat;
            last;
        }
    }
    $source->remove_tag('standard_name');
    $source->add_tag_value('standard_name', 'my new value');
    $source->store();

Apparently the problem comes from the update of the seqfeature_qualifier_value table: I get several errors about a failing INSERT statement. I don't understand why the API uses an INSERT and not an UPDATE. Here are the errors:

DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKSEQFEATURE_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)'' with params: :p4=1, :p1='5321562', :p2=1968, :p3='a new value']) at usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418.

Does anybody have an idea? Thanks a lot. Frédéric.

From hlapp at gnf.org Thu Nov 27 07:20:03 2003 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Nov 27 07:26:36 2003 Subject: AW: [BioSQL-l] how do you update the value of a tag? Message-ID: <833E32F61B9F8746878F2A1865BECE60530E50@EXCHCLUSTER01.lj.gnf.org>

Have you verified that your change to the tag is indeed not stored in the database? There are a couple of things to note here.

- Most often one can ignore the statement execution errors, as they are usually caught and dealt with internally (unless there is a bug, of course). Generally speaking, the only reason you see those messages is that I haven't yet silenced the DBD driver by default. (It can be silenced by changing a single line of code.)
- For the case at hand, the code first attempts an insert instead of an update because the look-up key for the association consists of almost all columns of the association table, and those associations most often do not exist yet, so an update would usually fail. (Note that those associations do not have a generated primary key and have no counterpart in the object model.)
- If you want to update tags, the best thing to do is to delete the feature first, which also removes its associations:

    [...]
    $source->remove();                    # delete the feature in the database
    $source->remove_tag('standard_name'); # then modify to your liking
    [...]
    $source->store();                     # finally re-insert; this is now the same as ...->create();

-hilmar

-----Original Message----- From: Frederic Pecqueur [mailto:Frederic.Pecqueur@devgen.com] Sent: Mon 24.11.2003 08:05 To: biosql-l@open-bio.org Cc: Subject: [BioSQL-l] how do you update the value of a tag? Hi, I have a problem when I want update the value of a tag in a Sequence of BioSQL. I use this code : my $adp = $db->get_object_adaptor('Bio::SeqI'); my $lseq = $adp->find_by_unique_key(Bio::Seq::RichSeq->new( -accession_number => $id, -namespace => $namespace )); my $source; foreach my $feat ( $lseq->get_SeqFeatures ) { if ( $feat->primary_tag eq 'gene' ) { $source = $feat; last; } } $source->remove_tag('standard_name'); $source->add_tag_value('standard_name','my new value'); $source->store(); Apparently the problem comes from the update of the "Seqfeature Qualifier Value" table. I obtain several errors about a bad insert command. I don't understand why the API uses an "INSERT" command and not an "UPDATE" command??
Here is the errors: DBD::Oracle::st execute failed: ORA-00001: unique constraint (BIOSQL.XPKSEQFEATURE_QUALIFIER_ASSOC) violated (DBD ERROR: OCIStmtExecute) [for statement ``INSERT INTO seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) VALUES (?, ?, ?, ?)'' with params: :p4=1, :p1='5321562', :p2=1968, :p3='a new value']) at usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 418. somebody has an idea? Thanks a lot. Frédéric. _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l

From juguang at tll.org.sg Fri Nov 28 02:41:24 2003 From: juguang at tll.org.sg (Juguang Xiao) Date: Fri Nov 28 02:47:37 2003 Subject: [BioSQL-l] script arguments revolution In-Reply-To: <1069860184.6574.51.camel@geordy.ebi.ac.uk> Message-ID: <449DE025-2176-11D8-B055-000A957702FE@tll.org.sg>

Hello to both users and authors of ensembl scripts, and of bioperl-db too.

Have you ever gotten bored of typing '-host localhost -user root -dbname homo_sapiens_core_18_34' after your own or others' ensembl script names, 27 times a day? Is the following what you have to write at the top of each of your own scripts, even when the rest of the script is much shorter than this?

    use Getopt::Long;
    my ($host, $user, $pass, $dbname, $others);
    GetOptions(
        'host|dbhost=s' => \$host,
        'user|dbuser=s' => \$user,
        'pass|dbpass=s' => \$pass,
        'dbname=s'      => \$dbname,
        'others=s'      => \$others,
    );

    use Bio::EnsEMBL::DBSQL::DBAdaptor;
    my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -host   => $host,
        -user   => $user,
        -pass   => $pass,
        -dbname => $dbname,
    );

Now forget that wrist-straining past: a new method is ready.

-------------

For script users (or command-line typists, whatever you call yourselves), and also for script authors: you can generate a so-called perl object file to store your most frequently used db settings.
Give the file a name, e.g. ensdb_homo_core_18.perlobj, with content like this:

    use strict;   # The ceiling line
    use Bio::EnsEMBL::DBSQL::DBAdaptor;
    my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
        -host   => 'ensembldb.ensembl.org',
        -user   => 'anonymous',
        -dbname => 'homo_sapiens_core_18_34'
    );
    $db;          # The floor line

Then you can use Perl's do function, another mighty one, to execute the file and return the value of its last statement:

    $db = do('ensdb_homo_core_18.perlobj');   # Now you have an ensembl db, in one line!!

I suggest that the ensembl scripts accept a --db_file option naming the perlobj file, to relieve users' finger ache.

----------------

For script authors who support this idea, I have prepared a utility to help you. Bio::EnsEMBL::Utils::EasyArgv::get_ens_db_from_argv is a one-line solution for getting the db-related arguments from @ARGV. Don't you think your script header would look nicer like this?

    use Bio::EnsEMBL::Utils::EasyArgv;
    my $db = get_ens_db_from_argv;   # this method is exported.
    use Getopt::Long;
    my ($others);
    GetOptions('others=s' => \$others);

get_ens_db_from_argv processes the usual options, like db_file, host, dbhost, user, dbuser, pass, dbpass, dbname, etc., and removes them from @ARGV. If db_file is provided, the db is fetched from the perlobj file directly; otherwise, it is built on the fly from the other arguments. It also dies unless the user gives the script sufficient information.

Comments are welcome! Juguang
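For readers who want the same convenience without the Ensembl API, here is a minimal, self-contained sketch of the idea described above. The helper name get_db_args_from_argv and its exact behaviour are hypothetical (this is not the EasyArgv code): it pulls the database options out of an argument array with Getopt::Long, leaves every other argument for the script's own option parsing, and either loads a settings file with do or builds the settings from the options. It returns a plain hashref of connection settings rather than a DBAdaptor, so it runs with core Perl only.

```perl
use strict;
use warnings;
use File::Spec;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of bioperl-db or the Ensembl API).
# Removes the usual database options from the array it is given,
# leaves every other argument in place, and returns a hashref of
# connection settings.
sub get_db_args_from_argv {
    my ($argv) = @_;    # array reference, typically \@ARGV
    my %opt;

    # pass_through keeps unknown options (e.g. --format) in the array
    # instead of treating them as errors. It is a global Getopt::Long
    # setting, so it is switched off again right after parsing.
    Getopt::Long::Configure('pass_through');
    my $ok = GetOptionsFromArray(
        $argv,
        'host|dbhost=s' => \$opt{host},
        'user|dbuser=s' => \$opt{user},
        'pass|dbpass=s' => \$opt{pass},
        'dbname=s'      => \$opt{dbname},
        'db_file=s'     => \$opt{db_file},
    );
    Getopt::Long::Configure('no_pass_through');
    die "invalid database options\n" unless $ok;

    if (defined $opt{db_file}) {
        # An absolute path stops do() from searching @INC for the file.
        my $path = File::Spec->rel2abs($opt{db_file});
        my $cfg  = do $path;
        die "could not compile $path: $@" if $@;
        die "could not read $path: $!\n" unless defined $cfg;
        die "$path did not return a hash reference\n"
            unless ref $cfg eq 'HASH';
        return $cfg;
    }

    # Without a settings file, at least a database name is required.
    die "need either --db_file or --dbname\n" unless defined $opt{dbname};

    # Keep only the options that were actually given.
    return { map { ($_ => $opt{$_}) } grep { defined $opt{$_} } keys %opt };
}
```

A script would call `my $dbargs = get_db_args_from_argv(\@ARGV);` before its own GetOptions call and pass the settings to whatever adaptor constructor it uses; a settings file for this sketch would simply end with a hash such as `+{ host => 'localhost', dbname => 'biosql' };`. Turning db_file into an absolute path matters on current Perls: since 5.26, '.' is no longer in @INC, so do('ensdb_homo_core_18.perlobj') with a bare relative name will not find a file in the current directory, whereas do('./ensdb_homo_core_18.perlobj') will.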