From florian.mittag at uni-tuebingen.de Thu Aug 6 05:43:56 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:43:56 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
References: <1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net> <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
Message-ID: <200908061143.56479.florian.mittag@uni-tuebingen.de>

Hi!

On Friday, 24. July 2009 02:39, Chris Fields wrote:
> The warning is interesting, as it derives from our rollback of feature/
> annotation stuff in bioperl. It indicates the specified DBLink is
> duplicated in the Bio::Ontology::Term.
>
> The exception makes sense in light of that (and seems to confirm the
> link was already present).

I'm getting the same warnings with my custom DB2 driver and with MySQL, but
the script completes successfully. I get them when loading the Gene Ontology
and the Sequence Ontology.

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12297042 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:rph exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12930826 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default
---------------------------------------------------

[...]
Done with sequence.
Done, cleaning up.


What to do?
- Florian

>
> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote:
> > Hi Carlos - that's an odd error that we haven't seen yet. My first
> > impulse would be to suspect that your database wasn't empty when you
> > ran this, and that the error you got is due to a term in the input
> > file clashing with one you already have in the database.
> >
> > You can check this by looking into your database:
> >
> > SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name =
> > 'invasive growth';
> >
> > Does this return anything?
> >
> > Note that load_ontology.pl is perfectly equipped to update an
> > existing ontology - check the POD and look for the --lookup command
> > line option (and the several options following it in the POD with
> > which you can modify the exact update behavior). By default though
> > the script will assume that it is loading a new ontology.
> >
> > -hilmar
> >
> > On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote:
> >> Hi Hilmar,
> >>
> >> thanks for the help. I've tried now this
> >>
> >> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy --dbpass
> >> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo
> >>
> >> downloaded from here
> >>
> >> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo
> >>
> >> and I have this error message.
> >>
> >> --------------------- WARNING ---------------------
> >> MSG: DBLink _default
> >> ---------------------------------------------------
> >> Could not store term GO:0001404, name 'invasive growth':
> >>
> >> ------------- EXCEPTION: Bio::Root::Exception -------------
> >> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to be found by unique key
> >> STACK: Error::throw
> >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:812
> >> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >> -----------------------------------------------------------
> >>
> >> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >> main::persist_term('-term',
> >> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db',
> >> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory',
> >> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>
> >> Any hints to know where the problem would be?
> >>
> >> Thanks in advance,
> >>
> >> Carlos
> >>
> >> Carlos Canchaya
> >> ccanchaya at gmail.com
> >>
> >> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote:
> >>> Please leave off the --fmtargs GO.defs argument - this is not a
> >>> file in the .obo format.
> >>>
> >>> -hilmar
> >>>
> >>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote:
> >>>> Hi guys,
> >>>>
> >>>> I've tried to execute load_ontologies following your suggestions as
> >>>>
> >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy
> >>>> --dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs
> >>>> --format obo gene_ontology.1_2.obo
> >>>>
> >>>> However I have many warnings first
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>>
> >>>> and then
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>> Could not store term GO:0001404, name 'invasive growth':
> >>>>
> >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to be found by unique key
> >>>> STACK: Error::throw
> >>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/bioperl-live//Bio/Root/Root.pm:357
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:812
> >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >>>> -----------------------------------------------------------
> >>>>
> >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>>> main::persist_term('-term',
> >>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db',
> >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory',
> >>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>>>
> >>>>
> >>>> Any ideas why?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Carlos
> >>>>
> >>>>
> >>>> Carlos Canchaya
> >>>> ccanchaya at gmail.com

From hlapp at gmx.net Thu Aug 6 09:46:06 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:46:06 -0400
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <200908061143.56479.florian.mittag@uni-tuebingen.de>
References: <1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net> <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu> <200908061143.56479.florian.mittag@uni-tuebingen.de>
Message-ID:

The warnings are fine. They simply indicate that a dbxref is being added
to the term that it already had.

Part of the reason for that happening may be that Bioperl-db doesn't
support different kinds of dbxrefs for terms yet, if I recall correctly,
so once retrieved from the database they all end up in the _default
category.

-hilmar

On Aug 6, 2009, at 5:43 AM, Florian Mittag wrote:

> Hi!
>
> On Friday, 24. July 2009 02:39, Chris Fields wrote:
>> The warning is interesting, as it derives from our rollback of
>> feature/
>> annotation stuff in bioperl. It indicates the specified DBLink is
>> duplicated in the Bio::Ontology::Term.
>>
>> The exception makes sense in light of that (and seems to confirm the
>> link was already present).
>
> I'm getting the same warnings with my custom DB2 driver and with
> MySQL, but
> the script completes successfully. I get them when loading the Gene
> Ontology
> and the Sequence Ontology.
>
> -------------------- WARNING ---------------------
> MSG: GOC:mah exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:12297042 exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: GOC:mah exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: GOC:rph exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:12930826 exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:15012271 exists in the dblink of _default
> ---------------------------------------------------
>
> [...]
> Done with sequence.
> Done, cleaning up.
>
>
> What to do?
>
> - Florian

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From florian.mittag at uni-tuebingen.de Thu Aug 6 10:20:31 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:20:31 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To:
References: <200908061143.56479.florian.mittag@uni-tuebingen.de>
Message-ID: <200908061620.31766.florian.mittag@uni-tuebingen.de>

Ok, that's a relief. Thanks for the quick answer!

- Florian

On Thursday, 6. August 2009 15:46, Hilmar Lapp wrote:
> The warnings are fine. They simply indicate that a dbxref is being
> added to the term that it already had.
>
> Part of the reason for that happening may be that Bioperl-db doesn't
> support different kinds of dbxrefs for terms yet, if I recall
> correctly, so once retrieved from the database they all end up in the
> _default category.
>
> -hilmar

--
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091

From haili at mpiz-koeln.mpg.de Mon Aug 10 10:21:39 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Mon, 10 Aug 2009 16:21:39 +0200
Subject: [BioSQL-l] how to load other data to biosql database?
Message-ID:

Dear all,

Do any of you know how to load other data, such as domains, EC numbers,
Mapman bins, interactions, Kegg Ontology etc., into a biosql database?
Is it possible by using load_ontology.pl? If it is, what are the
corresponding arguments? Otherwise, should I write my own scripts?
Any suggestion will be highly appreciated!
Best regards,
song

From florian.mittag at uni-tuebingen.de Tue Aug 11 04:10:12 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 10:10:12 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to?
Message-ID: <200908111010.12143.florian.mittag@uni-tuebingen.de>

Hi!

I stumbled upon an old post from Hilmar:

On Tue, 18 Mar 2003, Hilmar Lapp wrote:
> type_term_id is supposed to reference an SO term. source is supposed to
> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far as
> my understanding goes. In the case of reading the features from a
> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the source
> (which is what the genbank, embl, and swissprot parsers do in bioperl)
> is maybe stretching the definition, but I don't have something
> substantially better to offer.

I inspected the database after I imported some Genbank files with BioJava,
and I found that the source_term_id for the seqfeatures is always set to
the ID of an automatically inserted term "Genbank" with definition
"auto-generated by biojavax".

I was wondering if there is anything new to the source_term_id.

- Florian

From florian.mittag at uni-tuebingen.de Tue Aug 11 05:09:50 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 11:09:50 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to?
In-Reply-To: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
Message-ID: <200908111109.50361.florian.mittag@uni-tuebingen.de>

Hm, I should've mentioned my real concern. We're integrating all kinds of
data into the database and right now I want to import miRNA information
(sequences and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/).
The files I download from there specify "miRanda" as METHOD, so should I use this as source term or miRBase? Thanks, - Florian On Tuesday, 11. August 2009 10:59, Richard Holland wrote: > The reason BJX does that is because the Genbank format has no > indication of where a feature came from. So, all there is to go on is > that it came from Genbank! This allows us to differentiate between > features on a sequence that were loaded from an original file, and new > features that have been added to the sequence in the db after it was > loaded (e.g. by running blast, blat etc. against some local data). > > On 11 Aug 2009, at 09:10, Florian Mittag wrote: > > Hi! > > > > I stumbled upon an old post from Hilmar: > > > > On Tue, 18 Mar 2003, Hilmar Lapp wrote: > >> type_term_id is supposed to reference an SO term. source is > >> supposed to > >> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far > >> as > >> my understanding goes. In the case of reading the features from a > >> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the > >> source > >> (which is what the genbank, embl, and swissprot parsers do in > >> bioperl) > >> is maybe stretching the definition, but I don't have something > >> substantially better to offer. > > > > I inspected the database after I imported some Genbank files with > > BioJava, and > > I found that the source_term_id for the seqfeatures is always set to > > the ID > > of an automatically inserted term "Genbank" with definition "auto- > > generated > > by biojavax". > > > > I was wondering if there is anything new to the source_term_id. > > > > - Florian > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ -- Dipl. Inf. 
Florian Mittag Universit?t Tuebingen WSI-RA, Sand 1 72076 Tuebingen, Germany Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091 From holland at eaglegenomics.com Tue Aug 11 05:22:41 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 11 Aug 2009 10:22:41 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <200908111109.50361.florian.mittag@uni-tuebingen.de> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> Message-ID: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> Ideally there would be two fields for source_term_id - one for the algorithm used to generate the data (e.g. BLAST, miRanda), the other for the source the data came from (e.g. Genbank, miRBase). These are two very distinct concepts and it is not easy to represent them successfully using a single ontology source_term_id field. So the only way round it if you need to represent both algorithm and source is to create your own ontology which is a cross-product of the two possible sets of values (triples would be good for this). If you want to use only a single term, basically it's up to you whether you choose to annotate by algorithm (miRanda) or by source (miRBase). I expect the decision will rest on whether it is more important for you to know which features in your database were added locally and which came from a remote source, or if knowing the algorithm used to generate them is more important. Otherwise if both are important the cross-product triple approach is probably the only way to go. cheers, Richard On 11 Aug 2009, at 10:09, Florian Mittag wrote: > Hm, I should've mentioned my real concern. We're integrating all > kinds of data > into the database and right now I want to import miRNA information > (sequences > and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/ > ). 
> The files I download from there specify "miRanda" as METHOD, so > should I use > this as source term or miRBase? > > Thanks, > - Florian > > On Tuesday, 11. August 2009 10:59, Richard Holland wrote: >> The reason BJX does that is because the Genbank format has no >> indication of where a feature came from. So, all there is to go on is >> that it came from Genbank! This allows us to differentiate between >> features on a sequence that were loaded from an original file, and >> new >> features that have been added to the sequence in the db after it was >> loaded (e.g. by running blast, blat etc. against some local data). >> >> On 11 Aug 2009, at 09:10, Florian Mittag wrote: >>> Hi! >>> >>> I stumbled upon an old post from Hilmar: >>> >>> On Tue, 18 Mar 2003, Hilmar Lapp wrote: >>>> type_term_id is supposed to reference an SO term. source is >>>> supposed to >>>> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far >>>> as >>>> my understanding goes. In the case of reading the features from a >>>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the >>>> source >>>> (which is what the genbank, embl, and swissprot parsers do in >>>> bioperl) >>>> is maybe stretching the definition, but I don't have something >>>> substantially better to offer. >>> >>> I inspected the database after I imported some Genbank files with >>> BioJava, and >>> I found that the source_term_id for the seqfeatures is always set to >>> the ID >>> of an automatically inserted term "Genbank" with definition "auto- >>> generated >>> by biojavax". >>> >>> I was wondering if there is anything new to the source_term_id. 
> >>> >>> - Florian >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biosql-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ > > -- > Dipl. Inf. Florian Mittag > Universität Tuebingen > WSI-RA, Sand 1 > 72076 Tuebingen, Germany > Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091 -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Tue Aug 11 04:59:27 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 11 Aug 2009 09:59:27 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <200908111010.12143.florian.mittag@uni-tuebingen.de> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> Message-ID: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> The reason BJX does that is because the Genbank format has no indication of where a feature came from. So, all there is to go on is that it came from Genbank! This allows us to differentiate between features on a sequence that were loaded from an original file, and new features that have been added to the sequence in the db after it was loaded (e.g. by running blast, blat etc. against some local data). On 11 Aug 2009, at 09:10, Florian Mittag wrote: > Hi! > > I stumbled upon an old post from Hilmar: > > On Tue, 18 Mar 2003, Hilmar Lapp wrote: >> type_term_id is supposed to reference an SO term. source is >> supposed to >> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far >> as >> my understanding goes. 
In the case of reading the features from a >> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the >> source >> (which is what the genbank, embl, and swissprot parsers do in >> bioperl) >> is maybe stretching the definition, but I don't have something >> substantially better to offer. > > I inspected the database after I imported some Genbank files with > BioJava, and > I found that the source_term_id for the seqfeatures is always set to > the ID > of an automatically inserted term "Genbank" with definition "auto- > generated > by biojavax". > > I was wondering if there is anything new to the source_term_id. > > - Florian > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Fri Aug 14 18:56:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Aug 2009 18:56:11 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> Message-ID: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> On Aug 11, 2009, at 5:22 AM, Richard Holland wrote: > Ideally there would be two fields for source_term_id - one for the > algorithm used to generate the data (e.g. BLAST, miRanda), the other > for the source the data came from (e.g. Genbank, miRBase). You mean the source of the data that it was applied to. I agree though that if you want both you can create a cross-product term and store the decomposition as term_relationship's. 
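A minimal sketch of the cross-product approach Richard and Hilmar describe, assuming a heavily simplified subset of the BioSQL ontology, term and term_relationship tables in an in-memory SQLite database (the real DDL has more columns and constraints). The ontology name 'SeqFeature Sources (local)', the predicate terms has_algorithm and has_data_source, and the 'miRanda@miRBase' naming scheme are hypothetical choices for illustration, not BioSQL conventions. The combined term is what would go into seqfeature.source_term_id, and the two term_relationship rows record its decomposition.

```python
import sqlite3

# Heavily simplified subset of the BioSQL ontology tables (assumption: the
# real DDL has more columns, e.g. term.identifier and term.is_obsolete).
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE term_relationship (
    term_relationship_id INTEGER PRIMARY KEY,
    subject_term_id      INTEGER NOT NULL REFERENCES term(term_id),
    predicate_term_id    INTEGER NOT NULL REFERENCES term(term_id),
    object_term_id       INTEGER NOT NULL REFERENCES term(term_id),
    ontology_id          INTEGER NOT NULL REFERENCES ontology(ontology_id)
);
"""

def add_term(cur, name, ontology_id):
    """Insert a term and return its term_id."""
    cur.execute("INSERT INTO term (name, ontology_id) VALUES (?, ?)",
                (name, ontology_id))
    return cur.lastrowid

def add_cross_product(cur, ontology_id, name, decomposition):
    """Create a cross-product term and store its decomposition as
    (subject, predicate, object) triples in term_relationship."""
    xp_id = add_term(cur, name, ontology_id)
    cur.executemany(
        "INSERT INTO term_relationship "
        "(subject_term_id, predicate_term_id, object_term_id, ontology_id) "
        "VALUES (?, ?, ?, ?)",
        [(xp_id, predicate_id, object_id, ontology_id)
         for predicate_id, object_id in decomposition])
    return xp_id

def decompose(cur, xp_id):
    """Return {predicate name: object name} for a cross-product term."""
    rows = cur.execute(
        "SELECT p.name, o.name FROM term_relationship r "
        "JOIN term p ON p.term_id = r.predicate_term_id "
        "JOIN term o ON o.term_id = r.object_term_id "
        "WHERE r.subject_term_id = ? ORDER BY r.term_relationship_id",
        (xp_id,))
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(SCHEMA)
# Hypothetical local ontology holding both the atomic and the combined terms.
cur.execute("INSERT INTO ontology (name) VALUES ('SeqFeature Sources (local)')")
ont = cur.lastrowid
mirbase_term = add_term(cur, "miRBase", ont)   # where the data came from
miranda_term = add_term(cur, "miRanda", ont)   # algorithm that produced it
has_algorithm = add_term(cur, "has_algorithm", ont)      # hypothetical predicate
has_data_source = add_term(cur, "has_data_source", ont)  # hypothetical predicate
xp = add_cross_product(cur, ont, "miRanda@miRBase",
                       [(has_algorithm, miranda_term),
                        (has_data_source, mirbase_term)])
# xp is the term_id that would be stored in seqfeature.source_term_id.
print(decompose(cur, xp))
```

Keeping the decomposition in term_relationship means a query can still select "all miRanda features" by joining through the triples, even though each feature carries only one source_term_id.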
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 06:44:16 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 11:44:16 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> Message-ID: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> On 14 Aug 2009, at 23:56, Hilmar Lapp wrote: > > On Aug 11, 2009, at 5:22 AM, Richard Holland wrote: > >> Ideally there would be two fields for source_term_id - one for the >> algorithm used to generate the data (e.g. BLAST, miRanda), the >> other for the source the data came from (e.g. Genbank, miRBase). > > > You mean the source of the data that it was applied to. Not necessarily. The source of the data that it was applied to (ie. the sequence the feature refers to) is a third thing - and that is an attribute of the sequence the feature refers to, rather than the feature itself. What I mean is this: 1. The sequence itself could be downloaded from Genbank, EMBL, or elsewhere, or I could have discovered it in-house. 2. The features on the sequence could have been generated by running BLAST, miRBase, etc., or they could be manually annotated. 3. The features on the sequence could have been downloaded from Genbank, EMBL, etc., or they could have been made locally, or by a collaborator at another institute. To my mind these are three distinct things. (1) is sequence-related, and (2) and (3) are feature-related. 
cheers, Richard > I agree though that if you want both you can create a cross-product > term and store the decomposition as term_relationship's. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 10:29:24 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 10:29:24 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> Message-ID: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote: > [...] > What I mean is this: > > 1. The sequence itself could be downloaded from Genbank, EMBL, or > elsewhere, or I could have discovered it in-house. That's actually what I meant. > 2. The features on the sequence could have been generated by > running BLAST, miRBase, etc., or they could be manually annotated. > 3. The features on the sequence could have been downloaded from > Genbank, EMBL, etc., or they could have been made locally, or by a > collaborator at another institute. Right, but if a feature is the result of you running some algorithm against some sequences, then it's not been downloaded or given to you. 
Features on one and the same sequence can have different sources, obviously, so I'm a bit confused - I think we're talking about the same thing in different words, but I'm not sure. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 12:32:35 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 17:32:35 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> Message-ID: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> On 15 Aug 2009, at 15:29, Hilmar Lapp wrote: > > On Aug 15, 2009, at 6:44 AM, Richard Holland wrote: > >> [...] >> What I mean is this: >> >> 1. The sequence itself could be downloaded from Genbank, EMBL, or >> elsewhere, or I could have discovered it in-house. > > That's actually what I meant. > >> 2. The features on the sequence could have been generated by >> running BLAST, miRBase, etc., or they could be manually annotated. >> 3. The features on the sequence could have been downloaded from >> Genbank, EMBL, etc., or they could have been made locally, or by a >> collaborator at another institute. > > Right, but if a feature is the result of you running some algorithm > against some sequences, then it's not been downloaded or given to > you. 
Features on one and the same sequence can have different > sources, obviously, so I'm a bit confused - I think we're talking > about the same thing in different words, but I'm not sure. Probably. :) Case study: I download some seqs from Genbank. (Which then need to be annotated as having come from Genbank, at the sequence level). They already have some features on them (which need to be annotated as having come from Genbank, at the feature level, but of an unknown algorithm as Genbank doesn't specify how they were generated usually). I then run BLAST of those sequences against some local data, and record my own features as a result. I also run BLAT, and again record my own features. My colleague also runs BLAST of the same seqs against some data of his own, and wants our combined feature results to be stored in the same database. I want to be able to annotate all these new features both with the algorithm used to generate them (BLAST or BLAT) and who did it (myself or my colleague at the institute down the road), in addition to retaining the original features that came from Genbank (and making sure they're annotated as such). Hence I'd need a source attribute for the sequence (Genbank in this case), a source attribute for each feature (Genbank, Me, or Colleague X, in this case), and an algorithm/technique/protocol attribute for each feature (BLAST or BLAT or 'don't know it just came from Genbank' in this example). 
cheers, Richard > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 15:31:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:31:13 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> Message-ID: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote: > [...] > Case study: Great, now we're getting somewhere :-) > I download some seqs from Genbank. (Which then need to be annotated > as having come from Genbank, at the sequence level). Note, as you say, *at the sequence level*. I.e., you would record this either using the bioentry's namespace (biodatabase), or a bioentry_qualifier_value annotation. I would choose the former, though since a bioentry can only be in one namespace, it may not satisfy your needs. > They already have some features on them (which need to be annotated > as having come from Genbank, at the feature level, but of an unknown > algorithm as Genbank doesn't specify how they were generated usually). Right. The source term would indicate that GenBank provided them to you, and that that's all you know.
> I then run BLAST of those sequences against some local data, and > record my own features as a result. I also run BLAT, and again > record my own features. BLAST and BLAT would now be the source terms. > My colleague also runs BLAST of the same seqs against some data of > his own, and wants our combined feature results to be stored in the > same database. I want to be able to annotate all these new features > both with the algorithm used to generate them (BLAST or BLAT) You use the source term for that. > and who did it (myself or my colleague at the institute down the road) Ah - that's provenance information, not the source as is normally referred to. BioSQL at present doesn't have an explicit provenance model, but you can still record provenance information through ontology-typed tag/value annotation in seqfeature_qualifier_value, with the terms coming from a provenance ontology (that you make up yourself or grab from somewhere else). > , in addition to retaining the original features that came from > Genbank (and making sure they're annotated as such). That shouldn't be a problem - certainly it's not for BioSQL. > Hence I'd need a source attribute for the sequence (Genbank in this > case), a source attribute for each feature (Genbank, Me, or > Colleague X, in this case), and an algorithm/technique/protocol > attribute for each feature (BLAST or BLAT or 'don't know it just > came from Genbank' in this example). Not quite - source really is what provided the feature to you, not who or when, or using which BLAST database, genome assembly, or how you parsed the results, etc etc. That's all provenance information. 
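Hilmar's split between source_term_id (what produced the feature, e.g. BLAST or BLAT) and provenance recorded as ontology-typed tag/value pairs in seqfeature_qualifier_value might look like the following sketch. It assumes a heavily simplified subset of the BioSQL tables in an in-memory SQLite database; the ontology names ('SeqFeature Keys', 'SeqFeature Sources', 'Provenance (local)') and the performed_by tag are hypothetical placeholders for whatever feature-type, source and provenance vocabularies you actually adopt.

```python
import sqlite3

# Heavily simplified subset of the BioSQL tables involved (assumption: the real
# schema has further columns such as seqfeature.bioentry_id and display_name).
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE seqfeature (
    seqfeature_id  INTEGER PRIMARY KEY,
    type_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    source_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    rank          INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank)
);
"""

def term_id(cur, ontology, name):
    """Return the term_id for name in ontology, creating either one if missing."""
    cur.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (ontology,))
    ont_id = cur.execute("SELECT ontology_id FROM ontology WHERE name = ?",
                         (ontology,)).fetchone()[0]
    cur.execute("INSERT OR IGNORE INTO term (name, ontology_id) VALUES (?, ?)",
                (name, ont_id))
    return cur.execute("SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
                       (name, ont_id)).fetchone()[0]

def add_feature(cur, feature_type, source, provenance):
    """Store a feature: source_term_id names what produced it (e.g. BLAST);
    who ran it and similar details go into provenance tag/value qualifiers."""
    type_id = term_id(cur, "SeqFeature Keys", feature_type)   # hypothetical ontology
    source_id = term_id(cur, "SeqFeature Sources", source)    # hypothetical ontology
    cur.execute("INSERT INTO seqfeature (type_term_id, source_term_id) VALUES (?, ?)",
                (type_id, source_id))
    feature_id = cur.lastrowid
    for tag, value in provenance.items():
        tag_id = term_id(cur, "Provenance (local)", tag)      # hypothetical ontology
        cur.execute("INSERT INTO seqfeature_qualifier_value "
                    "(seqfeature_id, term_id, value) VALUES (?, ?, ?)",
                    (feature_id, tag_id, value))
    return feature_id

def features_by(cur, source, tag, value):
    """IDs of features from a given source whose provenance tag has a given value."""
    rows = cur.execute(
        "SELECT f.seqfeature_id FROM seqfeature f "
        "JOIN term s ON s.term_id = f.source_term_id "
        "JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE s.name = ? AND t.name = ? AND q.value = ? "
        "ORDER BY f.seqfeature_id", (source, tag, value))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(SCHEMA)
genbank = add_feature(cur, "gene", "Genbank", {})  # came with the record; nothing more known
mine = add_feature(cur, "match", "BLAST", {"performed_by": "me"})
theirs = add_feature(cur, "match", "BLAST", {"performed_by": "Colleague X"})
blat = add_feature(cur, "match", "BLAT", {"performed_by": "me"})
print(features_by(cur, "BLAST", "performed_by", "Colleague X"))  # prints [3]
```

The Genbank feature gets a source term but no provenance qualifiers, which mirrors Hilmar's point that "GenBank provided it" may be all you know.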
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 16:00:39 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 21:00:39 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> Message-ID: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> Ok, cool. So we can now rephrase the original question to...: How should provenance information be stored in BioSQL? :) cheers, Richard On 15 Aug 2009, at 20:31, Hilmar Lapp wrote: > > On Aug 15, 2009, at 12:32 PM, Richard Holland wrote: > >> [...] >> Case study: > > Great, now we're getting somewhere :-) > >> I download some seqs from Genbank. (Which then need to be annotated >> as having come from Genbank, at the sequence level). > > Note, as you say, *at the sequence level*. I.e., you would record > this either using the bioentry's namespace (biodatabase), or a > bioentry_qualifier_value annotation. I would choose the former, > though since a bioentry can only be in one namespace, it may not > satisfy your needs. > >> They already have some features on them (which need to be annotated >> as having come from Genbank, at the feature level, but of an >> unknown algorithm as Genbank doesn't specify how they were >> generated usually). > > Right.
The source term would indicate that GenBank provided them to > you, and that that's all you know. > >> I then run BLAST of those sequences against some local data, and >> record my own features as a result. I also run BLAT, and again >> record my own features. > > BLAST and BLAT would now be the source terms. > >> My colleague also runs BLAST of the same seqs against some data of >> his own, and wants our combined feature results to be stored in the >> same database. I want to be able to annotate all these new features >> both with the algorithm used to generate them (BLAST or BLAT) > > You use the source term for that. > >> and who did it (myself or my colleague at the institute down the >> road) > > Ah - that's provenance information, not the source as is normally > referred to. BioSQL at present doesn't have an explicit provenance > model, but you can still record provenance information through > ontology-typed tag/value annotation in seqfeature_qualifier_value, > with the terms coming from a provenance ontology (that you make up > yourself or grab from somewhere else). > >> , in addition to retaining the original features that came from >> Genbank (and making sure they're annotated as such). > > That shouldn't be a problem - certainly it's not for BioSQL. > >> Hence I'd need a source attribute for the sequence (Genbank in this >> case), a source attribute for each feature (Genbank, Me, or >> Colleague X, in this case), and an algorithm/technique/protocol >> attribute for each feature (BLAST or BLAT or 'don't know it just >> came from Genbank' in this example). > > Not quite - source really is what provided the feature to you, not > who or when, or using which BLAST database, genome assembly, or how > you parsed the results, etc etc. That's all provenance information. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 16:14:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:14:54 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> Message-ID: <92DD5E74-5638-4CB8-B34A-3282AACF036A@gmx.net> On Aug 15, 2009, at 4:00 PM, Richard Holland wrote: > Ok, cool. So we can now rephrase the original question to...: How > should provenance information be stored in BioSQL? Yes, and the answer is using a provenance ontology or controlled vocabulary and bioentry_qualifier_value and seqfeature_qualifier_value. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Aug 26 06:53:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Aug 2009 11:53:40 +0100 Subject: [BioSQL-l] Indexing of (seqfeature) locations? 
Message-ID: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> Hi BioSQL folks, The BioSQL schema includes a few indexes on the location table (e.g. quoting the MySQL schema, but it looks the same on pg too): CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); CREATE INDEX seqfeatureloc_dbx ON location(dbxref_id); CREATE INDEX seqfeatureloc_trm ON location(term_id); Will these facilitate searches like this?: "SELECT ... WHERE 2000 <= location.start_pos AND location.end_pos <= 5000 AND ..." Or, for this would it help to include: CREATE INDEX seqfeatureloc_start ON location(start_pos); CREATE INDEX seqfeatureloc_start ON location(end_pos); A motivational use case would be to pull out an operon, or a region of a record as part of a genome browser. Thanks, Peter From hlapp at gmx.net Wed Aug 26 08:07:08 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Aug 2009 08:07:08 -0400 Subject: [BioSQL-l] Indexing of (seqfeature) locations? In-Reply-To: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> Message-ID: On Aug 26, 2009, at 6:53 AM, Peter wrote: > The BioSQL schema includes a few indexes on the location table > (e.g. quoting the MySQL schema, but it looks the same on pg too): > > CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); > [...] > Will these facilitate searches like this?: > > "SELECT ... WHERE 2000 <= location.start_pos > AND location.end_pos <= 5000 AND ..." > > Or, for this would it help to include: > > CREATE INDEX seqfeatureloc_start ON location(start_pos); > CREATE INDEX seqfeatureloc_start ON location(end_pos); With a decent RDBMS, having two indexes instead of a compound one will slow this query down. What the compound one won't help you with is if your query doesn't constrain the leading columns. For example, a compound index on (start_pos,end_pos) won't be used if you only constrain end_pos. 
If you want to do that, you need an index on (end_pos) too. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Aug 26 08:29:56 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Aug 2009 13:29:56 +0100 Subject: [BioSQL-l] Indexing of (seqfeature) locations? In-Reply-To: References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> Message-ID: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com> On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp wrote: > > > On Aug 26, 2009, at 6:53 AM, Peter wrote: > >> The BioSQL schema includes a few indexes on the location table >> (e.g. quoting the MySQL schema, but it looks the same on pg too): >> >> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); >> [...] >> Will these facilitate searches like this?: >> >> "SELECT ... WHERE 2000 <= location.start_pos >> AND location.end_pos <= 5000 AND ..." >> >> Or, for this would it help to include: >> >> CREATE INDEX seqfeatureloc_start ON location(start_pos); >> CREATE INDEX seqfeatureloc_start ON location(end_pos); > > With a decent RDBMS, having two indexes instead of a compound one will slow > this query down. What the compound one won't help you with is if your query > doesn't constrain the leading columns. For example, a compound index on > (start_pos,end_pos) won't be used if you only constrain end_pos. If you want > to do that, you need an index on (end_pos) too. Thanks for your reply Hilmar. Just to make sure I understood, the current BioSQL indexes are fine for this: "SELECT ... WHERE 2000 <= location.start_pos AND location.end_pos <= 5000 AND ..." but not so great for: "SELECT ... WHERE 2000 <= location.start_pos AND ..." or, "SELECT ... WHERE location.end_pos <= 5000 AND ..." Nevertheless, that should cover most usage.
Having just two separated indexes on start_pos and end_pos would speed up queries on just start or end, but would slow down queries using both. Presumably having three indexes as follows would cover all these examples efficiently, but at the cost of two more indexes?: CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); CREATE INDEX seqfeatureloc_start ON location(start_pos); CREATE INDEX seqfeatureloc_start ON location(end_pos); If that is all accurate, the status quo is fine :) Regards, Peter From haili at mpiz-koeln.mpg.de Wed Aug 26 10:18:09 2009 From: haili at mpiz-koeln.mpg.de (Song Haili) Date: Wed, 26 Aug 2009 16:18:09 +0200 Subject: [BioSQL-l] error with load_ontology Message-ID: Hi All, I encountered an error message when using load_ontology.pl to load the Gene Ontology into a BioSQL database. The command used is: perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete. At the beginning, data could be loaded with warnings, but later an exception occurred and the loading was terminated. The warning and error messages are shown below: --------------------- WARNING --------------------- MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm): ERROR: current transaction is aborted, commands ignored until end of transaction block --------------------------------------------------- Could not store term GO:0050501, name 'hyaluronan synthase activity': ------------- EXCEPTION ------------- MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:
current transaction is aborted, commands ignored until end of transaction block STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195 STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 STACK (eval) load_ontology.pl:812 STACK main::persist_term load_ontology.pl:794 STACK toplevel load_ontology.pl:617 ------------------------------------- Can you please help me to solve this problem? Thank you very much. Best regards, song From hlapp at gmx.net Wed Aug 26 11:50:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Aug 2009 11:50:35 -0400 Subject: [BioSQL-l] error with load_ontology In-Reply-To: References: Message-ID: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> Song, there should have been an error or warning that immediately preceded this error. It is that one that's the root cause. Also, are you by any chance using the BioSQL version for PostgreSQL that has the RULEs removed? If yes, then at this point you cannot use any Bioperl-db scripts (or code) with it, unless you install the rules before you run such a script (and presumably remove them again afterwards).
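Hilmar's point rests on how PostgreSQL behaves inside a transaction: after the first failed statement, every following statement is rejected with "current transaction is aborted, commands ignored until end of transaction block", so those messages are only echoes of an earlier failure. A small helper that scans captured loader output for the first ERROR that is not such an echo can make the root cause easier to spot. This is a sketch assuming the warnings were saved to a text file with one message per line (for example by redirecting standard error); the sample lines are shortened from messages in this thread, and the log file name is hypothetical.

```python
import re

# PostgreSQL's follow-on error for every statement issued after the first
# failure inside the same transaction block.
ABORTED = "current transaction is aborted"

def first_root_error(log_text):
    """Return the first ERROR message that is not the follow-on 'aborted'
    error, or None if the excerpt only contains follow-on errors."""
    # [ \t]* rather than \s* so the match never runs onto the next line.
    for match in re.finditer(r"ERROR:[ \t]*(.*)", log_text):
        message = match.group(1).strip()
        if not message.startswith(ABORTED):
            return message
    return None

sample = """\
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) ... ERROR:  value too long for type character varying(255)
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) ... ERROR:  current transaction is aborted, commands ignored until end of transaction block
"""

print(first_root_error(sample))

# With a real capture (hypothetical file name):
# with open("load_ontology.log") as fh:
#     print(first_root_error(fh.read()))
```

Run on the excerpt in Song's first message, this would return None: every ERROR shown there is a follow-on, which is consistent with Hilmar's suspicion that the real cause was printed earlier.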
-hilmar On Aug 26, 2009, at 10:18 AM, Song Haili wrote: > Hi All, > I encountered an error message when using load_ontology.pl to load > gene ontology into biosql database. The command used is: > > perl load_ontology.pl --driver Pg --host pg-server --dbname dbname -- > dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format > obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete. > > At the beginning, data can be loaded with warnings, but later an > exception occurred and the loading was terminated. Warning and error > messages shown below: > > --------------------- WARNING ---------------------MSG: failed to > store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS > RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):ERROR: > current transaction is aborted, commands ignored until end of > transaction block--------------------------------------------------- > Could not store term GO:0050501, name 'hyaluronan synthase > activity':------------- EXCEPTION -------------MSG: error while > executing statement in > Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR: current > transaction is aborted, commands ignored until end of transaction > blockSTACK > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/ > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm: > 970STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/ > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm: > 873STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/ > site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195STACK > Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/ > 5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/ > 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/ >
5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264STACK > Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/ > 5.10.0/Bio/DB/Persistent/PersistentObject.pm:284STACK (eval) > load_ontology.pl:812STACK main::persist_term load_ontology.pl: > 794STACK toplevel load_ontology.pl: > 617------------------------------------- > Can you please help me to solve this problem out? Thank you very much. > Best regards, > song > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Wed Aug 26 11:56:25 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Aug 2009 11:56:25 -0400 Subject: [BioSQL-l] Indexing of (seqfeature) locations? In-Reply-To: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com> References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com> Message-ID: <48B04E04-8561-45EB-9C64-8011665A74A2@gmx.net> On Aug 26, 2009, at 8:29 AM, Peter wrote: > On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp wrote: >> >> >> On Aug 26, 2009, at 6:53 AM, Peter wrote: >> >>> The BioSQL schema includes a few indexes on the location table >>> (e.g. quoting the MySQL schema, but it looks the same on pg too): >>> >>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); >>> [...] >>> Will these facilitate searches like this?: >>> >>> "SELECT ... WHERE 2000 <= location.start_pos >>> AND location.end_pos <= 5000 AND ..." 
>>> >>> Or, for this would it help to include: >>> >>> CREATE INDEX seqfeatureloc_start ON location(start_pos); >>> CREATE INDEX seqfeatureloc_start ON location(end_pos); >> >> With a decent RDBMS, having two indexes instead of a compound one >> will slow >> this query down. What the compound one won't help you with is if >> your query >> doesn't constrain the leading columns. For example, a compound >> index on >> (start_pos,end_pos) won't be used if you only constrain end_pos. If >> you want >> to do that, you need an index on (end_pos) too. > > Thanks for your reply Hilmar. Just to make sure I understood, the > current > BioSQL indexes are fine for this: > > "SELECT ... WHERE 2000 <= location.start_pos > AND location.end_pos <= 5000 AND ..." > > but not so great for: > > "SELECT ... WHERE 2000 <= location.start_pos AND ..." No, this one will work fine (provided that start_pos comes first in the index). > > or, > > "SELECT ... WHERE location.end_pos <= 5000 AND ..." Yes. > [...] > Having just two separated indexes on start_pos and end_pos would > speed up queries on just start or end, but would slow down queries > using both. Yes (though not necessarily much), and occupy more space. > > Presumably having three indexes as follows would cover all these > examples efficiently, but at the cost of two more indexes?: > > CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); > CREATE INDEX seqfeatureloc_start ON location(start_pos); > CREATE INDEX seqfeatureloc_start ON location(end_pos); With this set, the waste of space for the compound index probably far outweighs the performance gain you might see from it. If I need to be able to constrain by both independently, I create a compound index, and separate indexes for each column after the first in the index. I.e., for the purposes of querying by start_pos, CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); CREATE INDEX seqfeatureloc_start ON location(start_pos); are redundant.
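The leading-column behaviour Hilmar describes can be checked empirically. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in, so the plans are SQLite's and not MySQL's or PostgreSQL's, whose planners may choose differently. The location table is cut down to the columns needed, and seqfeature_id is selected so that no index covers the query on its own. Note also that the CREATE INDEX statements quoted in this thread all reuse the name seqfeatureloc_start; index names must be distinct within a table, so the extra index here gets the hypothetical name seqfeatureloc_end.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for BioSQL's location table (assumption: other columns omitted).
conn.execute("""CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    seqfeature_id INTEGER NOT NULL,
    start_pos     INTEGER,
    end_pos       INTEGER)""")
# The compound index discussed in the thread.
conn.execute("CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos)")

QUERIES = {
    "both": "SELECT seqfeature_id FROM location "
            "WHERE 2000 <= start_pos AND end_pos <= 5000",
    "start only": "SELECT seqfeature_id FROM location WHERE 2000 <= start_pos",
    "end only": "SELECT seqfeature_id FROM location WHERE end_pos <= 5000",
}
before = {name: plan(conn, sql) for name, sql in QUERIES.items()}

# Hilmar's rule: also index each column after the first, here end_pos.
conn.execute("CREATE INDEX seqfeatureloc_end ON location(end_pos)")  # hypothetical name
after_end = plan(conn, QUERIES["end only"])

for name, details in before.items():
    print(f"{name:10} {details}")
print(f"{'end only':10} {after_end}  (with end_pos index)")
```

In SQLite without ANALYZE statistics, the start-only query is served by the compound index, matching Hilmar's correction, while the end-only query falls back to a full table scan until end_pos gets its own index.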
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From haili at mpiz-koeln.mpg.de Thu Aug 27 03:51:07 2009 From: haili at mpiz-koeln.mpg.de (Song Haili) Date: Thu, 27 Aug 2009 09:51:07 +0200 Subject: [BioSQL-l] error with load_ontology In-Reply-To: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> Message-ID: Hi Hilmar, I loaded the data again and found that the biological process GO terms were loaded, although with some warnings: --------------------- WARNING --------------------- MSG: DBLink exists in the dblink of _default --------------------------------------------------- --------------------- WARNING --------------------- MSG: DBLink exists in the dblink of _default --------------------------------------------------- But when it started to load the molecular function GO terms, the process terminated with the following warnings and error message. Done with biological_process. Loading ontology molecular_function: ... terms --------------------- WARNING --------------------- MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): ERROR: value too long for type character varying(255) --------------------------------------------------- --------------------- WARNING --------------------- MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (HAS activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): ERROR:
current transaction is aborted, commands ignored until end of transaction block --------------------------------------------------- --------------------- WARNING --------------------- MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (seHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): ERROR: current transaction is aborted, commands ignored until end of transaction block --------------------------------------------------- --------------------- WARNING --------------------- MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): ERROR: current transaction is aborted, commands ignored until end of transaction block --------------------------------------------------- Could not store term GO:0050501, name 'hyaluronan synthase activity': ------------- EXCEPTION ------------- MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:
current transaction is aborted, commands ignored until end of transaction block STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195 STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 STACK (eval) load_ontology.pl:812 STACK main::persist_term load_ontology.pl:794 STACK toplevel load_ontology.pl:617 ------------------------------------- at load_ontology.pl line 824 main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x96699d0)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xd90620)', '-termfactory', undef, '-throw', 'CODE(0x76ab60)', '-mergeobs', ...) called at load_ontology.pl line 617 I am using biosql-1.0.0 downloaded directly from http://www.biosql.org/wiki/Downloads without any changes. So I am not sure if the RULEs have been removed. By the way, before I met the above error, I was able to use the script load_seqdatabase.pl to load swissprot data with many warnings. song ----- Original Message ----- From: Hilmar Lapp Date: Wednesday, August 26, 2009 17:50 Subject: Re: [BioSQL-l] error with load_ontology To: Song Haili Cc: biosql-l at lists.open-bio.org > Song, > > there should have been an error or warning that immediately preceded > this error. It is that one that's the root cause.
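Hilmar's point follows from how PostgreSQL handles errors inside a transaction. Once one statement fails, every later statement in the same transaction is refused with "current transaction is aborted", so only the first error names the real problem. Below is a minimal sketch for pulling that first error out of a captured load log. It assumes (hypothetically) that each driver error appears as "ERROR:" followed by the message on the same line, as in the output quoted in this thread.

```python
import re

# PostgreSQL reports this for every statement issued after an earlier
# failure in the same transaction; it never names the actual problem.
FALLOUT = "current transaction is aborted"

# Assumption: driver errors look like "ERROR:  <message>" on a single line,
# as in the load_ontology.pl output quoted in this thread.
ERROR_RE = re.compile(r"ERROR:\s*(.+)")


def first_root_error(log_text):
    """Return the first ERROR message that is not transaction-abort fallout."""
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match and FALLOUT not in match.group(1):
            return match.group(1).strip()
    return None  # only fallout (or no errors at all) was found


SAMPLE_LOG = """\
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (...)
ERROR:  value too long for type character varying(255)
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (...)
ERROR:  current transaction is aborted, commands ignored until end of transaction block
"""

print(first_root_error(SAMPLE_LOG))
```

To use this on a real run, capture the loader's output (both stdout and stderr) to a file and pass the file's contents to first_root_error. In Song's log, it points at the varchar(255) overflow rather than at the later DBLinkAdaptor exception.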
> > Also, are you using by any chance the BioSQL version for > PostgreSQL > that has the RULEs removed? If yes, then at this point you > cannot use > any Bioperl-db scripts (or code) with it, unless you install the > rules > before you run such a script (and presumably remove them > again > afterwards). > > -hilmar > > On Aug 26, 2009, at 10:18 AM, Song Haili wrote: > > > Hi All, > > I encountered an error message when using load_ontology.pl to > load > > gene ontology into biosql database. The command used is: > > > > perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete. > > > > At the beginning, data can be loaded with warnings, but > later an > > exception occurred and the loading was terminated. Warning and > error > > messages shown below: > > > > --------------------- WARNING ---------------------MSG: > failed to > > store term synonym (Bio::DB::BioSQL::TermAdaptor) with values > (spHAS > > RELATED EC:2.4.1.212) (FK 20447 to > Bio::Ontology::OBOterm):ERROR: > > current transaction is aborted, commands ignored until end > of > > transaction block---------------------------------------------- > ----- > > Could not store term GO:0050501, name 'hyaluronan > synthase > > activity':------------- EXCEPTION -------------MSG: error > while > > executing statement in > > Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: > ERROR: current > > transaction is aborted, commands ignored until end of > transaction > > blockSTACK > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /perl/ > > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm: > > 970STACK
> > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /perl/ > > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm: > > 873STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > /perl/lib/ > > > site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195STACK > > Bio::DB::BioSQL::TermAdaptor::store_children > /perl/lib/site_perl/ > > 5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306STACK > > Bio::DB::BioSQL::BasePersistenceAdaptor::create > /perl/lib/site_perl/ > > 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227STACK > > Bio::DB::BioSQL::BasePersistenceAdaptor::store > /perl/lib/site_perl/ > > 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264STACK > > Bio::DB::Persistent::PersistentObject::store > /perl/lib/site_perl/ > > 5.10.0/Bio/DB/Persistent/PersistentObject.pm:284STACK > (eval) > > load_ontology.pl:812STACK main::persist_term load_ontology.pl: > > 794STACK toplevel load_ontology.pl: > > 617------------------------------------- > > Can you please help me to solve this problem out? Thank you > very much. > > Best regards, > > song > > > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:-
hlapp > at gmx dot net : > =========================================================== > > > From biopython at maubp.freeserve.co.uk Thu Aug 27 06:24:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 27 Aug 2009 11:24:23 +0100 Subject: [BioSQL-l] error with load_ontology In-Reply-To: References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> Message-ID: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com> On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote: > --------------------- WARNING --------------------- > MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): > ERROR: value too long for type character varying(255) > --------------------------------------------------- Extending the relevant field in the schema might be one solution... > I am using biosql-1.0.0 downloaded directly from > http://www.biosql.org/wiki/Downloads without any changes. > So I am not sure if the RULEs have been removed. By the > way, before I met the above error, I was able to use the script > load_seqdatabase.pl to load swissprot data with many warnings. BioSQL 1.0.0 is out of date, the latest release is 1.0.1 Was that a typo? Peter From haili at mpiz-koeln.mpg.de Thu Aug 27 10:55:12 2009 From: haili at mpiz-koeln.mpg.de (Song Haili) Date: Thu, 27 Aug 2009 16:55:12 +0200 Subject: [BioSQL-l] error with load_ontology In-Reply-To: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com> References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com> Message-ID: Problem solved!
If the type of the synonym column in the term_synonym table is changed from varchar(255) to text, the error no longer occurs. However, I have only verified this with biosql-1.0.0 (it may also work with the latest version, biosql-1.0.1, but I haven't tested it much). Thank you all for your help. song ----- Original Message ----- From: Peter Date: Thursday, August 27, 2009 12:24 Subject: Re: [BioSQL-l] error with load_ontology To: Song Haili Cc: Hilmar Lapp , biosql-l at lists.open-bio.org > On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote: > > > --------------------- WARNING --------------------- > > MSG: failed to store term synonym > (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP- > alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent > hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP- > alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent > hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT > EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm): > > ERROR: value too long for type character varying(255) > > --------------------------------------------------- > > Extending the relevant field in the schema might be one solution... > > > I am using biosql-1.0.0 downloaded directly from > > http://www.biosql.org/wiki/Downloads without any changes. > > So I am not sure if the RULEs have been removed. By the > > way, before I met the above error, I was able to use the script > > load_seqdatabase.pl to load swissprot data with many warnings. > > BioSQL 1.0.0 is out of date, the latest release is 1.0.1 > Was that a typo?
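Before (or instead of) widening term_synonym.synonym as Song did, it can help to know which synonyms in the OBO file would overflow the column at all. The following is a rough sketch that scans OBO 1.2 stanzas for quoted synonym text longer than 255 characters, the limit taken from the PostgreSQL error above. It is a simple line-based reader written for this purpose, not a full OBO parser and not part of Bioperl.

```python
import re
import sys

# Width of term_synonym.synonym implied by the error in this thread:
# "value too long for type character varying(255)".
SYNONYM_LIMIT = 255

# OBO 1.2 synonym tags look like:  synonym: "some text" EXACT [EC:2.4.1.212]
# The quoted text may contain backslash-escaped characters such as \".
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')


def long_synonyms(obo_lines, limit=SYNONYM_LIMIT):
    """Yield (term_id, synonym) pairs whose synonym text exceeds limit chars."""
    term_id = None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):          # new stanza: [Term], [Typedef], ...
            term_id = None
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        else:
            match = SYNONYM_RE.match(line)
            if match and len(match.group(1)) > limit:
                yield term_id, match.group(1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python long_synonyms.py gene_ontology.1_2.obo
    with open(sys.argv[1], encoding="utf-8") as obo:
        for term_id, text in long_synonyms(obo):
            print(f"{term_id}\t{len(text)}\t{text[:60]}...")
```

Python's len() counts characters, which matches how PostgreSQL measures varchar(n). An empty report would suggest the stock column width is enough for that particular release of the ontology file.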
> > Peter From florian.mittag at uni-tuebingen.de Thu Aug 6 09:43:56 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 11:43:56 +0200 Subject: [BioSQL-l] Error when loading Gene Ontology to biosql In-Reply-To: <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu> References: <1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net> <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu> Message-ID: <200908061143.56479.florian.mittag@uni-tuebingen.de> Hi! On Friday, 24. July 2009 02:39, Chris Fields wrote: > The warning is interesting, as it derives from our rollback of feature/ > annotation stuff in bioperl. It indicates the specified DBLink is > duplicated in the Bio::Ontology::Term. > > The exception makes sense in light of that (and seems to confirm the > link was already present). I'm getting the same warnings with my custom DB2 driver and with MySQL, but the script completes successfully. I get them when loading the Gene Ontology and the Sequence Ontology. -------------------- WARNING --------------------- MSG: GOC:mah exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: PMID:12297042 exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: GOC:mah exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: GOC:rph exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: PMID:12930826 exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: PMID:15012271 exists in the dblink of _default --------------------------------------------------- [...] Done with sequence. Done, cleaning up. What to do? 
- Florian > > On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote: > > Hi Carlos - that's an odd error that we haven't seen yet. My first > > impulse would be to suspect that your database wasn't empty when you > > ran this, and that the error you got is due to a term in the input > > file clashing with one you already have in the database. > > > > You can check this by looking into your database: > > > > SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name = > > 'invasive growth'; > > > > Does this return anything? > > > > Note that load_ontology.pl is perfectly equipped to update an > > existing ontology - check the POD and look for the --lookup command > > line option (and the several options following it in the POD with > > which you can modify the exact update behavior). By default though > > the script will assume that it is loading a new ontology. > > > > -hilmar > > > > On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote: > >> Hi Hilmar, > >> > >> thanks for the help. I've tried now this > >> > >> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass > >> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo > >> > >> downloaded from here > >> > >> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.ob > >>o > >> > >> and I have this error message. 
> >> > >> --------------------- WARNING --------------------- > >> MSG: DBLink _default > >> --------------------------------------------------- > >> Could not store term GO:0001404, name 'invasive growth': > >> > >> ------------- EXCEPTION: Bio::Root::Exception ------------- > >> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to > >> be found by unique key > >> STACK: Error::throw > >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/ > >> Root.pm:357 > >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/ > >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 > >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ > >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 > >> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ > >> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 > >> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ > >> load_ontology.pl:812 > >> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 > >> ----------------------------------------------------------- > >> > >> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 > >> main::persist_term('-term', > >> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db', > >> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory', > >> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at / > >> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 > >> > >> Any hints to know where the problem would be? > >> > >> Thanks in advance, > >> > >> Carlos > >> > >> Carlos Canchaya > >> ccanchaya at gmail.com > >> > >> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote: > >>> Please leave off the --fmtargs GO.defs argument - this is not a > >>> file in the .obo format. > >>> > >>> -hilmar > >>> > >>> On Jul 22, 2009, at 11:05 AM, Carlos A. 
Canchaya wrote: > >>>> Hi guys, > >>>> > >>>> I've tried to execute load_ontologies following your suggestions as > >>>> > >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy -- > >>>> dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format > >>>> obo gene_ontology.1_2.obo > >>>> > >>>> However I have many warnings first > >>>> > >>>> --------------------- WARNING --------------------- > >>>> MSG: DBLink exists in the dblink of _default > >>>> --------------------------------------------------- > >>>> > >>>> and then > >>>> > >>>> --------------------- WARNING --------------------- > >>>> MSG: DBLink exists in the dblink of _default > >>>> --------------------------------------------------- > >>>> Could not store term GO:0001404, name 'invasive growth': > >>>> > >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or > >>>> to be found by unique key > >>>> STACK: Error::throw > >>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/ > >>>> bioperl-live//Bio/Root/Root.pm:357 > >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 > >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 > >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 > >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ > >>>> load_ontology.pl:812 > >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 > >>>> ----------------------------------------------------------- > >>>> > >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 > >>>> main::persist_term('-term', > >>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db', > >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', 
'-termfactory', > >>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at / > >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 > >>>> > >>>> > >>>> Any ideas why? > >>>> > >>>> Thanks in advance, > >>>> > >>>> Carlos > >>>> > >>>> > >>>> Carlos Canchaya > >>>> ccanchaya at gmail.com From hlapp at gmx.net Thu Aug 6 13:46:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:46:06 -0400 Subject: [BioSQL-l] Error when loading Gene Ontology to biosql In-Reply-To: <200908061143.56479.florian.mittag@uni-tuebingen.de> References: <1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net> <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu> <200908061143.56479.florian.mittag@uni-tuebingen.de> Message-ID: The warnings are fine. They simply indicate that a dbxref is being added to the term that it already had. Part of the reason for that happening may be that Bioperl-db doesn't support different kinds of dbxrefs for terms yet, if I recall correctly, so once retrieved from the database they all end up in the _default category. -hilmar On Aug 6, 2009, at 5:43 AM, Florian Mittag wrote: > Hi! > > On Friday, 24. July 2009 02:39, Chris Fields wrote: >> The warning is interesting, as it derives from our rollback of >> feature/ >> annotation stuff in bioperl. It indicates the specified DBLink is >> duplicated in the Bio::Ontology::Term. >> >> The exception makes sense in light of that (and seems to confirm the >> link was already present). > > I'm getting the same warnings with my custom DB2 driver and with > MySQL, but > the script completes successfully. I get them when loading the Gene > Ontology > and the Sequence Ontology.
> > -------------------- WARNING --------------------- > MSG: GOC:mah exists in the dblink of _default > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: PMID:12297042 exists in the dblink of _default > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: GOC:mah exists in the dblink of _default > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: GOC:rph exists in the dblink of _default > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: PMID:12930826 exists in the dblink of _default > --------------------------------------------------- > > -------------------- WARNING --------------------- > MSG: PMID:15012271 exists in the dblink of _default > --------------------------------------------------- > > [...] > Done with sequence. > Done, cleaning up. > > > What to do? > > - Florian > >> >> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote: >>> Hi Carlos - that's an odd error that we haven't seen yet. My first >>> impulse would be to suspect that your database wasn't empty when you >>> ran this, and that the error you got is due to a term in the input >>> file clashing with one you already have in the database. >>> >>> You can check this by looking into your database: >>> >>> SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name = >>> 'invasive growth'; >>> >>> Does this return anything? >>> >>> Note that load_ontology.pl is perfectly equipped to update an >>> existing ontology - check the POD and look for the --lookup command >>> line option (and the several options following it in the POD with >>> which you can modify the exact update behavior). By default though >>> the script will assume that it is loading a new ontology. >>> >>> -hilmar >>> >>> On Jul 23, 2009, at 3:27 AM, Carlos A. 
Canchaya wrote: >>>> Hi Hilmar, >>>> >>>> thanks for the help. I've tried now this >>>> >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass >>>> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo >>>> >>>> downloaded from here >>>> >>>> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.ob >>>> o >>>> >>>> and I have this error message. >>>> >>>> --------------------- WARNING --------------------- >>>> MSG: DBLink _default >>>> --------------------------------------------------- >>>> Could not store term GO:0001404, name 'invasive growth': >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to >>>> be found by unique key >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/ >>>> Root/ >>>> Root.pm:357 >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/ >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ >>>> load_ontology.pl:812 >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 >>>> ----------------------------------------------------------- >>>> >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 >>>> main::persist_term('-term', >>>> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db', >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory', >>>> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at / >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 >>>> >>>> Any hints to know where the problem would be? 
>>>> >>>> Thanks in advance, >>>> >>>> Carlos >>>> >>>> Carlos Canchaya >>>> ccanchaya at gmail.com >>>> >>>> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote: >>>>> Please leave off the --fmtargs GO.defs argument - this is not a >>>>> file in the .obo format. >>>>> >>>>> -hilmar >>>>> >>>>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote: >>>>>> Hi guys, >>>>>> >>>>>> I've tried to execute load_ontologies following your >>>>>> suggestions as >>>>>> >>>>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy -- >>>>>> dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format >>>>>> obo gene_ontology.1_2.obo >>>>>> >>>>>> However I have many warnings first >>>>>> >>>>>> --------------------- WARNING --------------------- >>>>>> MSG: DBLink exists in the dblink of _default >>>>>> --------------------------------------------------- >>>>>> >>>>>> and then >>>>>> >>>>>> --------------------- WARNING --------------------- >>>>>> MSG: DBLink exists in the dblink of _default >>>>>> --------------------------------------------------- >>>>>> Could not store term GO:0001404, name 'invasive growth': >>>>>> >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or >>>>>> to be found by unique key >>>>>> STACK: Error::throw >>>>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/ >>>>>> bioperl-live//Bio/Root/Root.pm:357 >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/ >>>>>> local/ >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 >>>>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ >>>>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 >>>>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ >>>>>> load_ontology.pl:812 >>>>>> STACK: 
/tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 >>>>>> ----------------------------------------------------------- >>>>>> >>>>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 >>>>>> main::persist_term('-term', >>>>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db', >>>>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory', >>>>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at / >>>>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 >>>>>> >>>>>> >>>>>> Any ideas why? >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Carlos >>>>>> >>>>>> >>>>>> Carlos Canchaya >>>>>> ccanchaya at gmail.com -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From florian.mittag at uni-tuebingen.de Thu Aug 6 14:20:31 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 16:20:31 +0200 Subject: [BioSQL-l] Error when loading Gene Ontology to biosql In-Reply-To: References: <200908061143.56479.florian.mittag@uni-tuebingen.de> Message-ID: <200908061620.31766.florian.mittag@uni-tuebingen.de> OK, that's a relief. Thanks for the quick answer! - Florian On Thursday, 6. August 2009 15:46, Hilmar Lapp wrote: > The warnings are fine. They simply indicate that a dbxref is being > added to the term that it already had. > > Part of the reason for that happening may be that Bioperl-db doesn't > support different kinds of dbxrefs for terms yet, if I recall > correctly, so once retrieved from the database they all end up in the > _default category. > > -hilmar > > On Aug 6, 2009, at 5:43 AM, Florian Mittag wrote: > > Hi! > > > > On Friday, 24. July 2009 02:39, Chris Fields wrote: > >> The warning is interesting, as it derives from our rollback of > >> feature/ > >> annotation stuff in bioperl. It indicates the specified DBLink is > >> duplicated in the Bio::Ontology::Term.
> >> > >> The exception makes sense in light of that (and seems to confirm the > >> link was already present). > > > > I'm getting the same warnings with my custom DB2 driver and with > > MySQL, but > > the script completes successfully. I get them when loading the Gene > > Ontology > > and the Sequence Ontology. > > > > -------------------- WARNING --------------------- > > MSG: GOC:mah exists in the dblink of _default > > --------------------------------------------------- > > > > -------------------- WARNING --------------------- > > MSG: PMID:12297042 exists in the dblink of _default > > --------------------------------------------------- > > > > -------------------- WARNING --------------------- > > MSG: GOC:mah exists in the dblink of _default > > --------------------------------------------------- > > > > -------------------- WARNING --------------------- > > MSG: GOC:rph exists in the dblink of _default > > --------------------------------------------------- > > > > -------------------- WARNING --------------------- > > MSG: PMID:12930826 exists in the dblink of _default > > --------------------------------------------------- > > > > -------------------- WARNING --------------------- > > MSG: PMID:15012271 exists in the dblink of _default > > --------------------------------------------------- > > > > [...] > > Done with sequence. > > Done, cleaning up. > > > > > > What to do? > > > > - Florian > > > >> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote: > >>> Hi Carlos - that's an odd error that we haven't seen yet. My first > >>> impulse would be to suspect that your database wasn't empty when you > >>> ran this, and that the error you got is due to a term in the input > >>> file clashing with one you already have in the database. > >>> > >>> You can check this by looking into your database: > >>> > >>> SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name = > >>> 'invasive growth'; > >>> > >>> Does this return anything? 
> >>> > >>> Note that load_ontology.pl is perfectly equipped to update an > >>> existing ontology - check the POD and look for the --lookup command > >>> line option (and the several options following it in the POD with > >>> which you can modify the exact update behavior). By default though > >>> the script will assume that it is loading a new ontology. > >>> > >>> -hilmar > >>> > >>> On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote: > >>>> Hi Hilmar, > >>>> > >>>> thanks for the help. I've tried now this > >>>> > >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass > >>>> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo > >>>> > >>>> downloaded from here > >>>> > >>>> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2. > >>>>ob o > >>>> > >>>> and I have this error message. > >>>> > >>>> --------------------- WARNING --------------------- > >>>> MSG: DBLink _default > >>>> --------------------------------------------------- > >>>> Could not store term GO:0001404, name 'invasive growth': > >>>> > >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to > >>>> be found by unique key > >>>> STACK: Error::throw > >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/ > >>>> Root/ > >>>> Root.pm:357 > >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 > >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 > >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ > >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 > >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ > >>>> load_ontology.pl:812 > >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 > >>>> 
----------------------------------------------------------- > >>>> > >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 > >>>> main::persist_term('-term', > >>>> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db', > >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory', > >>>> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at / > >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 > >>>> > >>>> Any hints to know where the problem would be? > >>>> > >>>> Thanks in advance, > >>>> > >>>> Carlos > >>>> > >>>> Carlos Canchaya > >>>> ccanchaya at gmail.com > >>>> > >>>> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote: > >>>>> Please leave off the --fmtargs GO.defs argument - this is not a > >>>>> file in the .obo format. > >>>>> > >>>>> -hilmar > >>>>> > >>>>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote: > >>>>>> Hi guys, > >>>>>> > >>>>>> I've tried to execute load_ontologies following your > >>>>>> suggestions as > >>>>>> > >>>>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy > >>>>>> --dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format > >>>>>> obo gene_ontology.1_2.obo > >>>>>> > >>>>>> However I have many warnings first > >>>>>> > >>>>>> --------------------- WARNING --------------------- > >>>>>> MSG: DBLink exists in the dblink of _default > >>>>>> --------------------------------------------------- > >>>>>> > >>>>>> and then > >>>>>> > >>>>>> --------------------- WARNING --------------------- > >>>>>> MSG: DBLink exists in the dblink of _default > >>>>>> --------------------------------------------------- > >>>>>> Could not store term GO:0001404, name 'invasive growth': > >>>>>> > >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or > >>>>>> to be found by unique key > >>>>>> STACK: Error::throw > >>>>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/ > >>>>>>
bioperl-live//Bio/Root/Root.pm:357 > >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/ > >>>>>> local/ > >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219 > >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/ > >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264 > >>>>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/ > >>>>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284 > >>>>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/ > >>>>>> load_ontology.pl:812 > >>>>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617 > >>>>>> ----------------------------------------------------------- > >>>>>> > >>>>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824 > >>>>>> main::persist_term('-term', > >>>>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db', > >>>>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory', > >>>>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at / > >>>>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617 > >>>>>> > >>>>>> > >>>>>> Any ideas why? > >>>>>> > >>>>>> Thanks in advance, > >>>>>> > >>>>>> Carlos > >>>>>> > >>>>>> > >>>>>> Carlos Canchaya > >>>>>> ccanchaya at gmail.com -- Dipl. Inf. Florian Mittag Universität Tuebingen WSI-RA, Sand 1 72076 Tuebingen, Germany Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091 From haili at mpiz-koeln.mpg.de Mon Aug 10 14:21:39 2009 From: haili at mpiz-koeln.mpg.de (Song Haili) Date: Mon, 10 Aug 2009 16:21:39 +0200 Subject: [BioSQL-l] how to load other data to biosql database? Message-ID: Dear all, Do any of you know how to load other data, such as domains, EC numbers, MapMan bins, interactions, KEGG Ontology etc., into a biosql database? Is it possible using load_ontology.pl? If so, what are the corresponding arguments? Otherwise, should I write my own scripts? Any suggestion will be highly appreciated!
Best regards, song From florian.mittag at uni-tuebingen.de Tue Aug 11 08:10:12 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Tue, 11 Aug 2009 10:10:12 +0200 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? Message-ID: <200908111010.12143.florian.mittag@uni-tuebingen.de> Hi! I stumbled upon an old post from Hilmar: On Tue, 18 Mar 2003, Hilmar Lapp wrote: > type_term_id is supposed to reference an SO term. source is supposed to > denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far as > my understanding goes. In the case of reading the features from a > GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the source > (which is what the genbank, embl, and swissprot parsers do in bioperl) > is maybe stretching the definition, but I don't have something > substantially better to offer. I inspected the database after I imported some Genbank files with BioJava, and I found that the source_term_id for the seqfeatures is always set to the ID of an automatically inserted term "Genbank" with definition "auto-generated by biojavax". I was wondering if there is anything new to the source_term_id. - Florian From florian.mittag at uni-tuebingen.de Tue Aug 11 09:09:50 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Tue, 11 Aug 2009 11:09:50 +0200 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> Message-ID: <200908111109.50361.florian.mittag@uni-tuebingen.de> Hm, I should've mentioned my real concern. We're integrating all kinds of data into the database and right now I want to import miRNA information (sequences and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/). 
The files I download from there specify "miRanda" as METHOD, so should I use this as source term or miRBase? Thanks, - Florian On Tuesday, 11. August 2009 10:59, Richard Holland wrote: > The reason BJX does that is because the Genbank format has no > indication of where a feature came from. So, all there is to go on is > that it came from Genbank! This allows us to differentiate between > features on a sequence that were loaded from an original file, and new > features that have been added to the sequence in the db after it was > loaded (e.g. by running blast, blat etc. against some local data). > > On 11 Aug 2009, at 09:10, Florian Mittag wrote: > > Hi! > > > > I stumbled upon an old post from Hilmar: > > > > On Tue, 18 Mar 2003, Hilmar Lapp wrote: > >> type_term_id is supposed to reference an SO term. source is > >> supposed to > >> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far > >> as > >> my understanding goes. In the case of reading the features from a > >> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the > >> source > >> (which is what the genbank, embl, and swissprot parsers do in > >> bioperl) > >> is maybe stretching the definition, but I don't have something > >> substantially better to offer. > > > > I inspected the database after I imported some Genbank files with > > BioJava, and > > I found that the source_term_id for the seqfeatures is always set to > > the ID > > of an automatically inserted term "Genbank" with definition "auto- > > generated > > by biojavax". > > > > I was wondering if there is anything new to the source_term_id. > > > > - Florian > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ -- Dipl. Inf. 
Florian Mittag Universität Tuebingen WSI-RA, Sand 1 72076 Tuebingen, Germany Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091 From holland at eaglegenomics.com Tue Aug 11 09:22:41 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 11 Aug 2009 10:22:41 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <200908111109.50361.florian.mittag@uni-tuebingen.de> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> Message-ID: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> Ideally there would be two fields for source_term_id - one for the algorithm used to generate the data (e.g. BLAST, miRanda), the other for the source the data came from (e.g. Genbank, miRBase). These are two very distinct concepts and it is not easy to represent them successfully using a single ontology source_term_id field. So the only way round it if you need to represent both algorithm and source is to create your own ontology which is a cross-product of the two possible sets of values (triples would be good for this). If you want to use only a single term, basically it's up to you whether you choose to annotate by algorithm (miRanda) or by source (miRBase). I expect the decision will rest on whether it is more important for you to know which features in your database were added locally and which came from a remote source, or if knowing the algorithm used to generate them is more important. Otherwise if both are important the cross-product triple approach is probably the only way to go. cheers, Richard On 11 Aug 2009, at 10:09, Florian Mittag wrote: > Hm, I should've mentioned my real concern. We're integrating all > kinds of data > into the database and right now I want to import miRNA information > (sequences > and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/). 
> The files I download from there specify "miRanda" as METHOD, so > should I use > this as source term or miRBase? > > Thanks, > - Florian > > On Tuesday, 11. August 2009 10:59, Richard Holland wrote: >> The reason BJX does that is because the Genbank format has no >> indication of where a feature came from. So, all there is to go on is >> that it came from Genbank! This allows us to differentiate between >> features on a sequence that were loaded from an original file, and >> new >> features that have been added to the sequence in the db after it was >> loaded (e.g. by running blast, blat etc. against some local data). >> >> On 11 Aug 2009, at 09:10, Florian Mittag wrote: >>> Hi! >>> >>> I stumbled upon an old post from Hilmar: >>> >>> On Tue, 18 Mar 2003, Hilmar Lapp wrote: >>>> type_term_id is supposed to reference an SO term. source is >>>> supposed to >>>> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far >>>> as >>>> my understanding goes. In the case of reading the features from a >>>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the >>>> source >>>> (which is what the genbank, embl, and swissprot parsers do in >>>> bioperl) >>>> is maybe stretching the definition, but I don't have something >>>> substantially better to offer. >>> >>> I inspected the database after I imported some Genbank files with >>> BioJava, and >>> I found that the source_term_id for the seqfeatures is always set to >>> the ID >>> of an automatically inserted term "Genbank" with definition "auto- >>> generated >>> by biojavax". >>> >>> I was wondering if there is anything new to the source_term_id. 
> >>> > >>> - Florian > >>> _______________________________________________ > >>> BioSQL-l mailing list > >>> BioSQL-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biosql-l > > > > -- > > Richard Holland, BSc MBCS > > Operations and Delivery Director, Eagle Genomics Ltd > > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > -- > Dipl. Inf. Florian Mittag > Universität Tuebingen > WSI-RA, Sand 1 > 72076 Tuebingen, Germany > Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091 -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Tue Aug 11 08:59:27 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 11 Aug 2009 09:59:27 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <200908111010.12143.florian.mittag@uni-tuebingen.de> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> Message-ID: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> The reason BJX does that is because the Genbank format has no indication of where a feature came from. So, all there is to go on is that it came from Genbank! This allows us to differentiate between features on a sequence that were loaded from an original file, and new features that have been added to the sequence in the db after it was loaded (e.g. by running blast, blat etc. against some local data). On 11 Aug 2009, at 09:10, Florian Mittag wrote: > Hi! > > I stumbled upon an old post from Hilmar: > > On Tue, 18 Mar 2003, Hilmar Lapp wrote: >> type_term_id is supposed to reference an SO term. source is >> supposed to >> denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far >> as >> my understanding goes.
In the case of reading the features from a >> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the >> source >> (which is what the genbank, embl, and swissprot parsers do in >> bioperl) >> is maybe stretching the definition, but I don't have something >> substantially better to offer. > > I inspected the database after I imported some Genbank files with > BioJava, and > I found that the source_term_id for the seqfeatures is always set to > the ID > of an automatically inserted term "Genbank" with definition "auto- > generated > by biojavax". > > I was wondering if there is anything new to the source_term_id. > > - Florian > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Fri Aug 14 22:56:11 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 14 Aug 2009 18:56:11 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> Message-ID: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> On Aug 11, 2009, at 5:22 AM, Richard Holland wrote: > Ideally there would be two fields for source_term_id - one for the > algorithm used to generate the data (e.g. BLAST, miRanda), the other > for the source the data came from (e.g. Genbank, miRBase). You mean the source of the data that it was applied to. I agree though that if you want both you can create a cross-product term and store the decomposition as term_relationship's. 
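The cross-product idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BioSQL's or BioPerl's own code. It uses Python's built-in sqlite3 module and a simplified, partly hypothetical subset of the BioSQL ontology, term and term_relationship tables; the real schema has more columns. The ontology name "Feature Sources" and the predicate terms has_algorithm and has_data_source are made up for illustration. A single combined term such as "miRanda on miRBase" could then serve as a feature's source term, while term_relationship rows record what it is composed of.

```python
import sqlite3

# Simplified, partly hypothetical subset of the BioSQL ontology tables.
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE term_relationship (
    term_relationship_id INTEGER PRIMARY KEY,
    subject_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    predicate_term_id INTEGER NOT NULL REFERENCES term(term_id),
    object_term_id    INTEGER NOT NULL REFERENCES term(term_id),
    ontology_id       INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (subject_term_id, predicate_term_id, object_term_id, ontology_id)
);
"""

ONTOLOGY = "Feature Sources"  # hypothetical ontology name


def ontology_id(conn, name):
    """Return the ontology_id for name, creating the ontology if needed."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (name,))
    return conn.execute("SELECT ontology_id FROM ontology WHERE name = ?",
                        (name,)).fetchone()[0]


def term_id(conn, name, ont):
    """Return the term_id for (name, ontology), creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO term (name, ontology_id) VALUES (?, ?)",
                 (name, ont))
    return conn.execute("SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
                        (name, ont)).fetchone()[0]


def cross_product_term(conn, algorithm, data_source):
    """Create (or reuse) an 'algorithm on data source' term and store its parts."""
    ont = ontology_id(conn, ONTOLOGY)
    combined = term_id(conn, f"{algorithm} on {data_source}", ont)
    for predicate, obj in (("has_algorithm", algorithm),       # hypothetical predicate
                           ("has_data_source", data_source)):  # hypothetical predicate
        conn.execute(
            "INSERT OR IGNORE INTO term_relationship "
            "(subject_term_id, predicate_term_id, object_term_id, ontology_id) "
            "VALUES (?, ?, ?, ?)",
            (combined, term_id(conn, predicate, ont), term_id(conn, obj, ont), ont))
    return combined


def decompose(conn, combined):
    """Map each predicate name to the object term name for a combined term."""
    rows = conn.execute(
        "SELECT p.name, o.name FROM term_relationship r "
        "JOIN term p ON p.term_id = r.predicate_term_id "
        "JOIN term o ON o.term_id = r.object_term_id "
        "WHERE r.subject_term_id = ?", (combined,))
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
mirna_source = cross_product_term(conn, "miRanda", "miRBase")
print(decompose(conn, mirna_source))
```

The relationships are ordinary rows, so a query can recover either the algorithm or the data source from the combined term without parsing the term's name.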
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 10:44:16 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 11:44:16 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> Message-ID: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> On 14 Aug 2009, at 23:56, Hilmar Lapp wrote: > > On Aug 11, 2009, at 5:22 AM, Richard Holland wrote: > >> Ideally there would be two fields for source_term_id - one for the >> algorithm used to generate the data (e.g. BLAST, miRanda), the >> other for the source the data came from (e.g. Genbank, miRBase). > > > You mean the source of the data that it was applied to. Not necessarily. The source of the data that it was applied to (ie. the sequence the feature refers to) is a third thing - and that is an attribute of the sequence the feature refers to, rather than the feature itself. What I mean is this: 1. The sequence itself could be downloaded from Genbank, EMBL, or elsewhere, or I could have discovered it in-house. 2. The features on the sequence could have been generated by running BLAST, miRBase, etc., or they could be manually annotated. 3. The features on the sequence could have been downloaded from Genbank, EMBL, etc., or they could have been made locally, or by a collaborator at another institute. To my mind these are three distinct things. (1) is sequence-related, and (2) and (3) are feature-related. 
cheers, Richard > I agree though that if you want both you can create a cross-product > term and store the decomposition as term_relationship's. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 14:29:24 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 10:29:24 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> Message-ID: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote: > [...] > What I mean is this: > > 1. The sequence itself could be downloaded from Genbank, EMBL, or > elsewhere, or I could have discovered it in-house. That's actually what I meant. > 2. The features on the sequence could have been generated by > running BLAST, miRBase, etc., or they could be manually annotated. > 3. The features on the sequence could have been downloaded from > Genbank, EMBL, etc., or they could have been made locally, or by a > collaborator at another institute. Right, but if a feature is the result of you running some algorithm against some sequences, then it's not been downloaded or given to you. 
Features on one and the same sequence can have different sources, obviously, so I'm a bit confused - I think we're talking about the same thing in different words, but I'm not sure. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 16:32:35 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 17:32:35 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> Message-ID: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> On 15 Aug 2009, at 15:29, Hilmar Lapp wrote: > > On Aug 15, 2009, at 6:44 AM, Richard Holland wrote: > >> [...] >> What I mean is this: >> >> 1. The sequence itself could be downloaded from Genbank, EMBL, or >> elsewhere, or I could have discovered it in-house. > > That's actually what I meant. > >> 2. The features on the sequence could have been generated by >> running BLAST, miRBase, etc., or they could be manually annotated. >> 3. The features on the sequence could have been downloaded from >> Genbank, EMBL, etc., or they could have been made locally, or by a >> collaborator at another institute. > > Right, but if a feature is the result of you running some algorithm > against some sequences, then it's not been downloaded or given to > you. 
Features on one and the same sequence can have different > sources, obviously, so I'm a bit confused - I think we're talking > about the same thing in different words, but I'm not sure. Probably. :) Case study: I download some seqs from Genbank. (Which then need to be annotated as having come from Genbank, at the sequence level). They already have some features on them (which need to be annotated as having come from Genbank, at the feature level, but of an unknown algorithm as Genbank doesn't specify how they were generated usually). I then run BLAST of those sequences against some local data, and record my own features as a result. I also run BLAT, and again record my own features. My colleague also runs BLAST of the same seqs against some data of his own, and wants our combined feature results to be stored in the same database. I want to be able to annotate all these new features both with the algorithm used to generate them (BLAST or BLAT) and who did it (myself or my colleague at the institute down the road), in addition to retaining the original features that came from Genbank (and making sure they're annotated as such). Hence I'd need a source attribute for the sequence (Genbank in this case), a source attribute for each feature (Genbank, Me, or Colleague X, in this case), and an algorithm/technique/protocol attribute for each feature (BLAST or BLAT or 'don't know it just came from Genbank' in this example). 
cheers, Richard > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 19:31:13 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:31:13 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> Message-ID: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote: > [...] > Case study: Great, now we're getting somewhere :-) > I download some seqs from Genbank. (Which then need to be annotated > as having come from Genbank, at the sequence level). Note, as you say, *at the sequence level*. I.e., you would record this either using the bioentry's namespace (biodatabase), or a bioentry_qualifier_value annotation. I would choose the former, though since a bioentry can only be in one namespace, it may not satisfy your needs. > They already have some features on them (which need to be annotated > as having come from Genbank, at the feature level, but of an unknown > algorithm as Genbank doesn't specify how they were generated usually). Right. 
The source term would indicate that GenBank provided them to you, and that that's all you know.
> I then run BLAST of those sequences against some local data, and > record my own features as a result. I also run BLAT, and again > record my own features. BLAST and BLAT would now be the source terms. > My colleague also runs BLAST of the same seqs against some data of > his own, and wants our combined feature results to be stored in the > same database. I want to be able to annotate all these new features > both with the algorithm used to generate them (BLAST or BLAT) You use the source term for that. > and who did it (myself or my colleague at the institute down the road) Ah - that's provenance information, not the source as is normally referred to. BioSQL at present doesn't have an explicit provenance model, but you can still record provenance information through ontology-typed tag/value annotation in seqfeature_qualifier_value, with the terms coming from a provenance ontology (that you make up yourself or grab from somewhere else). > , in addition to retaining the original features that came from > Genbank (and making sure they're annotated as such). That shouldn't be a problem - certainly it's not for BioSQL. > Hence I'd need a source attribute for the sequence (Genbank in this > case), a source attribute for each feature (Genbank, Me, or > Colleague X, in this case), and an algorithm/technique/protocol > attribute for each feature (BLAST or BLAT or 'don't know it just > came from Genbank' in this example). Not quite - source really is what provided the feature to you, not who or when, or using which BLAST database, genome assembly, or how you parsed the results, etc etc. That's all provenance information. 
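The split between source and provenance can be sketched with Python's sqlite3 module, under the same caveats as any illustration here. The tables are a cut-down, simplified version of seqfeature and seqfeature_qualifier_value: terms name their ontology directly instead of through ontology_id, and there is no bioentry table. The source term records what produced the feature (here BLAST). Who ran it and against which database go into qualifier values whose tag terms come from a made-up "Provenance" vocabulary. The tag names performed_by and blast_database, and the "SO" and "Feature Sources" labels, are hypothetical.

```python
import sqlite3

# Simplified, partly hypothetical subset of the BioSQL schema: terms carry an
# ontology name instead of an ontology_id, and bioentry_id is not enforced.
SCHEMA = """
CREATE TABLE term (
    term_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    ontology TEXT NOT NULL,
    UNIQUE (name, ontology)
);
CREATE TABLE seqfeature (
    seqfeature_id  INTEGER PRIMARY KEY,
    bioentry_id    INTEGER NOT NULL,
    type_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    source_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    rank          INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank)
);
"""


def get_term(conn, name, ontology):
    """Return the term_id for (name, ontology), creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO term (name, ontology) VALUES (?, ?)",
                 (name, ontology))
    return conn.execute("SELECT term_id FROM term WHERE name = ? AND ontology = ?",
                        (name, ontology)).fetchone()[0]


def add_feature(conn, bioentry_id, feature_type, source, provenance):
    """Store a feature: source = what produced it, provenance = who/how/where."""
    cur = conn.execute(
        "INSERT INTO seqfeature (bioentry_id, type_term_id, source_term_id) "
        "VALUES (?, ?, ?)",
        (bioentry_id, get_term(conn, feature_type, "SO"),
         get_term(conn, source, "Feature Sources")))
    feature_id = cur.lastrowid
    for tag, value in provenance.items():
        conn.execute(
            "INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, value) "
            "VALUES (?, ?, ?)",
            (feature_id, get_term(conn, tag, "Provenance"), value))
    return feature_id


def source_of(conn, feature_id):
    """Return the name of a feature's source term."""
    return conn.execute(
        "SELECT t.name FROM seqfeature f JOIN term t ON t.term_id = f.source_term_id "
        "WHERE f.seqfeature_id = ?", (feature_id,)).fetchone()[0]


def provenance_of(conn, feature_id):
    """Collect the provenance tag/value pairs recorded for one feature."""
    rows = conn.execute(
        "SELECT t.name, q.value FROM seqfeature_qualifier_value q "
        "JOIN term t ON t.term_id = q.term_id "
        "WHERE q.seqfeature_id = ? AND t.ontology = 'Provenance'", (feature_id,))
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# A feature from a colleague's BLAST run, with hypothetical provenance tags.
fid = add_feature(conn, bioentry_id=1, feature_type="match", source="BLAST",
                  provenance={"performed_by": "Colleague X",
                              "blast_database": "local_db"})
print(source_of(conn, fid), provenance_of(conn, fid))
```

A feature loaded straight from a GenBank file would get "GenBank" as its source term and simply no provenance qualifiers. The two kinds of feature can then coexist on the same sequence.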
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Sat Aug 15 20:00:39 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 15 Aug 2009 21:00:39 +0100 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> Message-ID: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> Ok, cool. So we can now rephrase the original question to...: How should provenance information be stored in BioSQL? :) cheers, Richard On 15 Aug 2009, at 20:31, Hilmar Lapp wrote: > > On Aug 15, 2009, at 12:32 PM, Richard Holland wrote: > >> [...] >> Case study: > > Great, now we're getting somewhere :-) > >> I download some seqs from Genbank. (Which then need to be annotated >> as having come from Genbank, at the sequence level). > > Note, as you say, *at the sequence level*. I.e., you would record > this either using the bioentry's namespace (biodatabase), or a > bioentry_qualifier_value annotation. I would choose the former, > though since a bioentry can only be in one namespace, it may not > satisfy your needs. > >> They already have some features on them (which need to be annotated >> as having come from Genbank, at the feature level, but of an >> unknown algorithm as Genbank doesn't specify how they were >> generated usually). > > Right.
The source term would indicate that GenBank provided them to > you, and that that's all you know. > >> I then run BLAST of those sequences against some local data, and >> record my own features as a result. I also run BLAT, and again >> record my own features. > > BLAST and BLAT would now be the source terms. > >> My colleague also runs BLAST of the same seqs against some data of >> his own, and wants our combined feature results to be stored in the >> same database. I want to be able to annotate all these new features >> both with the algorithm used to generate them (BLAST or BLAT) > > You use the source term for that. > >> and who did it (myself or my colleague at the institute down the >> road) > > Ah - that's provenance information, not the source as is normally > referred to. BioSQL at present doesn't have an explicit provenance > model, but you can still record provenance information through > ontology-typed tag/value annotation in seqfeature_qualifier_value, > with the terms coming from a provenance ontology (that you make up > yourself or grab from somewhere else). > >> , in addition to retaining the original features that came from >> Genbank (and making sure they're annotated as such). > > That shouldn't be a problem - certainly it's not for BioSQL. > >> Hence I'd need a source attribute for the sequence (Genbank in this >> case), a source attribute for each feature (Genbank, Me, or >> Colleague X, in this case), and an algorithm/technique/protocol >> attribute for each feature (BLAST or BLAT or 'don't know it just >> came from Genbank' in this example). > > Not quite - source really is what provided the feature to you, not > who or when, or using which BLAST database, genome assembly, or how > you parsed the results, etc etc. That's all provenance information. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Sat Aug 15 20:14:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:14:54 -0400 Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to? In-Reply-To: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> References: <200908111010.12143.florian.mittag@uni-tuebingen.de> <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com> <200908111109.50361.florian.mittag@uni-tuebingen.de> <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com> <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net> <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com> <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net> <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com> <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net> <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com> Message-ID: <92DD5E74-5638-4CB8-B34A-3282AACF036A@gmx.net> On Aug 15, 2009, at 4:00 PM, Richard Holland wrote: > Ok, cool. So we can now rephrase the original question to...: How > should provenance information be stored in BioSQL? Yes, and the answer is using a provenance ontology or controlled vocabulary and bioentry_qualifier_value and seqfeature_qualifier_value. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From biopython at maubp.freeserve.co.uk Wed Aug 26 10:53:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 26 Aug 2009 11:53:40 +0100 Subject: [BioSQL-l] Indexing of (seqfeature) locations? 
Message-ID: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> Hi BioSQL folks, The BioSQL schema includes a few indexes on the location table (e.g. quoting the MySQL schema, but it looks the same on pg too): CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); CREATE INDEX seqfeatureloc_dbx ON location(dbxref_id); CREATE INDEX seqfeatureloc_trm ON location(term_id); Will these facilitate searches like this?: "SELECT ... WHERE 2000 <= location.start_pos AND location.end_pos <= 5000 AND ..." Or, for this would it help to include: CREATE INDEX seqfeatureloc_start ON location(start_pos); CREATE INDEX seqfeatureloc_start ON location(end_pos); A motivational use case would be to pull out an operon, or a region of a record as part of a genome browser. Thanks, Peter From hlapp at gmx.net Wed Aug 26 12:07:08 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 26 Aug 2009 08:07:08 -0400 Subject: [BioSQL-l] Indexing of (seqfeature) locations? In-Reply-To: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> Message-ID: On Aug 26, 2009, at 6:53 AM, Peter wrote: > The BioSQL schema includes a few indexes on the location table > (e.g. quoting the MySQL schema, but it looks the same on pg too): > > CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos); > [...] > Will these facilitate searches like this?: > > "SELECT ... WHERE 2000 <= location.start_pos > AND location.end_pos <= 5000 AND ..." > > Or, for this would it help to include: > > CREATE INDEX seqfeatureloc_start ON location(start_pos); > CREATE INDEX seqfeatureloc_start ON location(end_pos); With a decent RDBMS, having two indexes instead of a compound one will slow this query down. What the compound one won't help you with is if your query doesn't constrain the leading columns. For example, a compound index on (start_pos,end_pos) won't be used if you only constrain end_pos. 
If you want to do that, you need an index on (end_pos) too.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk Wed Aug 26 12:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 13:29:56 +0100
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To:
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
Message-ID: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>

On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp wrote:
>
>
> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>
>> The BioSQL schema includes a few indexes on the location table
>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>> [...]
>> Will these facilitate searches like this?:
>>
>> "SELECT ... WHERE 2000 <= location.start_pos
>> AND location.end_pos <= 5000 AND ..."
>>
>> Or, for this would it help to include:
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>
> With a decent RDBMS, having two indexes instead of a compound one will slow
> this query down. What the compound one won't help you with is if your query
> doesn't constrain the leading columns. For example, a compound index on
> (start_pos,end_pos) won't be used if you only constrain end_pos. If you want
> to do that, you need an index on (end_pos) too.

Thanks for your reply Hilmar. Just to make sure I understood, the current
BioSQL indexes are fine for this:

"SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

but not so great for:

"SELECT ... WHERE 2000 <= location.start_pos AND ..."

or,

"SELECT ... WHERE location.end_pos <= 5000 AND ..."

Nevertheless, that should cover most usage.
Having just two separated indexes on start_pos and end_pos would
speed up queries on just start or end, but would slow down queries
using both.

Presumably having three indexes as follows would cover all these
examples efficiently, but at the cost of two more indexes?:

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

If that is all accurate, the status quo is fine :)

Regards,

Peter


From haili at mpiz-koeln.mpg.de Wed Aug 26 14:18:09 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Wed, 26 Aug 2009 16:18:09 +0200
Subject: [BioSQL-l] error with load_ontology
Message-ID:

Hi All,
I encountered an error message when using load_ontology.pl to load gene ontology into biosql database. The command used is:

perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete

At the beginning, data could be loaded with warnings, but later an exception occurred and the loading was terminated. Warning and error messages are shown below:

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
ERROR: current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':
------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:
current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------

Can you please help me solve this problem? Thank you very much.

Best regards,
song


From hlapp at gmx.net Wed Aug 26 15:50:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:50:35 -0400
Subject: [BioSQL-l] error with load_ontology
In-Reply-To:
References:
Message-ID: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>

Song,

there should have been an error or warning that immediately preceded
this error. It is that one that's the root cause.

Also, are you using by any chance the BioSQL version for PostgreSQL
that has the RULEs removed? If yes, then at this point you cannot use
any Bioperl-db scripts (or code) with it, unless you install the rules
before you run such a script (and presumably remove them again
afterwards).
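Behind Hilmar's first point: once one statement in a PostgreSQL transaction fails, the transaction is aborted. Every later statement then fails with the same generic "current transaction is aborted" message, so the useful error is the first one in the log. The sketch below scans captured load_ontology.pl output for that first error. It assumes the output was saved as text and that database errors follow an "ERROR:" prefix, as in the messages quoted in this thread. find_root_cause is a hypothetical helper, not part of Bioperl-db.

```python
import re
from typing import Optional

# PostgreSQL's follow-on message once a transaction has already failed.
ABORTED = "current transaction is aborted"

# "ERROR:" followed by the message text; \s* also lets the message start on
# the next line, as in the wrapped logs quoted in this thread.
ERROR_RE = re.compile(r"ERROR:\s*(.+)")


def find_root_cause(log_text: str) -> Optional[str]:
    """Return the first ERROR message that is not the generic
    'current transaction is aborted' follow-on error, or None if every
    ERROR in the log is such a follow-on error (or there is none)."""
    for match in ERROR_RE.finditer(log_text):
        message = match.group(1).strip()
        if ABORTED not in message:
            return message
    return None
```

Song's later log in this thread includes the root error, so the helper would return "value too long for type character varying(255)" for it. For his first posting, which contains only follow-on errors, it returns None. That result is itself a hint that the real error scrolled past or was not copied. Usage might look like find_root_cause(open("load_ontology.log").read()), where load_ontology.log is a hypothetical file holding the captured output.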
-hilmar

On Aug 26, 2009, at 10:18 AM, Song Haili wrote:

> Hi All,
> I encountered an error message when using load_ontology.pl to load
> gene ontology into biosql database. The command used is:
>
> perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --
> dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format
> obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete.
>
> At the beginning, data could be loaded with warnings, but later an
> exception occurred and the loading was terminated. Warning and error
> messages are shown below:
>
> --------------------- WARNING ---------------------MSG: failed to
> store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS
> RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):ERROR:
> current transaction is aborted, commands ignored until end of
> transaction block---------------------------------------------------
> Could not store term GO:0050501, name 'hyaluronan synthase
> activity':------------- EXCEPTION -------------MSG: error while
> executing statement in
> Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR: current
> transaction is aborted, commands ignored until end of transaction
> blockSTACK
> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/
> lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:
> 970STACK
> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/
> lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:
> 873STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/
> site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195STACK
> Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/
> 5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306STACK
> Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/
> 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227STACK
> Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/
> 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264STACK
> Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/
> 5.10.0/Bio/DB/Persistent/PersistentObject.pm:284STACK (eval)
> load_ontology.pl:812STACK main::persist_term load_ontology.pl:
> 794STACK toplevel load_ontology.pl:
> 617-------------------------------------
> Can you please help me solve this problem? Thank you very much.
> Best regards,
> song
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net Wed Aug 26 15:56:25 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:56:25 -0400
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com> <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
Message-ID: <48B04E04-8561-45EB-9C64-8011665A74A2@gmx.net>

On Aug 26, 2009, at 8:29 AM, Peter wrote:

> On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp wrote:
>>
>>
>> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>>
>>> The BioSQL schema includes a few indexes on the location table
>>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>>> [...]
>>> Will these facilitate searches like this?:
>>>
>>> "SELECT ... WHERE 2000 <= location.start_pos
>>> AND location.end_pos <= 5000 AND ..."
>>>
>>> Or, for this would it help to include:
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>>
>> With a decent RDBMS, having two indexes instead of a compound one
>> will slow
>> this query down. What the compound one won't help you with is if
>> your query
>> doesn't constrain the leading columns. For example, a compound
>> index on
>> (start_pos,end_pos) won't be used if you only constrain end_pos. If
>> you want
>> to do that, you need an index on (end_pos) too.
>
> Thanks for your reply Hilmar. Just to make sure I understood, the
> current
> BioSQL indexes are fine for this:
>
> "SELECT ... WHERE 2000 <= location.start_pos
> AND location.end_pos <= 5000 AND ..."
>
> but not so great for:
>
> "SELECT ... WHERE 2000 <= location.start_pos AND ..."

No, this one will work fine (provided that start_pos comes first in the index).

>
> or,
>
> "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Yes.

> [...]
> Having just two separated indexes on start_pos and end_pos would
> speed up queries on just start or end, but would slow down queries
> using both.

Yes (though not necessarily much), and occupy more space.

>
> Presumably having three indexes as follows would cover all these
> examples efficiently, but at the cost of two more indexes?:
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
> CREATE INDEX seqfeatureloc_start ON location(start_pos);
> CREATE INDEX seqfeatureloc_start ON location(end_pos);

With this set, the waste of space for the compound index probably far outweighs the performance gain you might see from it.

If I need to be able to constrain by both independently, I create a compound index, and separate indexes for each column after the first in the index. I.e., for the purposes of querying by start_pos,

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);

are redundant.
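The leftmost-column rule Hilmar describes can be checked empirically. The sketch below uses Python's built-in sqlite3 module with a simplified, hypothetical location table and asks SQLite's planner (EXPLAIN QUERY PLAN) which index it would pick for each query. SQLite's planner is not PostgreSQL's or MySQL's, so treat this as an illustration of the rule rather than a benchmark. PostgreSQL, for instance, documents that a multicolumn B-tree index can be used with conditions on any subset of its columns, but is most efficient when the leading column is constrained. The statements quoted above also reuse the name seqfeatureloc_start for every index, which would fail at the second CREATE INDEX. The sketch therefore uses the hypothetical name seqfeatureloc_end for the extra index.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# Simplified, hypothetical stand-in for the BioSQL location table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE location (
           location_id   INTEGER PRIMARY KEY,
           seqfeature_id INTEGER NOT NULL,
           start_pos     INTEGER,
           end_pos       INTEGER)"""
)
# The compound index the BioSQL schema already defines.
conn.execute("CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos)")

# Selecting seqfeature_id (not in the index) keeps SQLite from answering the
# query by scanning the index alone as a covering index.
QUERIES = {
    "both": "SELECT seqfeature_id FROM location "
            "WHERE 2000 <= start_pos AND end_pos <= 5000",
    "start_only": "SELECT seqfeature_id FROM location WHERE 2000 <= start_pos",
    "end_only": "SELECT seqfeature_id FROM location WHERE end_pos <= 5000",
}

# With only the compound index, the leading column (start_pos) has to be
# constrained for the index to be chosen.
plans_before = {name: query_plan(conn, sql) for name, sql in QUERIES.items()}

# A separate index on the non-leading column covers end_pos-only queries.
conn.execute("CREATE INDEX seqfeatureloc_end ON location(end_pos)")
plan_end_after = query_plan(conn, QUERIES["end_only"])
```

Printing plans_before should show a SEARCH using seqfeatureloc_start for the first two queries. The end_pos-only query should show a table SCAN until seqfeatureloc_end exists. The exact wording of the plan text varies between SQLite versions.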
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================


From haili at mpiz-koeln.mpg.de Thu Aug 27 07:51:07 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 09:51:07 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
Message-ID:

Hi Hilmar,

I loaded the data again and found that the biological process GO terms were loaded, although with some warnings:

--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------

But when it started to load the molecular function GO terms, the process terminated with the following warnings and error message.

        Done with biological_process.
Loading ontology molecular_function:
        ... terms
--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR: value too long for type character varying(255)
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (HAS activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:
current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (seHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR: current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR: current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':
------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:
current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------

at load_ontology.pl line 824
        main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x96699d0)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xd90620)', '-termfactory', undef, '-throw', 'CODE(0x76ab60)', '-mergeobs', ...) called at load_ontology.pl line 617

I am using biosql-1.0.0 downloaded directly from http://www.biosql.org/wiki/Downloads without any changes. So I am not sure if the RULEs have been removed. By the way, before I met the above error, I was able to use the script load_seqdatabase.pl to load swissprot data with many warnings.

song

----- Original Message -----
From: Hilmar Lapp
Date: Wednesday, August 26, 2009 17:50
Subject: Re: [BioSQL-l] error with load_ontology
To: Song Haili
Cc: biosql-l at lists.open-bio.org

> Song,
>
> there should have been an error or warning that immediately preceded
> this error. It is that one that's the root cause.
>
> Also, are you using by any chance the BioSQL version for PostgreSQL
> that has the RULEs removed? If yes, then at this point you cannot use
> any Bioperl-db scripts (or code) with it, unless you install the rules
> before you run such a script (and presumably remove them again
> afterwards).
>
> -hilmar
>
> On Aug 26, 2009, at 10:18 AM, Song Haili wrote:
>
> > Hi All,
> > I encountered an error message when using load_ontology.pl to load
> > gene ontology into biosql database. The command used is:
> >
> > perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --
> > dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format
> > obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete.
> >
> > At the beginning, data could be loaded with warnings, but later an
> > exception occurred and the loading was terminated. Warning and error
> > messages are shown below:
> >
> > --------------------- WARNING ---------------------MSG: failed to
> > store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS
> > RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):ERROR:
> > current transaction is aborted, commands ignored until end of
> > transaction block---------------------------------------------------
> > Could not store term GO:0050501, name 'hyaluronan synthase
> > activity':------------- EXCEPTION -------------MSG: error while
> > executing statement in
> > Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR: current
> > transaction is aborted, commands ignored until end of transaction
> > blockSTACK
> > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/
> > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:
> > 970STACK
> > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/
> > lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:
> > 873STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/
> > site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195STACK
> > Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/
> > 5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306STACK
> > Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/
> > 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227STACK
> > Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/
> > 5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264STACK
> > Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/
> > 5.10.0/Bio/DB/Persistent/PersistentObject.pm:284STACK (eval)
> > load_ontology.pl:812STACK main::persist_term load_ontology.pl:
> > 794STACK toplevel load_ontology.pl:
> > 617-------------------------------------
> > Can you please help me solve this problem? Thank you very much.
> > Best regards,
> > song
> >
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biosql-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:-
> hlapp at gmx dot net :
> ===========================================================
>
>
>


From biopython at maubp.freeserve.co.uk Thu Aug 27 10:24:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 11:24:23 +0100
Subject: [BioSQL-l] error with load_ontology
In-Reply-To:
References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
Message-ID: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>

On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote:

> --------------------- WARNING ---------------------
> MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
> ERROR: value too long for type character varying(255)
> ---------------------------------------------------

Extending the relevant field in the schema might be one solution...

> I am using biosql-1.0.0 downloaded directly from
> http://www.biosql.org/wiki/Downloads without any changes.
> So I am not sure if the RULEs have been removed. By the
> way, before I met the above error, I was able to use the script
> load_seqdatabase.pl to load swissprot data with many warnings.

BioSQL 1.0.0 is out of date, the latest release is 1.0.1
Was that a typo?

Peter


From haili at mpiz-koeln.mpg.de Thu Aug 27 14:55:12 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 16:55:12 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
References: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net> <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
Message-ID:

Problem solved!
If the field type of the synonym column in the term_synonym table is changed from varchar(255) to text, the error no longer occurs. However, this has only been tested with biosql-1.0.0 (maybe it also works for the latest version, biosql-1.0.1, but I didn't do many tests).

Thank you all for your help.

song

----- Original Message -----
From: Peter
Date: Thursday, August 27, 2009 12:24
Subject: Re: [BioSQL-l] error with load_ontology
To: Song Haili
Cc: Hilmar Lapp , biosql-l at lists.open-bio.org

> On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote:
>
> > --------------------- WARNING ---------------------
> > MSG: failed to store term synonym
> (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-
> alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent
> hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-
> alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent
> hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT
> EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
> > ERROR: value too long for type character varying(255)
> > ---------------------------------------------------
>
> Extending the relevant field in the schema might be one solution...
>
> > I am using biosql-1.0.0 downloaded directly from
> > http://www.biosql.org/wiki/Downloads without any changes.
> > So I am not sure if the RULEs have been removed. By the
> > way, before I met the above error, I was able to use the script
> > load_seqdatabase.pl to load swissprot data with many warnings.
>
> BioSQL 1.0.0 is out of date, the latest release is 1.0.1
> Was that a typo?
>
> Peter
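Song's workaround was to widen term_synonym.synonym from varchar(255) to text. In PostgreSQL that would be something like ALTER TABLE term_synonym ALTER COLUMN synonym TYPE text. Before changing the schema, it can help to know which synonyms in the OBO file actually exceed the limit. A minimal sketch, assuming the OBO 1.2 synonym line layout (synonym: "text" SCOPE [xrefs]) and the 255-character limit reported in this thread. long_synonyms is a hypothetical helper. It measures the quoted text before OBO escape sequences are resolved, so a length can differ slightly from what the loader actually stores.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Quoted synonym text at the start of an OBO 1.2 "synonym:" line; backslash
# escapes such as \" are allowed inside the quotes.
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')


def long_synonyms(lines: Iterable[str], limit: int = 255) -> List[Tuple[Optional[str], int]]:
    """Return (term id, synonym length) for every synonym longer than `limit`."""
    results = []
    term_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza such as [Term] or [Typedef]
            term_id = None
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        else:
            match = SYNONYM_RE.match(line)
            if match and len(match.group(1)) > limit:
                results.append((term_id, len(match.group(1))))
    return results
```

Passing an open file handle for gene_ontology.1_2.obo to long_synonyms lists the offending terms. In this thread that would include GO:0050501, whose EC-derived synonym triggered "value too long for type character varying(255)".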