From florian.mittag at uni-tuebingen.de  Thu Aug  6 05:43:56 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:43:56 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net>
	<52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
Message-ID: <200908061143.56479.florian.mittag@uni-tuebingen.de>

Hi!

On Friday, 24. July 2009 02:39, Chris Fields wrote:
> The warning is interesting, as it derives from our rollback of feature/
> annotation stuff in bioperl.  It indicates the specified DBLink is
> duplicated in the Bio::Ontology::Term.
>
> The exception makes sense in light of that (and seems to confirm the
> link was already present).

I'm getting the same warnings with my custom DB2 driver and with MySQL, but 
the script completes successfully. I get them when loading the Gene Ontology 
and the Sequence Ontology.

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12297042 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:rph exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12930826 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default
---------------------------------------------------

[...]
Done with sequence.
Done, cleaning up.


What to do?

- Florian

>
> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote:
> > Hi Carlos - that's an odd error that we haven't seen yet. My first
> > impulse would be to suspect that your database wasn't empty when you
> > ran this, and that the error you got is due to a term in the input
> > file clashing with one you already have in the database.
> >
> > You can check this by looking into your database:
> >
> > SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name =
> > 'invasive growth';
> >
> > Does this return anything?
> >
> > Note that load_ontology.pl is perfectly equipped to update an
> > existing ontology - check the POD and look for the --lookup command
> > line option (and the several options following it in the POD with
> > which you can modify the exact update behavior). By default though
> > the script will assume that it is loading a new ontology.
> >
> > 	-hilmar
> >
> > On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote:
> >> Hi Hilmar,
> >>
> >> thanks for the help. I've tried now this
> >>
> >> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass
> >> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo
> >>
> >> downloaded from here
> >>
> >> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo
> >>
> >> and I have this error message.
> >>
> >> --------------------- WARNING ---------------------
> >> MSG: DBLink 	 _default
> >> ---------------------------------------------------
> >> Could not store term GO:0001404, name 'invasive growth':
> >>
> >> ------------- EXCEPTION: Bio::Root::Exception -------------
> >> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to
> >> be found by unique key
> >> STACK: Error::throw
> >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/
> >> Root.pm:357
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
> >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >> load_ontology.pl:812
> >> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >> -----------------------------------------------------------
> >>
> >> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>       main::persist_term('-term',
> >> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db',
> >> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory',
> >> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at /
> >> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>
> >> Any hints to know where the problem would be?
> >>
> >> Thanks in advance,
> >>
> >> Carlos
> >>
> >> Carlos  Canchaya
> >> ccanchaya at gmail.com
> >>
> >> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote:
> >>> Please leave off the --fmtargs GO.defs argument - this is not a
> >>> file in the .obo format.
> >>>
> >>> 	-hilmar
> >>>
> >>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote:
> >>>> Hi guys,
> >>>>
> >>>> I've tried to execute load_ontologies following your suggestions as
> >>>>
> >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy --
> >>>> dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format
> >>>> obo gene_ontology.1_2.obo
> >>>>
> >>>> However I have many warnings first
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>>
> >>>> and then
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>> Could not store term GO:0001404, name 'invasive growth':
> >>>>
> >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or
> >>>> to be found by unique key
> >>>> STACK: Error::throw
> >>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/
> >>>> bioperl-live//Bio/Root/Root.pm:357
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >>>> load_ontology.pl:812
> >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >>>> -----------------------------------------------------------
> >>>>
> >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>>>     main::persist_term('-term',
> >>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db',
> >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory',
> >>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at /
> >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>>>
> >>>>
> >>>> Any ideas why?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Carlos
> >>>>
> >>>>
> >>>> Carlos  Canchaya
> >>>> ccanchaya at gmail.com

From hlapp at gmx.net  Thu Aug  6 09:46:06 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:46:06 -0400
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <200908061143.56479.florian.mittag@uni-tuebingen.de>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net>
	<52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
	<200908061143.56479.florian.mittag@uni-tuebingen.de>
Message-ID: <BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>

The warnings are fine. They simply indicate that a dbxref is being
added to the term that it already had.

Part of the reason for that happening may be that Bioperl-db doesn't  
support different kinds of dbxrefs for terms yet, if I recall  
correctly, so once retrieved from the database they all end up in the  
_default category.
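
To illustrate the mechanism, here is a minimal sketch in Python. It is
not the BioPerl implementation; the class and method names are made up
for this example. It only assumes what is described above: dbxrefs are
kept per category, everything lands in "_default", and adding a link
that is already present produces a warning instead of a second entry.

```python
import warnings


class TermDbxrefs:
    """Hypothetical container for a term's dbxrefs, grouped by category."""

    def __init__(self):
        # category name -> list of dbxref strings such as "GOC:mah"
        self._by_context = {}

    def add(self, dbxref, context="_default"):
        """Add a dbxref; warn and skip it if the category already has it."""
        links = self._by_context.setdefault(context, [])
        if dbxref in links:
            warnings.warn(f"{dbxref} exists in the dblink of {context}")
            return False
        links.append(dbxref)
        return True

    def get(self, context="_default"):
        """Return a copy of the dbxrefs stored under one category."""
        return list(self._by_context.get(context, []))
```

In this sketch the second add of the same dbxref changes nothing in the
stored data, which is why such a warning can be ignored.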

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florian.mittag at uni-tuebingen.de  Thu Aug  6 10:20:31 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:20:31 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<200908061143.56479.florian.mittag@uni-tuebingen.de>
	<BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>
Message-ID: <200908061620.31766.florian.mittag@uni-tuebingen.de>

Ok, that's a relief. Thanks for the quick answer!

- Florian

On Thursday, 6. August 2009 15:46, Hilmar Lapp wrote:
> The warnings are fine. They simply indicate that a dbxref is being
> added to the term that it already had.
>
> Part of the reason for that happening may be that Bioperl-db doesn't
> support different kinds of dbxrefs for terms yet, if I recall
> correctly, so once retrieved from the database they all end up in the
> _default category.
>
> 	-hilmar

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From haili at mpiz-koeln.mpg.de  Mon Aug 10 10:21:39 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Mon, 10 Aug 2009 16:21:39 +0200
Subject: [BioSQL-l] how to load other data to biosql database?
Message-ID: <fc26e9403015.4a804913@mpiz-koeln.mpg.de>

Dear all,

Do any of you know how to load other data, such as domains, EC numbers, MapMan bins, interactions, KEGG Ontology, etc., into a BioSQL database? Is it possible using load_ontology.pl? If so, what are the corresponding arguments? Otherwise, should I write my own scripts? Any suggestions will be highly appreciated!

Best regards,

song


From florian.mittag at uni-tuebingen.de  Tue Aug 11 04:10:12 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 10:10:12 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to?
Message-ID: <200908111010.12143.florian.mittag@uni-tuebingen.de>

Hi!

I stumbled upon an old post from Hilmar:

On Tue, 18 Mar 2003, Hilmar Lapp wrote:
> type_term_id is supposed to reference an SO term. source is supposed to
> denote the 'method'  (BLAST, BLAT, sim4, genewise, whatnot), as far as
> my understanding goes. In the case of reading the features from a
> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the source
> (which is what the genbank, embl, and swissprot parsers do in bioperl)
> is maybe stretching the definition, but I don't have something
> substantially better to offer.

I inspected the database after I imported some Genbank files with BioJava, and 
I found that the source_term_id for the seqfeatures is always set to the ID 
of an automatically inserted term "Genbank" with definition "auto-generated 
by biojavax".

I was wondering if there is anything new to the source_term_id.

- Florian

From florian.mittag at uni-tuebingen.de  Tue Aug 11 05:09:50 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 11:09:50 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
Message-ID: <200908111109.50361.florian.mittag@uni-tuebingen.de>

Hm, I should've mentioned my real concern. We're integrating all kinds of data 
into the database and right now I want to import miRNA information (sequences 
and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/).
The files I download from there specify "miRanda" as METHOD, so should I use 
this as source term or miRBase?

Thanks,
- Florian

On Tuesday, 11. August 2009 10:59, Richard Holland wrote:
> The reason BJX does that is because the Genbank format has no
> indication of where a feature came from. So, all there is to go on is
> that it came from Genbank! This allows us to differentiate between
> features on a sequence that were loaded from an original file, and new
> features that have been added to the sequence in the db after it was
> loaded (e.g. by running blast, blat etc. against some local data).

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From holland at eaglegenomics.com  Tue Aug 11 05:22:41 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 11 Aug 2009 10:22:41 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <200908111109.50361.florian.mittag@uni-tuebingen.de>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
Message-ID: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>

Ideally there would be two fields for source_term_id - one for the  
algorithm used to generate the data (e.g. BLAST, miRanda), the other  
for the source the data came from (e.g. Genbank, miRBase). These are  
two very distinct concepts and it is not easy to represent them  
successfully using a single ontology source_term_id field. So the only  
way round it if you need to represent both algorithm and source is to  
create your own ontology which is a cross-product of the two possible  
sets of values (triples would be good for this).

If you want to use only a single term, basically it's up to you  
whether you choose to annotate by algorithm (miRanda) or by source  
(miRBase). I expect the decision will rest on whether it is more  
important for you to know which features in your database were added  
locally and which came from a remote source, or if knowing the  
algorithm used to generate them is more important. Otherwise if both  
are important the cross-product triple approach is probably the only  
way to go.
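
As a rough illustration of the combined-term workaround, here is a
sketch in Python using SQLite. It is not BioJava or BioPerl code. The
two tables are a much simplified stand-in for the BioSQL ontology and
term tables, and the helper names, the ontology name and the
"algorithm/source" naming scheme are all assumptions made for this
example (a plain name concatenation rather than real triples).

```python
import sqlite3

# Simplified stand-in for the BioSQL ontology/term tables (assumption:
# the real schema has more columns and constraints than shown here).
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    definition  TEXT,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
"""


def make_demo_db():
    """Create an in-memory database with the simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def get_or_create_ontology(conn, name):
    """Return the id of the named ontology, inserting it if missing."""
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO ontology (name) VALUES (?)", (name,)
    ).lastrowid


def get_or_create_term(conn, ontology_id, name, definition=None):
    """Return the id of a term in an ontology, inserting it if missing."""
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (name, ontology_id),
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO term (name, definition, ontology_id) VALUES (?, ?, ?)",
        (name, definition, ontology_id),
    ).lastrowid


def combined_source_term(conn, ontology_id, algorithm, source):
    """One term per (algorithm, source) pair, e.g. 'miRanda/miRBase'."""
    name = f"{algorithm}/{source}"
    definition = f"Feature computed with {algorithm}, obtained from {source}"
    return get_or_create_term(conn, ontology_id, name, definition)
```

The id returned by combined_source_term() is what would go into
seqfeature.source_term_id; calling it again for the same pair returns
the existing term rather than inserting a duplicate.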

cheers,
Richard

On 11 Aug 2009, at 10:09, Florian Mittag wrote:

> Hm, I should've mentioned my real concern. We're integrating all  
> kinds of data
> into the database and right now I want to import miRNA information  
> (sequences
> and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/).
> The files I download from there specify "miRanda" as METHOD, so  
> should I use
> this as source term or miRBase?
>
> Thanks,
> - Florian
>
> On Tuesday, 11. August 2009 10:59, Richard Holland wrote:
>> The reason BJX does that is because the Genbank format has no
>> indication of where a feature came from. So, all there is to go on is
>> that it came from Genbank! This allows us to differentiate between
>> features on a sequence that were loaded from an original file, and  
>> new
>> features that have been added to the sequence in the db after it was
>> loaded (e.g. by running blast, blat etc. against some local data).
>>
>> On 11 Aug 2009, at 09:10, Florian Mittag wrote:
>>> Hi!
>>>
>>> I stumbled upon an old post from Hilmar:
>>>
>>> On Tue, 18 Mar 2003, Hilmar Lapp wrote:
>>>> type_term_id is supposed to reference an SO term. source is supposed
>>>> to denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far
>>>> as my understanding goes. In the case of reading the features from a
>>>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the
>>>> source (which is what the genbank, embl, and swissprot parsers do in
>>>> bioperl) is maybe stretching the definition, but I don't have
>>>> something substantially better to offer.
>>>
>>> I inspected the database after I imported some Genbank files with
>>> BioJava, and I found that the source_term_id for the seqfeatures is
>>> always set to the ID of an automatically inserted term "Genbank" with
>>> definition "auto-generated by biojavax".
>>>
>>> I was wondering if there is anything new to the source_term_id.
>>>
>>> - Florian
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
> -- 
> Dipl. Inf. Florian Mittag
> Universität Tuebingen
> WSI-RA, Sand 1
> 72076 Tuebingen, Germany
> Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Tue Aug 11 04:59:27 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 11 Aug 2009 09:59:27 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <200908111010.12143.florian.mittag@uni-tuebingen.de>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
Message-ID: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>

The reason BJX does that is because the Genbank format has no  
indication of where a feature came from. So, all there is to go on is  
that it came from Genbank! This allows us to differentiate between  
features on a sequence that were loaded from an original file, and new  
features that have been added to the sequence in the db after it was  
loaded (e.g. by running blast, blat etc. against some local data).

On 11 Aug 2009, at 09:10, Florian Mittag wrote:

> Hi!
>
> I stumbled upon an old post from Hilmar:
>
> On Tue, 18 Mar 2003, Hilmar Lapp wrote:
>> type_term_id is supposed to reference an SO term. source is supposed
>> to denote the 'method' (BLAST, BLAT, sim4, genewise, whatnot), as far
>> as my understanding goes. In the case of reading the features from a
>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the
>> source (which is what the genbank, embl, and swissprot parsers do in
>> bioperl) is maybe stretching the definition, but I don't have
>> something substantially better to offer.
>
> I inspected the database after I imported some Genbank files with
> BioJava, and I found that the source_term_id for the seqfeatures is
> always set to the ID of an automatically inserted term "Genbank" with
> definition "auto-generated by biojavax".
>
> I was wondering if there is anything new to the source_term_id.
>
> - Florian

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Fri Aug 14 18:56:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Aug 2009 18:56:11 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
Message-ID: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>


On Aug 11, 2009, at 5:22 AM, Richard Holland wrote:

> Ideally there would be two fields for source_term_id - one for the  
> algorithm used to generate the data (e.g. BLAST, miRanda), the other  
> for the source the data came from (e.g. Genbank, miRBase).


You mean the source of the data that it was applied to.

I agree though that if you want both you can create a cross-product  
term and store the decomposition as term_relationship's.
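
A minimal sketch of the cross-product idea, using Python's sqlite3 module
with a heavily simplified stand-in for the BioSQL term and term_relationship
tables (the real tables carry more columns, e.g. ontology_id). The combined
term name "miRanda x miRBase" and the predicate names has_algorithm and
has_source are hypothetical, chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for BioSQL's term and term_relationship tables.
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL
);
CREATE TABLE term_relationship (
    subject_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    predicate_term_id INTEGER NOT NULL REFERENCES term(term_id),
    object_term_id    INTEGER NOT NULL REFERENCES term(term_id),
    UNIQUE (subject_term_id, predicate_term_id, object_term_id)
);
""")


def term_id(name):
    """Return the id of the named term, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
    row = conn.execute("SELECT term_id FROM term WHERE name = ?", (name,))
    return row.fetchone()[0]


def add_cross_product(algorithm, source):
    """Create one combined term and record its two components as triples."""
    combined = term_id(f"{algorithm} x {source}")
    # has_algorithm / has_source are hypothetical predicate terms.
    for predicate, part in (("has_algorithm", algorithm),
                            ("has_source", source)):
        conn.execute(
            "INSERT OR IGNORE INTO term_relationship VALUES (?, ?, ?)",
            (combined, term_id(predicate), term_id(part)),
        )
    return combined


def decompose(combined):
    """Recover the components of a cross-product term from its triples."""
    rows = conn.execute(
        "SELECT p.name, o.name FROM term_relationship r "
        "JOIN term p ON p.term_id = r.predicate_term_id "
        "JOIN term o ON o.term_id = r.object_term_id "
        "WHERE r.subject_term_id = ?",
        (combined,),
    )
    return dict(rows.fetchall())


combined = add_cross_product("miRanda", "miRBase")
parts = decompose(combined)
print(parts)
```

The id returned by add_cross_product is what would go into
seqfeature.source_term_id; the triples keep the algorithm and the source
individually queryable.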

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 06:44:16 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 11:44:16 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
Message-ID: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>


On 14 Aug 2009, at 23:56, Hilmar Lapp wrote:

>
> On Aug 11, 2009, at 5:22 AM, Richard Holland wrote:
>
>> Ideally there would be two fields for source_term_id - one for the  
>> algorithm used to generate the data (e.g. BLAST, miRanda), the  
>> other for the source the data came from (e.g. Genbank, miRBase).
>
>
> You mean the source of the data that it was applied to.

Not necessarily. The source of the data that it was applied to (i.e.  
the sequence the feature refers to) is a third thing - and that is an  
attribute of the sequence the feature refers to, rather than the  
feature itself.

What I mean is this:

   1. The sequence itself could be downloaded from Genbank, EMBL, or  
elsewhere, or I could have discovered it in-house.
   2. The features on the sequence could have been generated by  
running BLAST, miRanda, etc., or they could be manually annotated.
   3. The features on the sequence could have been downloaded from  
Genbank, EMBL, etc., or they could have been made locally, or by a  
collaborator at another institute.

To my mind these are three distinct things. (1) is sequence-related,  
and (2) and (3) are feature-related.

cheers,
Richard

> I agree though that if you want both you can create a cross-product  
> term and store the decomposition as term_relationship's.
>
> 	-hilmar

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 10:29:24 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 10:29:24 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
Message-ID: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>


On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:

> [...]
> What I mean is this:
>
>  1. The sequence itself could be downloaded from Genbank, EMBL, or  
> elsewhere, or I could have discovered it in-house.

That's actually what I meant.

>  2. The features on the sequence could have been generated by  
> running BLAST, miRanda, etc., or they could be manually annotated.
>  3. The features on the sequence could have been downloaded from  
> Genbank, EMBL, etc., or they could have been made locally, or by a  
> collaborator at another institute.

Right, but if a feature is the result of you running some algorithm  
against some sequences, then it's not been downloaded or given to you.  
Features on one and the same sequence can have different sources,  
obviously, so I'm a bit confused - I think we're talking about the  
same thing in different words, but I'm not sure.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 12:32:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 17:32:35 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
Message-ID: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>


On 15 Aug 2009, at 15:29, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:
>
>> [...]
>> What I mean is this:
>>
>> 1. The sequence itself could be downloaded from Genbank, EMBL, or  
>> elsewhere, or I could have discovered it in-house.
>
> That's actually what I meant.
>
>> 2. The features on the sequence could have been generated by  
>> running BLAST, miRanda, etc., or they could be manually annotated.
>> 3. The features on the sequence could have been downloaded from  
>> Genbank, EMBL, etc., or they could have been made locally, or by a  
>> collaborator at another institute.
>
> Right, but if a feature is the result of you running some algorithm  
> against some sequences, then it's not been downloaded or given to  
> you. Features on one and the same sequence can have different  
> sources, obviously, so I'm a bit confused - I think we're talking  
> about the same thing in different words, but I'm not sure.

Probably. :)

Case study: I download some seqs from Genbank. (Which then need to be  
annotated as having come from Genbank, at the sequence level). They  
already have some features on them (which need to be annotated as  
having come from Genbank, at the feature level, but of an unknown  
algorithm as Genbank doesn't specify how they were generated usually).  
I then run BLAST of those sequences against some local data, and  
record my own features as a result. I also run BLAT, and again record  
my own features. My colleague also runs BLAST of the same seqs against  
some data of his own, and wants our combined feature results to be  
stored in the same database. I want to be able to annotate all these  
new features both with the algorithm used to generate them (BLAST or  
BLAT) and who did it (myself or my colleague at the institute down the  
road), in addition to retaining the original features that came from  
Genbank (and making sure they're annotated as such). Hence I'd need a  
source attribute for the sequence (Genbank in this case), a source  
attribute for each feature (Genbank, Me, or Colleague X, in this  
case), and an algorithm/technique/protocol attribute for each feature  
(BLAST or BLAT or 'don't know it just came from Genbank' in this  
example).

cheers,
Richard

> 	-hilmar

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 15:31:13 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 15:31:13 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
Message-ID: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>


On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:

> [...]
> Case study:

Great, now we're getting somewhere :-)

> I download some seqs from Genbank. (Which then need to be annotated  
> as having come from Genbank, at the sequence level).

Note, as you say, *at the sequence level*. I.e., you would record this  
either using the bioentry's namespace (biodatabase), or a  
bioentry_qualifier_value annotation. I would choose the former, though  
since a bioentry can only be in one namespace, it may not satisfy  
your needs.

> They already have some features on them (which need to be annotated  
> as having come from Genbank, at the feature level, but of an unknown  
> algorithm as Genbank doesn't specify how they were generated usually).

Right. The source term would indicate that GenBank provided them to  
you, and that that's all you know.

> I then run BLAST of those sequences against some local data, and  
> record my own features as a result. I also run BLAT, and again  
> record my own features.

BLAST and BLAT would now be the source terms.

> My colleague also runs BLAST of the same seqs against some data of  
> his own, and wants our combined feature results to be stored in the  
> same database. I want to be able to annotate all these new features  
> both with the algorithm used to generate them (BLAST or BLAT)

You use the source term for that.

> and who did it (myself or my colleague at the institute down the road)

Ah - that's provenance information, not the source as is normally  
referred to. BioSQL at present doesn't have an explicit provenance  
model, but you can still record provenance information through  
ontology-typed tag/value annotation in seqfeature_qualifier_value,  
with the terms coming from a provenance ontology (that you make up  
yourself or grab from somewhere else).

> , in addition to retaining the original features that came from  
> Genbank (and making sure they're annotated as such).

That shouldn't be a problem - certainly it's not for BioSQL.

> Hence I'd need a source attribute for the sequence (Genbank in this  
> case), a source attribute for each feature (Genbank, Me, or  
> Colleague X, in this case), and an algorithm/technique/protocol  
> attribute for each feature (BLAST or BLAT or 'don't know it just  
> came from Genbank' in this example).

Not quite - source really is what provided the feature to you, not who  
or when, or using which BLAST database, genome assembly, or how you  
parsed the results, etc etc. That's all provenance information.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 16:00:39 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 21:00:39 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
	<82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
Message-ID: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>

Ok, cool. So we can now rephrase the original question to...: How  
should provenance information be stored in BioSQL?

:)

cheers,
Richard

On 15 Aug 2009, at 20:31, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:
>
>> [...]
>> Case study:
>
> Great, now we're getting somewhere :-)
>
>> I download some seqs from Genbank. (Which then need to be annotated  
>> as having come from Genbank, at the sequence level).
>
> Note, as you say, *at the sequence level*. I.e., you would record  
> this either using the bioentry's namespace (biodatabase), or a  
> bioentry_qualifier_value annotation. I would choose the former,  
> though since a bioentry can only be in one namespace, it may not  
> satisfy your needs.
>
>> They already have some features on them (which need to be annotated  
>> as having come from Genbank, at the feature level, but of an  
>> unknown algorithm as Genbank doesn't specify how they were  
>> generated usually).
>
> Right. The source term would indicate that GenBank provided them to  
> you, and that that's all you know.
>
>> I then run BLAST of those sequences against some local data, and  
>> record my own features as a result. I also run BLAT, and again  
>> record my own features.
>
> BLAST and BLAT would now be the source terms.
>
>> My colleague also runs BLAST of the same seqs against some data of  
>> his own, and wants our combined feature results to be stored in the  
>> same database. I want to be able to annotate all these new features  
>> both with the algorithm used to generate them (BLAST or BLAT)
>
> You use the source term for that.
>
>> and who did it (myself or my colleague at the institute down the  
>> road)
>
> Ah - that's provenance information, not the source as is normally  
> referred to. BioSQL at present doesn't have an explicit provenance  
> model, but you can still record provenance information through  
> ontology-typed tag/value annotation in seqfeature_qualifier_value,  
> with the terms coming from a provenance ontology (that you make up  
> yourself or grab from somewhere else).
>
>> , in addition to retaining the original features that came from  
>> Genbank (and making sure they're annotated as such).
>
> That shouldn't be a problem - certainly it's not for BioSQL.
>
>> Hence I'd need a source attribute for the sequence (Genbank in this  
>> case), a source attribute for each feature (Genbank, Me, or  
>> Colleague X, in this case), and an algorithm/technique/protocol  
>> attribute for each feature (BLAST or BLAT or 'don't know it just  
>> came from Genbank' in this example).
>
> Not quite - source really is what provided the feature to you, not  
> who or when, or using which BLAST database, genome assembly, or how  
> you parsed the results, etc etc. That's all provenance information.
>
> 	-hilmar

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 16:14:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:14:54 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
	<82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
	<5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>
Message-ID: <92DD5E74-5638-4CB8-B34A-3282AACF036A@gmx.net>


On Aug 15, 2009, at 4:00 PM, Richard Holland wrote:

> Ok, cool. So we can now rephrase the original question to...: How  
> should provenance information be stored in BioSQL?


Yes, and the answer is using a provenance ontology or controlled  
vocabulary and bioentry_qualifier_value and seqfeature_qualifier_value.
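
As a rough illustration of that answer, here is a sketch in Python's sqlite3
module with cut-down versions of the ontology, term, seqfeature and
seqfeature_qualifier_value tables (the real BioSQL tables have more
columns). The ontology names and the performed_by tag are hypothetical; any
provenance vocabulary could be substituted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-ins for the BioSQL tables involved.
conn.executescript("""
CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id)
);
CREATE TABLE seqfeature (
    seqfeature_id  INTEGER PRIMARY KEY,
    display_name   TEXT,
    source_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    "rank"        INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, "rank")
);
""")


def add_term(name, ontology):
    """Insert a term into the named ontology and return its id."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)",
                 (ontology,))
    ont_id = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (ontology,)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)", (name, ont_id)
    )
    return cur.lastrowid


# Source term: what produced the feature (the algorithm).
blast = add_term("BLAST", "feature sources (hypothetical)")
# Provenance tag: who ran it (hypothetical vocabulary).
performed_by = add_term("performed_by", "provenance (hypothetical)")

feature = conn.execute(
    "INSERT INTO seqfeature (display_name, source_term_id) VALUES (?, ?)",
    ("hit 1", blast),
).lastrowid
conn.execute(
    "INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, value) "
    "VALUES (?, ?, ?)",
    (feature, performed_by, "Colleague X"),
)

rows = conn.execute(
    "SELECT f.display_name, src.name, q.value "
    "FROM seqfeature f "
    "JOIN term src ON src.term_id = f.source_term_id "
    "JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id "
    "JOIN term tag ON tag.term_id = q.term_id "
    "WHERE tag.name = 'performed_by'"
).fetchall()
print(rows)
```

The same pattern would apply at the sequence level with
bioentry_qualifier_value in place of seqfeature_qualifier_value.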

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Aug 26 06:53:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 11:53:40 +0100
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
Message-ID: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>

Hi BioSQL folks,

The BioSQL schema includes a few indexes on the location table
(e.g. quoting the MySQL schema, but it looks the same on pg too):

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_dbx   ON location(dbxref_id);
CREATE INDEX seqfeatureloc_trm   ON location(term_id);

Will these facilitate searches like this?:

"SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

Or, for this would it help to include:

CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

A motivational use case would be to pull out an operon, or a
region of a record as part of a genome browser.

Thanks,

Peter

From hlapp at gmx.net  Wed Aug 26 08:07:08 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 08:07:08 -0400
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
Message-ID: <B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>


On Aug 26, 2009, at 6:53 AM, Peter wrote:

> The BioSQL schema includes a few indexes on the location table
> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
> [...]
> Will these facilitate searches like this?:
>
> "SELECT ... WHERE 2000 <= location.start_pos
> AND location.end_pos <= 5000 AND ..."
>
> Or, for this would it help to include:
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos);
> CREATE INDEX seqfeatureloc_start ON location(end_pos);

With a decent RDBMS, having two indexes instead of a compound one will  
slow this query down. What the compound one won't help you with is if  
your query doesn't constrain the leading columns. For example, a  
compound index on (start_pos,end_pos) won't be used if you only  
constrain end_pos. If you want to do that, you need an index on  
(end_pos) too.
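
A small way to check the leading-column behaviour for yourself, sketched
with Python's sqlite3 module. SQLite is only a stand-in here; MySQL and
PostgreSQL have their own planners (use their EXPLAIN), and the table below
is a simplified assumption, not the full BioSQL location table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the BioSQL location table.
conn.execute(
    "CREATE TABLE location ("
    " location_id INTEGER PRIMARY KEY,"
    " seqfeature_id INTEGER NOT NULL,"
    " start_pos INTEGER, end_pos INTEGER, strand INTEGER)"
)
# The compound index quoted from the BioSQL schema.
conn.execute(
    "CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos)"
)
conn.executemany(
    "INSERT INTO location (seqfeature_id, start_pos, end_pos, strand) "
    "VALUES (?, ?, ?, ?)",
    [(i, i * 10, i * 10 + 50, 1) for i in range(1000)],
)


def plan(where):
    """Return SQLite's query plan text for a SELECT with this WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT seqfeature_id FROM location WHERE " + where
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


both_plan = plan("start_pos >= 2000 AND end_pos <= 5000")
end_only_plan = plan("end_pos <= 5000")
print("start and end:", both_plan)
print("end only:     ", end_only_plan)
```

On SQLite the first plan names seqfeatureloc_start, while the second falls
back to scanning the table because the leading column is unconstrained.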

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Aug 26 08:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 13:29:56 +0100
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
	<B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
Message-ID: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>

On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp<hlapp at gmx.net> wrote:
>
>
> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>
>> The BioSQL schema includes a few indexes on the location table
>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>> [...]
>> Will these facilitate searches like this?:
>>
>> "SELECT ... WHERE 2000 <= location.start_pos
>> AND location.end_pos <= 5000 AND ..."
>>
>> Or, for this would it help to include:
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>
> With a decent RDBMS, having two indexes instead of a compound one will slow
> this query down. What the compound one won't help you with is if your query
> doesn't constrain the leading columns. For example, a compound index on
> (start_pos,end_pos) won't be used if you only constrain end_pos. If you want
> to do that, you need an index on (end_pos) too.

Thanks for your reply Hilmar. Just to make sure I understood, the current
BioSQL indexes are fine for this:

 "SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

but not so great for:

 "SELECT ... WHERE 2000 <= location.start_pos AND ..."

or,

 "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Nevertheless, that should cover most usage.

Having just two separated indexes on start_pos and end_pos would
speed up queries on just start or end, but would slow down queries
using both.

Presumably having three indexes as follows would cover all these
examples efficiently, but at the cost of two more indexes?:

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

If that is all accurate, the status quo is fine :)
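
One practical note on the statements above: they reuse the name
seqfeatureloc_start, so an extra index would need a name of its own. A
minimal sketch with Python's sqlite3 module (SQLite as a stand-in for MySQL
or PostgreSQL, on a simplified location table; seqfeatureloc_end is a
hypothetical name). Since the compound index leads with start_pos, only the
end_pos index is added here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the BioSQL location table.
conn.execute(
    "CREATE TABLE location (location_id INTEGER PRIMARY KEY,"
    " seqfeature_id INTEGER, start_pos INTEGER, end_pos INTEGER)"
)
conn.execute(
    "CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos)"
)

# Creating a second index under the same name is rejected.
try:
    conn.execute("CREATE INDEX seqfeatureloc_start ON location(end_pos)")
    duplicate_rejected = False
except sqlite3.OperationalError as err:
    duplicate_rejected = True
    print("rejected:", err)

# A distinct (hypothetical) name works.
conn.execute("CREATE INDEX seqfeatureloc_end ON location(end_pos)")

# Ask the planner how it would run an end_pos-only query now.
detail = " | ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT seqfeature_id FROM location WHERE end_pos <= 5000"
    )
)
print(detail)
```

With the additional index in place, SQLite reports seqfeatureloc_end for the
end_pos-only query.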

Regards,

Peter

From haili at mpiz-koeln.mpg.de  Wed Aug 26 10:18:09 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Wed, 26 Aug 2009 16:18:09 +0200
Subject: [BioSQL-l] error with load_ontology
Message-ID: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>

Hi All,
I encountered an error message when using load_ontology.pl to load gene ontology into biosql database. The command used is: 

perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete. 

At the beginning, data can be loaded with warnings, but later an exception occurred and the loading was terminated. Warning and error messages are shown below:

-------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':
------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------

Can you please help me solve this problem? Thank you very much.
Best regards,
song


From hlapp at gmx.net  Wed Aug 26 11:50:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:50:35 -0400
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
Message-ID: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>

Song,

there should have been an error or warning immediately preceding  
these errors. It is that one that's the root cause.

Also, are you using by any chance the BioSQL version for PostgreSQL  
that has the RULEs removed? If yes, then at this point you cannot use  
any Bioperl-db scripts (or code) with it, unless you install the rules  
before you run such a script (and presumably remove them again  
afterwards).

	-hilmar

On Aug 26, 2009, at 10:18 AM, Song Haili wrote:

> Hi All,
> I encountered an error message when using load_ontology.pl to load  
> gene ontology into biosql database. The command used is:
>
> perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete.
>
> At the beginning, data can be loaded with warnings, but later an
> exception occurred and the loading was terminated. Warning and error
> messages are shown below:
>
> -------------------- WARNING ---------------------
> MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
> ERROR:  current transaction is aborted, commands ignored until end of transaction block
> ---------------------------------------------------
> Could not store term GO:0050501, name 'hyaluronan synthase activity':
> ------------- EXCEPTION -------------
> MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
> STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> STACK (eval) load_ontology.pl:812
> STACK main::persist_term load_ontology.pl:794
> STACK toplevel load_ontology.pl:617
> -------------------------------------
>
> Best regards,
> song

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Aug 26 11:56:25 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:56:25 -0400
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
	<B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
	<320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
Message-ID: <48B04E04-8561-45EB-9C64-8011665A74A2@gmx.net>


On Aug 26, 2009, at 8:29 AM, Peter wrote:

> On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp<hlapp at gmx.net> wrote:
>>
>>
>> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>>
>>> The BioSQL schema includes a few indexes on the location table
>>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>>> [...]
>>> Will these facilitate searches like this?:
>>>
>>> "SELECT ... WHERE 2000 <= location.start_pos
>>> AND location.end_pos <= 5000 AND ..."
>>>
>>> Or, for this would it help to include:
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>>
>> With a decent RDBMS, having two indexes instead of a compound one will
>> slow this query down. What the compound one won't help you with is if
>> your query doesn't constrain the leading columns. For example, a
>> compound index on (start_pos,end_pos) won't be used if you only
>> constrain end_pos. If you want to do that, you need an index on
>> (end_pos) too.
>
> Thanks for your reply Hilmar. Just to make sure I understood, the
> current BioSQL indexes are fine for this:
>
> "SELECT ... WHERE 2000 <= location.start_pos
> AND location.end_pos <= 5000 AND ..."
>
> but not so great for:
>
> "SELECT ... WHERE 2000 <= location.start_pos AND ..."

No, this one will work fine (provided that start_pos comes first in
the index).

>
> or,
>
> "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Yes, this one is not well served by the compound index alone.

> [...]
> Having just two separated indexes on start_pos and end_pos would
> speed up queries on just start or end, but would slow down queries
> using both.

Yes (though not necessarily much), and occupy more space.

>
> Presumably having three indexes as follows would cover all these
> examples efficiently, but at the cost of two more indexes?:
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
> CREATE INDEX seqfeatureloc_start ON location(start_pos);
> CREATE INDEX seqfeatureloc_start ON location(end_pos);

With this set, the waste of space for the compound index probably far  
outweighs the performance gain you might see from it. If I need to be  
able to constrain by both independently, I create a compound index,  
and separate indexes for each column after the first in the index.  
I.e., for the purposes of querying by start_pos,

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);

are redundant.
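
As a rough sketch of that setup (the second index name, seqfeatureloc_end, is made up here for illustration, since index names have to be unique; the first statement is the one already quoted from the schema and is repeated only for context):

```sql
-- Compound index: serves queries that constrain start_pos alone,
-- or start_pos together with end_pos.
CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);

-- Separate index for the non-leading column, for queries that
-- constrain end_pos alone. The index name is hypothetical.
CREATE INDEX seqfeatureloc_end ON location(end_pos);

-- Ask the planner what it would do for an end_pos-only query.
-- The output format differs between MySQL and PostgreSQL.
EXPLAIN SELECT location_id FROM location WHERE end_pos <= 5000;
```

Whether the planner actually picks an index depends on the RDBMS, the table statistics and the data, so it is worth checking the EXPLAIN output on your own database rather than taking this as given.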

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From haili at mpiz-koeln.mpg.de  Thu Aug 27 03:51:07 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 09:51:07 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
Message-ID: <fc26d0704793.4a96570b@mpiz-koeln.mpg.de>

Hi Hilmar,

I loaded the data again and found that the biological process GO terms were loaded, although with some warnings:
--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------

But when it started loading the molecular function GO terms, the process terminated with the following warnings and error message:

	Done with biological_process.
Loading ontology molecular_function:
	... terms

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  value too long for type character varying(255)
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (HAS activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (seHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':

------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------
 at load_ontology.pl line 824
	main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x96699d0)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xd90620)', '-termfactory', undef, '-throw', 'CODE(0x76ab60)', '-mergeobs', ...) called at load_ontology.pl line 617

I am using biosql-1.0.0, downloaded directly from http://www.biosql.org/wiki/Downloads without any changes, so I am not sure whether the RULEs have been removed. By the way, before I ran into the above error, I was able to use the script load_seqdatabase.pl to load swissprot data, although with many warnings.

song


----- Original Message -----
From: Hilmar Lapp <hlapp at gmx.net>
Date: Wednesday, August 26, 2009 17:50
Subject: Re: [BioSQL-l] error with load_ontology
To: Song Haili <haili at mpiz-koeln.mpg.de>
Cc: biosql-l at lists.open-bio.org

> Song,
> 
> there should have been an error or warning that immediately preceded
> these errors. It is that one that's the root cause.
> 
> Also, are you using by any chance the BioSQL version for PostgreSQL
> that has the RULEs removed? If yes, then at this point you cannot use
> any Bioperl-db scripts (or code) with it, unless you install the rules
> before you run such a script (and presumably remove them again
> afterwards).
> 
> 	-hilmar
> 
> On Aug 26, 2009, at 10:18 AM, Song Haili wrote:
> 
> > Hi All,
> > I encountered an error message when using load_ontology.pl to load
> > gene ontology into biosql database. The command used is:
> >
> > perl load_ontology.pl --driver Pg --host pg-server --dbname dbname
> > --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format
> > obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete.
> >
> > At the beginning, data can be loaded with warnings, but later an
> > exception occurred and the loading was terminated. Warning and error
> > messages are shown below:
> >
> > --------------------- WARNING ---------------------
> > MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with
> > values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
> > ERROR:  current transaction is aborted, commands ignored until end of
> > transaction block
> > ---------------------------------------------------
> > Could not store term GO:0050501, name 'hyaluronan synthase activity':
> > ------------- EXCEPTION -------------
> > MSG: error while executing statement in
> > Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current
> > transaction is aborted, commands ignored until end of transaction block
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
> > STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
> > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> > STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> > STACK (eval) load_ontology.pl:812
> > STACK main::persist_term load_ontology.pl:794
> > STACK toplevel load_ontology.pl:617
> > -------------------------------------
> > Can you please help me to solve this problem out? Thank you very much.
> > Best regards,
> > song
> >
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biosql-l
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
>


From biopython at maubp.freeserve.co.uk  Thu Aug 27 06:24:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 11:24:23 +0100
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
	<fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
Message-ID: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>

On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote:

> --------------------- WARNING ---------------------
> MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
> ERROR:  value too long for type character varying(255)
> ---------------------------------------------------

Extending the relevant field in the schema might be one solution...

> I am using biosql-1.0.0 downloaded directly from
> http://www.biosql.org/wiki/Downloads without any changes.
> So I am not sure if the RULEs have been removed. By the
> way, before I met the above error, I was able to use the script
> load_seqdatabase.pl to load swissprot data with many warnings.

BioSQL 1.0.0 is out of date, the latest release is 1.0.1
Was that a typo?

Peter


From haili at mpiz-koeln.mpg.de  Thu Aug 27 10:55:12 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 16:55:12 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
	<fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
	<320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
Message-ID: <fc18b40150b.4a96ba70@mpiz-koeln.mpg.de>

Problem solved!
If the type of the synonym column in the term_synonym table is changed from varchar(255) to text, the error no longer occurs. So far I have only tested this with biosql-1.0.0 (it may also work with the latest version, biosql-1.0.1, but I have not done much testing).
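
For anyone hitting the same error, the change can be sketched roughly as follows. This assumes PostgreSQL and the stock table and column names (term_synonym.synonym); it is not an official schema patch, so try it on a copy of the database first:

```sql
-- Widen the synonym column so synonyms longer than 255 characters fit.
ALTER TABLE term_synonym ALTER COLUMN synonym TYPE text;

-- Optional check after reloading: length of the longest stored synonym.
SELECT max(length(synonym)) AS longest_synonym FROM term_synonym;
```
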
Thank you all for your help.
song

----- Original Message -----
From: Peter <biopython at maubp.freeserve.co.uk>
Date: Thursday, August 27, 2009 12:24
Subject: Re: [BioSQL-l] error with load_ontology
To: Song Haili <haili at mpiz-koeln.mpg.de>
Cc: Hilmar Lapp <hlapp at gmx.net>, biosql-l at lists.open-bio.org

> On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote:
> 
> > --------------------- WARNING ---------------------
> > MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
> > ERROR:  value too long for type character varying(255)
> > ---------------------------------------------------
> 
> Extending the relevant field in the schema might be one solution...
> 
> > I am using biosql-1.0.0 downloaded directly from
> > http://www.biosql.org/wiki/Downloads without any changes.
> > So I am not sure if the RULEs have been removed. By the
> > way, before I met the above error, I was able to use the script
> > load_seqdatabase.pl to load swissprot data with many warnings.
> 
> BioSQL 1.0.0 is out of date, the latest release is 1.0.1
> Was that a typo?
> 
> Peter


From florian.mittag at uni-tuebingen.de  Thu Aug  6 09:43:56 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 11:43:56 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net>
	<52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
Message-ID: <200908061143.56479.florian.mittag@uni-tuebingen.de>

Hi!

On Friday, 24. July 2009 02:39, Chris Fields wrote:
> The warning is interesting, as it derives from our rollback of feature/
> annotation stuff in bioperl.  It indicates the specified DBLink is
> duplicated in the Bio::Ontology::Term.
>
> The exception makes sense in light of that (and seems to confirm the
> link was already present).

I'm getting the same warnings with my custom DB2 driver and with MySQL, but 
the script completes successfully. I get them when loading the Gene Ontology 
and the Sequence Ontology.

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12297042 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:rph exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12930826 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default
---------------------------------------------------

[...]
Done with sequence.
Done, cleaning up.


What to do?

- Florian

>
> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote:
> > Hi Carlos - that's an odd error that we haven't seen yet. My first
> > impulse would be to suspect that your database wasn't empty when you
> > ran this, and that the error you got is due to a term in the input
> > file clashing with one you already have in the database.
> >
> > You can check this by looking into your database:
> >
> > SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name =
> > 'invasive growth';
> >
> > Does this return anything?
> >
> > Note that load_ontology.pl is perfectly equipped to update an
> > existing ontology - check the POD and look for the --lookup command
> > line option (and the several options following it in the POD with
> > which you can modify the exact update behavior). By default though
> > the script will assume that it is loading a new ontology.
> >
> > 	-hilmar
> >
> > On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote:
> >> Hi Hilmar,
> >>
> >> thanks for the help. I've tried now this
> >>
> >> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass
> >> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo
> >>
> >> downloaded from here
> >>
> >> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo
> >>
> >> and I have this error message.
> >>
> >> --------------------- WARNING ---------------------
> >> MSG: DBLink 	 _default
> >> ---------------------------------------------------
> >> Could not store term GO:0001404, name 'invasive growth':
> >>
> >> ------------- EXCEPTION: Bio::Root::Exception -------------
> >> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to
> >> be found by unique key
> >> STACK: Error::throw
> >> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/
> >> Root.pm:357
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
> >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >> load_ontology.pl:812
> >> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >> -----------------------------------------------------------
> >>
> >> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>       main::persist_term('-term',
> >> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db',
> >> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory',
> >> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at /
> >> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>
> >> Any hints to know where the problem would be?
> >>
> >> Thanks in advance,
> >>
> >> Carlos
> >>
> >> Carlos  Canchaya
> >> ccanchaya at gmail.com
> >>
> >> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote:
> >>> Please leave off the --fmtargs GO.defs argument - this is not a
> >>> file in the .obo format.
> >>>
> >>> 	-hilmar
> >>>
> >>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote:
> >>>> Hi guys,
> >>>>
> >>>> I've tried to execute load_ontologies following your suggestions as
> >>>>
> >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy --
> >>>> dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format
> >>>> obo gene_ontology.1_2.obo
> >>>>
> >>>> However I have many warnings first
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>>
> >>>> and then
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink exists in the dblink of _default
> >>>> ---------------------------------------------------
> >>>> Could not store term GO:0001404, name 'invasive growth':
> >>>>
> >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or
> >>>> to be found by unique key
> >>>> STACK: Error::throw
> >>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/
> >>>> bioperl-live//Bio/Root/Root.pm:357
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >>>> load_ontology.pl:812
> >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >>>> -----------------------------------------------------------
> >>>>
> >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>>>     main::persist_term('-term',
> >>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db',
> >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory',
> >>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at /
> >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>>>
> >>>>
> >>>> Any ideas why?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Carlos
> >>>>
> >>>>
> >>>> Carlos  Canchaya
> >>>> ccanchaya at gmail.com


From hlapp at gmx.net  Thu Aug  6 13:46:06 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Aug 2009 09:46:06 -0400
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <200908061143.56479.florian.mittag@uni-tuebingen.de>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<1E596269-ED8F-4ADF-9B54-A9A0CF908620@gmx.net>
	<52ED5492-14F1-443C-AB1E-67685A464656@illinois.edu>
	<200908061143.56479.florian.mittag@uni-tuebingen.de>
Message-ID: <BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>

The warnings are fine. They simply indicate that a dbxref is being
added to the term that it already had.

Part of the reason for that happening may be that Bioperl-db doesn't  
support different kinds of dbxrefs for terms yet, if I recall  
correctly, so once retrieved from the database they all end up in the  
_default category.
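
If you want to reassure yourself, you can list the dbxrefs actually stored for a term. This is only a sketch that assumes the stock BioSQL table and column names (term, term_dbxref, dbxref); the identifier is just an example taken from the messages quoted below:

```sql
-- List the dbxrefs attached to one term; each should appear only once.
SELECT t.identifier, d.dbname, d.accession
FROM term t
JOIN term_dbxref td ON td.term_id = t.term_id
JOIN dbxref d ON d.dbxref_id = td.dbxref_id
WHERE t.identifier = 'GO:0001404'
ORDER BY d.dbname, d.accession;
```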

	-hilmar

On Aug 6, 2009, at 5:43 AM, Florian Mittag wrote:

> Hi!
>
> On Friday, 24. July 2009 02:39, Chris Fields wrote:
>> The warning is interesting, as it derives from our rollback of  
>> feature/
>> annotation stuff in bioperl.  It indicates the specified DBLink is
>> duplicated in the Bio::Ontology::Term.
>>
>> The exception makes sense in light of that (and seems to confirm the
>> link was already present).
>
> I'm getting the same warnings with my custom DB2 driver and with  
> MySQL, but
> the script completes successfully. I get them when loading the Gene  
> Ontology
> and the Sequence Ontology.
>
> -------------------- WARNING ---------------------
> MSG: GOC:mah exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:12297042 exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: GOC:mah exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: GOC:rph exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:12930826 exists in the dblink of _default
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: PMID:15012271 exists in the dblink of _default
> ---------------------------------------------------
>
> [...]
> Done with sequence.
> Done, cleaning up.
>
>
> What to do?
>
> - Florian
>
>>
>> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote:
>>> Hi Carlos - that's an odd error that we haven't seen yet. My first
>>> impulse would be to suspect that your database wasn't empty when you
>>> ran this, and that the error you got is due to a term in the input
>>> file clashing with one you already have in the database.
>>>
>>> You can check this by looking into your database:
>>>
>>> SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name =
>>> 'invasive growth';
>>>
>>> Does this return anything?
>>>
>>> Note that load_ontology.pl is perfectly equipped to update an
>>> existing ontology - check the POD and look for the --lookup command
>>> line option (and the several options following it in the POD with
>>> which you can modify the exact update behavior). By default though
>>> the script will assume that it is loading a new ontology.
>>>
>>> 	-hilmar
>>>
>>> On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote:
>>>> Hi Hilmar,
>>>>
>>>> thanks for the help. I've tried now this
>>>>
>>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass
>>>> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo
>>>>
>>>> downloaded from here
>>>>
>>>> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo
>>>>
>>>> and I have this error message.
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: DBLink 	 _default
>>>> ---------------------------------------------------
>>>> Could not store term GO:0001404, name 'invasive growth':
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to
>>>> be found by unique key
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/
>>>> Root.pm:357
>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
>>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
>>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
>>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
>>>> load_ontology.pl:812
>>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
>>>> -----------------------------------------------------------
>>>>
>>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
>>>>      main::persist_term('-term',
>>>> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db',
>>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory',
>>>> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at /
>>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
>>>>
>>>> Any hints to know where the problem would be?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Carlos
>>>>
>>>> Carlos  Canchaya
>>>> ccanchaya at gmail.com
>>>>
>>>> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote:
>>>>> Please leave off the --fmtargs GO.defs argument - this is not a
>>>>> file in the .obo format.
>>>>>
>>>>> 	-hilmar
>>>>>
>>>>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote:
>>>>>> Hi guys,
>>>>>>
>>>>>> I've tried to execute load_ontologies following your  
>>>>>> suggestions as
>>>>>>
>>>>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy --
>>>>>> dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs --format
>>>>>> obo gene_ontology.1_2.obo
>>>>>>
>>>>>> However I have many warnings first
>>>>>>
>>>>>> --------------------- WARNING ---------------------
>>>>>> MSG: DBLink exists in the dblink of _default
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> and then
>>>>>>
>>>>>> --------------------- WARNING ---------------------
>>>>>> MSG: DBLink exists in the dblink of _default
>>>>>> ---------------------------------------------------
>>>>>> Could not store term GO:0001404, name 'invasive growth':
>>>>>>
>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or
>>>>>> to be found by unique key
>>>>>> STACK: Error::throw
>>>>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/
>>>>>> bioperl-live//Bio/Root/Root.pm:357
>>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
>>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
>>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
>>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
>>>>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
>>>>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
>>>>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
>>>>>> load_ontology.pl:812
>>>>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
>>>>>> -----------------------------------------------------------
>>>>>>
>>>>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
>>>>>>    main::persist_term('-term',
>>>>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db',
>>>>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory',
>>>>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at /
>>>>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
>>>>>>
>>>>>>
>>>>>> Any ideas why?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Carlos
>>>>>>
>>>>>>
>>>>>> Carlos  Canchaya
>>>>>> ccanchaya at gmail.com

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florian.mittag at uni-tuebingen.de  Thu Aug  6 14:20:31 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 6 Aug 2009 16:20:31 +0200
Subject: [BioSQL-l] Error when loading Gene Ontology to biosql
In-Reply-To: <BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>
References: <fc0bfd871e72.4a65b072@mpiz-koeln.mpg.de>
	<200908061143.56479.florian.mittag@uni-tuebingen.de>
	<BF35C26A-53FC-4057-A71D-771FEF50AE3B@gmx.net>
Message-ID: <200908061620.31766.florian.mittag@uni-tuebingen.de>

OK, that's a relief. Thanks for the quick answer!

- Florian

On Thursday, 6. August 2009 15:46, Hilmar Lapp wrote:
> The warnings are fine. They simply indicate that a dbxref is being
> added to the term that it already had.
>
> Part of the reason for that happening may be that Bioperl-db doesn't
> support different kinds of dbxrefs for terms yet, if I recall
> correctly, so once retrieved from the database they all end up in the
> _default category.
>
> 	-hilmar
>
> On Aug 6, 2009, at 5:43 AM, Florian Mittag wrote:
> > Hi!
> >
> > On Friday, 24. July 2009 02:39, Chris Fields wrote:
> >> The warning is interesting, as it derives from our rollback of
> >> feature/
> >> annotation stuff in bioperl.  It indicates the specified DBLink is
> >> duplicated in the Bio::Ontology::Term.
> >>
> >> The exception makes sense in light of that (and seems to confirm the
> >> link was already present).
> >
> > I'm getting the same warnings with my custom DB2 driver and with
> > MySQL, but
> > the script completes successfully. I get them when loading the Gene
> > Ontology
> > and the Sequence Ontology.
> >
> > -------------------- WARNING ---------------------
> > MSG: GOC:mah exists in the dblink of _default
> > ---------------------------------------------------
> >
> > -------------------- WARNING ---------------------
> > MSG: PMID:12297042 exists in the dblink of _default
> > ---------------------------------------------------
> >
> > -------------------- WARNING ---------------------
> > MSG: GOC:mah exists in the dblink of _default
> > ---------------------------------------------------
> >
> > -------------------- WARNING ---------------------
> > MSG: GOC:rph exists in the dblink of _default
> > ---------------------------------------------------
> >
> > -------------------- WARNING ---------------------
> > MSG: PMID:12930826 exists in the dblink of _default
> > ---------------------------------------------------
> >
> > -------------------- WARNING ---------------------
> > MSG: PMID:15012271 exists in the dblink of _default
> > ---------------------------------------------------
> >
> > [...]
> > Done with sequence.
> > Done, cleaning up.
> >
> >
> > What to do?
> >
> > - Florian
> >
> >> On Jul 23, 2009, at 7:49 AM, Hilmar Lapp wrote:
> >>> Hi Carlos - that's an odd error that we haven't seen yet. My first
> >>> impulse would be to suspect that your database wasn't empty when you
> >>> ran this, and that the error you got is due to a term in the input
> >>> file clashing with one you already have in the database.
> >>>
> >>> You can check this by looking into your database:
> >>>
> >>> SQL> SELECT * FROM term WHERE identifier = 'GO:0001404' or name =
> >>> 'invasive growth';
> >>>
> >>> Does this return anything?
> >>>
> >>> Note that load_ontology.pl is perfectly equipped to update an
> >>> existing ontology - check the POD and look for the --lookup command
> >>> line option (and the several options following it in the POD with
> >>> which you can modify the exact update behavior). By default though
> >>> the script will assume that it is loading a new ontology.
> >>>
> >>> 	-hilmar
> >>>
> >>> On Jul 23, 2009, at 3:27 AM, Carlos A. Canchaya wrote:
> >>>> Hi Hilmar,
> >>>>
> >>>> thanks for the help. I've tried now this
> >>>>
> >>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyyy--dbpass
> >>>> xxxx --namespace "Gene Ontology" --format obo gene_ontology.1_2.obo
> >>>>
> >>>> downloaded from here
> >>>>
> >>>> http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology.1_2.obo
> >>>>
> >>>> and I have this error message.
> >>>>
> >>>> --------------------- WARNING ---------------------
> >>>> MSG: DBLink 	 _default
> >>>> ---------------------------------------------------
> >>>> Could not store term GO:0001404, name 'invasive growth':
> >>>>
> >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or to
> >>>> be found by unique key
> >>>> STACK: Error::throw
> >>>> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/
> >>>> Root/
> >>>> Root.pm:357
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >>>> load_ontology.pl:812
> >>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >>>> -----------------------------------------------------------
> >>>>
> >>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>>>      main::persist_term('-term',
> >>>> 'Bio::Ontology::OBOterm=HASH(0x9330318)', '-db',
> >>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x8a17ac0)', '-termfactory',
> >>>> undef, '-throw', 'CODE(0x85f4708)', '-mergeobs', ...) called at /
> >>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>>>
> >>>> Any hints to know where the problem would be?
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Carlos
> >>>>
> >>>> Carlos  Canchaya
> >>>> ccanchaya at gmail.com
> >>>>
> >>>> On Jul 22, 2009, at 8:15 PM, Hilmar Lapp wrote:
> >>>>> Please leave off the --fmtargs GO.defs argument - this is not a
> >>>>> file in the .obo format.
> >>>>>
> >>>>> 	-hilmar
> >>>>>
> >>>>> On Jul 22, 2009, at 11:05 AM, Carlos A. Canchaya wrote:
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> I've tried to execute load_ontologies following your
> >>>>>> suggestions as
> >>>>>>
> >>>>>> load_ontology.pl --driver Pg --dbname biosql --dbuser yyy
> >>>>>> --dbpass xxx --namespace "Gene Ontology" --fmtargs GO.defs
> >>>>>> --format obo gene_ontology.1_2.obo
> >>>>>>
> >>>>>> However I have many warnings first
> >>>>>>
> >>>>>> --------------------- WARNING ---------------------
> >>>>>> MSG: DBLink exists in the dblink of _default
> >>>>>> ---------------------------------------------------
> >>>>>>
> >>>>>> and then
> >>>>>>
> >>>>>> --------------------- WARNING ---------------------
> >>>>>> MSG: DBLink exists in the dblink of _default
> >>>>>> ---------------------------------------------------
> >>>>>> Could not store term GO:0001404, name 'invasive growth':
> >>>>>>
> >>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>>> MSG: create: object (Bio::Ontology::OBOterm) failed to insert or
> >>>>>> to be found by unique key
> >>>>>> STACK: Error::throw
> >>>>>> STACK: Bio::Root::Root::throw /home/carlos/nascent/download/
> >>>>>> bioperl-live//Bio/Root/Root.pm:357
> >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/
> >>>>>> local/
> >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:219
> >>>>>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/
> >>>>>> share/perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> >>>>>> STACK: Bio::DB::Persistent::PersistentObject::store /usr/local/
> >>>>>> share/perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> >>>>>> STACK: main::persist_term /tmp/BioPerl-db-1.6.0/scripts/biosql/
> >>>>>> load_ontology.pl:812
> >>>>>> STACK: /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl:617
> >>>>>> -----------------------------------------------------------
> >>>>>>
> >>>>>> at /tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 824
> >>>>>>    main::persist_term('-term',
> >>>>>> 'Bio::Ontology::OBOterm=HASH(0x9c86078)', '-db',
> >>>>>> 'Bio::DB::BioSQL::DBAdaptor=HASH(0x936ed50)', '-termfactory',
> >>>>>> undef, '-throw', 'CODE(0x8f49a50)', '-mergeobs', ...) called at /
> >>>>>> tmp/BioPerl-db-1.6.0/scripts/biosql/load_ontology.pl line 617
> >>>>>>
> >>>>>>
> >>>>>> Any ideas why?
> >>>>>>
> >>>>>> Thanks in advance,
> >>>>>>
> >>>>>> Carlos
> >>>>>>
> >>>>>>
> >>>>>> Carlos  Canchaya
> >>>>>> ccanchaya at gmail.com

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From haili at mpiz-koeln.mpg.de  Mon Aug 10 14:21:39 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Mon, 10 Aug 2009 16:21:39 +0200
Subject: [BioSQL-l] how to load other data to biosql database?
Message-ID: <fc26e9403015.4a804913@mpiz-koeln.mpg.de>

Dear all,

Do any of you know how to load other data, such as domains, EC numbers, MapMan bins, interactions, KEGG Ontology etc., into a BioSQL database? Is it possible using load_ontology.pl? If it is, what are the corresponding arguments? Otherwise, should I write my own scripts? Any suggestions will be highly appreciated!

Best regards,

song


From florian.mittag at uni-tuebingen.de  Tue Aug 11 08:10:12 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 10:10:12 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer to?
Message-ID: <200908111010.12143.florian.mittag@uni-tuebingen.de>

Hi!

I stumbled upon an old post from Hilmar:

On Tue, 18 Mar 2003, Hilmar Lapp wrote:
> type_term_id is supposed to reference an SO term. source is supposed to
> denote the 'method'  (BLAST, BLAT, sim4, genewise, whatnot), as far as
> my understanding goes. In the case of reading the features from a
> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the source
> (which is what the genbank, embl, and swissprot parsers do in bioperl)
> is maybe stretching the definition, but I don't have something
> substantially better to offer.

I inspected the database after I imported some Genbank files with BioJava, and 
I found that the source_term_id for the seqfeatures is always set to the ID 
of an automatically inserted term "Genbank" with definition "auto-generated 
by biojavax".

I was wondering if there is anything new to the source_term_id.

- Florian


From florian.mittag at uni-tuebingen.de  Tue Aug 11 09:09:50 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Tue, 11 Aug 2009 11:09:50 +0200
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
Message-ID: <200908111109.50361.florian.mittag@uni-tuebingen.de>

Hm, I should've mentioned my real concern. We're integrating all kinds of data 
into the database and right now I want to import miRNA information (sequences 
and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/).
The files I download from there specify "miRanda" as METHOD, so should I use 
this as source term or miRBase?

Thanks,
- Florian

On Tuesday, 11. August 2009 10:59, Richard Holland wrote:
> The reason BJX does that is because the Genbank format has no
> indication of where a feature came from. So, all there is to go on is
> that it came from Genbank! This allows us to differentiate between
> features on a sequence that were loaded from an original file, and new
> features that have been added to the sequence in the db after it was
> loaded (e.g. by running blast, blat etc. against some local data).
>
> On 11 Aug 2009, at 09:10, Florian Mittag wrote:
> > Hi!
> >
> > I stumbled upon an old post from Hilmar:
> >
> > On Tue, 18 Mar 2003, Hilmar Lapp wrote:
> >> type_term_id is supposed to reference an SO term. source is
> >> supposed to
> >> denote the 'method'  (BLAST, BLAT, sim4, genewise, whatnot), as far
> >> as
> >> my understanding goes. In the case of reading the features from a
> >> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the
> >> source
> >> (which is what the genbank, embl, and swissprot parsers do in
> >> bioperl)
> >> is maybe stretching the definition, but I don't have something
> >> substantially better to offer.
> >
> > I inspected the database after I imported some Genbank files with
> > BioJava, and
> > I found that the source_term_id for the seqfeatures is always set to
> > the ID
> > of an automatically inserted term "Genbank" with definition "auto-
> > generated
> > by biojavax".
> >
> > I was wondering if there is anything new to the source_term_id.
> >
> > - Florian
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biosql-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/

-- 
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091


From holland at eaglegenomics.com  Tue Aug 11 09:22:41 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 11 Aug 2009 10:22:41 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <200908111109.50361.florian.mittag@uni-tuebingen.de>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
Message-ID: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>

Ideally there would be two fields for source_term_id - one for the  
algorithm used to generate the data (e.g. BLAST, miRanda), the other  
for the source the data came from (e.g. Genbank, miRBase). These are  
two very distinct concepts and it is not easy to represent them  
successfully using a single ontology source_term_id field. So the only  
way round it if you need to represent both algorithm and source is to  
create your own ontology which is a cross-product of the two possible  
sets of values (triples would be good for this).

If you want to use only a single term, basically it's up to you  
whether you choose to annotate by algorithm (miRanda) or by source  
(miRBase). I expect the decision will rest on whether it is more  
important for you to know which features in your database were added  
locally and which came from a remote source, or if knowing the  
algorithm used to generate them is more important. Otherwise if both  
are important the cross-product triple approach is probably the only  
way to go.
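
To make the cross-product idea concrete, here is a minimal sketch that only enumerates the combined term names. The "algorithm/source" naming scheme and the helper name are assumptions made for illustration; nothing in BioSQL prescribes them.

```python
from itertools import product

# Example values taken from this thread; extend as needed.
algorithms = ["BLAST", "miRanda"]
sources = ["Genbank", "miRBase"]


def cross_product_terms(algorithms, sources):
    """Map each combined term name to its (algorithm, source) pair.

    The "algorithm/source" name format is a hypothetical convention.
    """
    return {f"{a}/{s}": (a, s) for a, s in product(algorithms, sources)}


terms = cross_product_terms(algorithms, sources)
for name, (algorithm, source) in terms.items():
    print(f"{name}: algorithm={algorithm}, source={source}")
```

Each combined name would then become one term in your own ontology and be referenced from source_term_id.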

cheers,
Richard

On 11 Aug 2009, at 10:09, Florian Mittag wrote:

> Hm, I should've mentioned my real concern. We're integrating all kinds of data
> into the database and right now I want to import miRNA information (sequences
> and target sites) from miRBase (http://microrna.sanger.ac.uk/sequences/).
> The files I download from there specify "miRanda" as METHOD, so should I use
> this as source term or miRBase?
>
> Thanks,
> - Florian
>
> On Tuesday, 11. August 2009 10:59, Richard Holland wrote:
>> The reason BJX does that is because the Genbank format has no
>> indication of where a feature came from. So, all there is to go on is
>> that it came from Genbank! This allows us to differentiate between
>> features on a sequence that were loaded from an original file, and  
>> new
>> features that have been added to the sequence in the db after it was
>> loaded (e.g. by running blast, blat etc. against some local data).
>>
>> On 11 Aug 2009, at 09:10, Florian Mittag wrote:
>>> Hi!
>>>
>>> I stumbled upon an old post from Hilmar:
>>>
>>> On Tue, 18 Mar 2003, Hilmar Lapp wrote:
>>>> type_term_id is supposed to reference an SO term. source is
>>>> supposed to
>>>> denote the 'method'  (BLAST, BLAT, sim4, genewise, whatnot), as far
>>>> as
>>>> my understanding goes. In the case of reading the features from a
>>>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the
>>>> source
>>>> (which is what the genbank, embl, and swissprot parsers do in
>>>> bioperl)
>>>> is maybe stretching the definition, but I don't have something
>>>> substantially better to offer.
>>>
>>> I inspected the database after I imported some Genbank files with
>>> BioJava, and
>>> I found that the source_term_id for the seqfeatures is always set to
>>> the ID
>>> of an automatically inserted term "Genbank" with definition "auto-
>>> generated
>>> by biojavax".
>>>
>>> I was wondering if there is anything new to the source_term_id.
>>>
>>> - Florian
>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
> -- 
> Dipl. Inf. Florian Mittag
> Universität Tuebingen
> WSI-RA, Sand 1
> 72076 Tuebingen, Germany
> Phone: +49 7071 / 29 78985  Fax: +49 7071 / 29 5091

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From holland at eaglegenomics.com  Tue Aug 11 08:59:27 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 11 Aug 2009 09:59:27 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <200908111010.12143.florian.mittag@uni-tuebingen.de>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
Message-ID: <57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>

The reason BJX does that is because the Genbank format has no  
indication of where a feature came from. So, all there is to go on is  
that it came from Genbank! This allows us to differentiate between  
features on a sequence that were loaded from an original file, and new  
features that have been added to the sequence in the db after it was  
loaded (e.g. by running blast, blat etc. against some local data).

On 11 Aug 2009, at 09:10, Florian Mittag wrote:

> Hi!
>
> I stumbled upon an old post from Hilmar:
>
> On Tue, 18 Mar 2003, Hilmar Lapp wrote:
>> type_term_id is supposed to reference an SO term. source is  
>> supposed to
>> denote the 'method'  (BLAST, BLAT, sim4, genewise, whatnot), as far  
>> as
>> my understanding goes. In the case of reading the features from a
>> GenBank feature table, assigning 'Genbank/EMBL/Swissprot' as the  
>> source
>> (which is what the genbank, embl, and swissprot parsers do in  
>> bioperl)
>> is maybe stretching the definition, but I don't have something
>> substantially better to offer.
>
> I inspected the database after I imported some Genbank files with  
> BioJava, and
> I found that the source_term_id for the seqfeatures is always set to  
> the ID
> of an automatically inserted term "Genbank" with definition "auto- 
> generated
> by biojavax".
>
> I was wondering if there is anything new to the source_term_id.
>
> - Florian

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Fri Aug 14 22:56:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Aug 2009 18:56:11 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
Message-ID: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>


On Aug 11, 2009, at 5:22 AM, Richard Holland wrote:

> Ideally there would be two fields for source_term_id - one for the  
> algorithm used to generate the data (e.g. BLAST, miRanda), the other  
> for the source the data came from (e.g. Genbank, miRBase).


You mean the source of the data that it was applied to.

I agree though that if you want both you can create a cross-product  
term and store the decomposition as term_relationship's.
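
A rough sketch of what that could look like, run against an in-memory SQLite database so it is self-contained. The tables are a simplified subset of the BioSQL term and term_relationship tables (several columns omitted), and the ontology name, the "miRanda/miRBase" term name and the predicates "has_algorithm" and "has_source" are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the BioSQL ontology tables (assumption: columns
# not needed for this example are left out).
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE term_relationship (
    term_relationship_id INTEGER PRIMARY KEY,
    subject_term_id      INTEGER NOT NULL REFERENCES term(term_id),
    predicate_term_id    INTEGER NOT NULL REFERENCES term(term_id),
    object_term_id       INTEGER NOT NULL REFERENCES term(term_id),
    ontology_id          INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (subject_term_id, predicate_term_id, object_term_id, ontology_id)
);
""")

# Hypothetical ontology holding the source terms.
onto_id = conn.execute(
    "INSERT INTO ontology (name) VALUES (?)", ("Feature Sources",)
).lastrowid


def add_term(name):
    """Insert a term into the example ontology and return its term_id."""
    cur = conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)", (name, onto_id)
    )
    return cur.lastrowid


combined = add_term("miRanda/miRBase")   # the cross-product term
algorithm = add_term("miRanda")
source = add_term("miRBase")
has_algorithm = add_term("has_algorithm")  # hypothetical predicate
has_source = add_term("has_source")        # hypothetical predicate

# Store the decomposition as (subject, predicate, object) triples.
conn.executemany(
    "INSERT INTO term_relationship "
    "(subject_term_id, predicate_term_id, object_term_id, ontology_id) "
    "VALUES (?, ?, ?, ?)",
    [(combined, has_algorithm, algorithm, onto_id),
     (combined, has_source, source, onto_id)],
)


def decompose(term_name):
    """Return {predicate name: object name} for a cross-product term."""
    rows = conn.execute(
        """
        SELECT p.name, o.name
        FROM term_relationship r
        JOIN term s ON s.term_id = r.subject_term_id
        JOIN term p ON p.term_id = r.predicate_term_id
        JOIN term o ON o.term_id = r.object_term_id
        WHERE s.name = ?
        ORDER BY p.name
        """,
        (term_name,),
    ).fetchall()
    return dict(rows)


parts = decompose("miRanda/miRBase")
print(parts)
```

A seqfeature would then point its source_term_id at the combined term, and the two components can be recovered with a join like the one in decompose().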

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 10:44:16 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 11:44:16 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
Message-ID: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>


On 14 Aug 2009, at 23:56, Hilmar Lapp wrote:

>
> On Aug 11, 2009, at 5:22 AM, Richard Holland wrote:
>
>> Ideally there would be two fields for source_term_id - one for the  
>> algorithm used to generate the data (e.g. BLAST, miRanda), the  
>> other for the source the data came from (e.g. Genbank, miRBase).
>
>
> You mean the source of the data that it was applied to.

Not necessarily. The source of the data that it was applied to (i.e.
the sequence the feature refers to) is a third thing, and it is an
attribute of that sequence rather than of the feature itself.

What I mean is this:

   1. The sequence itself could be downloaded from Genbank, EMBL, or  
elsewhere, or I could have discovered it in-house.
   2. The features on the sequence could have been generated by
running BLAST, miRanda, etc., or they could be manually annotated.
   3. The features on the sequence could have been downloaded from  
Genbank, EMBL, etc., or they could have been made locally, or by a  
collaborator at another institute.

To my mind these are three distinct things. (1) is sequence-related,  
and (2) and (3) are feature-related.

cheers,
Richard

> I agree though that if you want both you can create a cross-product  
> term and store the decomposition as term_relationship's.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 14:29:24 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 10:29:24 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
Message-ID: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>


On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:

> [...]
> What I mean is this:
>
>  1. The sequence itself could be downloaded from Genbank, EMBL, or  
> elsewhere, or I could have discovered it in-house.

That's actually what I meant.

>  2. The features on the sequence could have been generated by
> running BLAST, miRanda, etc., or they could be manually annotated.
>  3. The features on the sequence could have been downloaded from  
> Genbank, EMBL, etc., or they could have been made locally, or by a  
> collaborator at another institute.

Right, but if a feature is the result of you running some algorithm  
against some sequences, then it's not been downloaded or given to you.  
Features on one and the same sequence can have different sources,  
obviously, so I'm a bit confused - I think we're talking about the  
same thing in different words, but I'm not sure.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 16:32:35 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 17:32:35 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
Message-ID: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>


On 15 Aug 2009, at 15:29, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:
>
>> [...]
>> What I mean is this:
>>
>> 1. The sequence itself could be downloaded from Genbank, EMBL, or  
>> elsewhere, or I could have discovered it in-house.
>
> That's actually what I meant.
>
>> 2. The features on the sequence could have been generated by
>> running BLAST, miRanda, etc., or they could be manually annotated.
>> 3. The features on the sequence could have been downloaded from  
>> Genbank, EMBL, etc., or they could have been made locally, or by a  
>> collaborator at another institute.
>
> Right, but if a feature is the result of you running some algorithm  
> against some sequences, then it's not been downloaded or given to  
> you. Features on one and the same sequence can have different  
> sources, obviously, so I'm a bit confused - I think we're talking  
> about the same thing in different words, but I'm not sure.

Probably. :)

Case study: I download some seqs from Genbank. (Which then need to be  
annotated as having come from Genbank, at the sequence level). They  
already have some features on them (which need to be annotated as  
having come from Genbank, at the feature level, but of an unknown  
algorithm as Genbank doesn't specify how they were generated usually).  
I then run BLAST of those sequences against some local data, and  
record my own features as a result. I also run BLAT, and again record  
my own features. My colleague also runs BLAST of the same seqs against  
some data of his own, and wants our combined feature results to be  
stored in the same database. I want to be able to annotate all these  
new features both with the algorithm used to generate them (BLAST or  
BLAT) and who did it (myself or my colleague at the institute down the  
road), in addition to retaining the original features that came from  
Genbank (and making sure they're annotated as such). Hence I'd need a  
source attribute for the sequence (Genbank in this case), a source  
attribute for each feature (Genbank, Me, or Colleague X, in this  
case), and an algorithm/technique/protocol attribute for each feature  
(BLAST or BLAT or 'don't know it just came from Genbank' in this  
example).

cheers,
Richard

> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 19:31:13 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 15:31:13 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
Message-ID: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>


On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:

> [...]
> Case study:

Great, now we're getting somewhere :-)

> I download some seqs from Genbank. (Which then need to be annotated  
> as having come from Genbank, at the sequence level).

Note, as you say, *at the sequence level*. I.e., you would record this  
either using the bioentry's namespace (biodatabase), or a  
bioentry_qualifier_value annotation. I would choose the former, though  
since a bioentry can only be in one namespace, it may not satisfy
your needs.

> They already have some features on them (which need to be annotated  
> as having come from Genbank, at the feature level, but of an unknown  
> algorithm as Genbank doesn't specify how they were generated usually).

Right. The source term would indicate that GenBank provided them to  
you, and that that's all you know.

> I then run BLAST of those sequences against some local data, and  
> record my own features as a result. I also run BLAT, and again  
> record my own features.

BLAST and BLAT would now be the source terms.

> My colleague also runs BLAST of the same seqs against some data of  
> his own, and wants our combined feature results to be stored in the  
> same database. I want to be able to annotate all these new features  
> both with the algorithm used to generate them (BLAST or BLAT)

You use the source term for that.

> and who did it (myself or my colleague at the institute down the road)

Ah - that's provenance information, not the source as is normally  
referred to. BioSQL at present doesn't have an explicit provenance  
model, but you can still record provenance information through  
ontology-typed tag/value annotation in seqfeature_qualifier_value,  
with the terms coming from a provenance ontology (that you make up  
yourself or grab from somewhere else).

> , in addition to retaining the original features that came from  
> Genbank (and making sure they're annotated as such).

That shouldn't be a problem - certainly it's not for BioSQL.

> Hence I'd need a source attribute for the sequence (Genbank in this  
> case), a source attribute for each feature (Genbank, Me, or  
> Colleague X, in this case), and an algorithm/technique/protocol  
> attribute for each feature (BLAST or BLAT or 'don't know it just  
> came from Genbank' in this example).

Not quite - source really is what provided the feature to you, not who  
or when, or using which BLAST database, genome assembly, or how you  
parsed the results, etc etc. That's all provenance information.
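
To illustrate the split between source term and provenance annotation, here is a small sketch against an in-memory SQLite database. The tables are a simplified subset of the BioSQL seqfeature and seqfeature_qualifier_value tables (the real term table is keyed by name plus ontology, and more columns exist), and the provenance tag "performed_by", the type terms and the feature names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the BioSQL feature tables (assumption: ontology
# handling and unneeded columns are left out).
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature (
    seqfeature_id  INTEGER PRIMARY KEY,
    bioentry_id    INTEGER NOT NULL,
    type_term_id   INTEGER NOT NULL REFERENCES term(term_id),
    source_term_id INTEGER NOT NULL REFERENCES term(term_id),
    display_name   TEXT
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id       INTEGER NOT NULL REFERENCES term(term_id),
    rank          INTEGER NOT NULL DEFAULT 0,
    value         TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank)
);
""")


def add_term(name):
    """Insert a term and return its term_id."""
    return conn.execute(
        "INSERT INTO term (name) VALUES (?)", (name,)
    ).lastrowid


blast = add_term("BLAST")                # source term: what produced the feature
genbank = add_term("Genbank")            # source term: what provided the feature
match = add_term("match")                # placeholder feature type
gene = add_term("gene")                  # placeholder feature type
performed_by = add_term("performed_by")  # hypothetical provenance tag


def add_feature(name, type_id, source_id, who=None):
    """Insert a feature; optionally record who produced it as a qualifier."""
    fid = conn.execute(
        "INSERT INTO seqfeature "
        "(bioentry_id, type_term_id, source_term_id, display_name) "
        "VALUES (1, ?, ?, ?)",
        (type_id, source_id, name),
    ).lastrowid
    if who is not None:
        conn.execute(
            "INSERT INTO seqfeature_qualifier_value "
            "(seqfeature_id, term_id, rank, value) VALUES (?, ?, 0, ?)",
            (fid, performed_by, who),
        )
    return fid


add_feature("hit_1", match, blast, who="me")
add_feature("hit_2", match, blast, who="colleague X")
add_feature("gene_1", gene, genbank)  # came with the GenBank record

# Source comes from source_term_id, provenance from the qualifier table.
rows = conn.execute(
    """
    SELECT f.display_name, src.name, q.value
    FROM seqfeature f
    JOIN term src ON src.term_id = f.source_term_id
    LEFT JOIN seqfeature_qualifier_value q
           ON q.seqfeature_id = f.seqfeature_id AND q.term_id = ?
    ORDER BY f.seqfeature_id
    """,
    (performed_by,),
).fetchall()
for row in rows:
    print(row)
```

The two BLAST features share one source term and differ only in the provenance qualifier, while the GenBank feature carries no provenance row at all.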

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From holland at eaglegenomics.com  Sat Aug 15 20:00:39 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 15 Aug 2009 21:00:39 +0100
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
	<82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
Message-ID: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>

Ok, cool. So we can now rephrase the original question to...: How  
should provenance information be stored in BioSQL?

:)

cheers,
Richard

On 15 Aug 2009, at 20:31, Hilmar Lapp wrote:

>
> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:
>
>> [...]
>> Case study:
>
> Great, now we're getting somewhere :-)
>
>> I download some seqs from Genbank. (Which then need to be annotated  
>> as having come from Genbank, at the sequence level).
>
> Note, as you say, *at the sequence level*. I.e., you would record  
> this either using the bioentry's namespace (biodatabase), or a  
> bioentry_qualifier_value annotation. I would choose the former,  
> though since a bioentry can only be in one namespace, it may not  
> satisfy your needs.
>
>> They already have some features on them (which need to be annotated  
>> as having come from Genbank, at the feature level, but of an  
>> unknown algorithm as Genbank doesn't specify how they were  
>> generated usually).
>
> Right. The source term would indicate that GenBank provided them to  
> you, and that that's all you know.
>
>> I then run BLAST of those sequences against some local data, and  
>> record my own features as a result. I also run BLAT, and again  
>> record my own features.
>
> BLAST and BLAT would now be the source terms.
>
>> My colleague also runs BLAST of the same seqs against some data of  
>> his own, and wants our combined feature results to be stored in the  
>> same database. I want to be able to annotate all these new features  
>> both with the algorithm used to generate them (BLAST or BLAT)
>
> You use the source term for that.
>
>> and who did it (myself or my colleague at the institute down the  
>> road)
>
> Ah - that's provenance information, not the source as is normally  
> referred to. BioSQL at present doesn't have an explicit provenance  
> model, but you can still record provenance information through  
> ontology-typed tag/value annotation in seqfeature_qualifier_value,  
> with the terms coming from a provenance ontology (that you make up  
> yourself or grab from somewhere else).
>
>> , in addition to retaining the original features that came from  
>> Genbank (and making sure they're annotated as such).
>
> That shouldn't be a problem - certainly it's not for BioSQL.
>
>> Hence I'd need a source attribute for the sequence (Genbank in this  
>> case), a source attribute for each feature (Genbank, Me, or  
>> Colleague X, in this case), and an algorithm/technique/protocol  
>> attribute for each feature (BLAST or BLAT or 'don't know it just  
>> came from Genbank' in this example).
>
> Not quite - source really is what provided the feature to you, not  
> who or when, or using which BLAST database, genome assembly, or how  
> you parsed the results, etc etc. That's all provenance information.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From hlapp at gmx.net  Sat Aug 15 20:14:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 15 Aug 2009 16:14:54 -0400
Subject: [BioSQL-l] What should source_term_id in table seqfeature refer
	to?
In-Reply-To: <5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>
References: <200908111010.12143.florian.mittag@uni-tuebingen.de>
	<57E7DDC3-050F-48F0-8755-342BC01EF426@eaglegenomics.com>
	<200908111109.50361.florian.mittag@uni-tuebingen.de>
	<789D850E-E219-44BC-B144-BF3B7D177BDE@eaglegenomics.com>
	<752A15DF-616A-466F-8506-02EF9ED9F1E4@gmx.net>
	<03E617DA-BA03-4F99-85A7-E9D23163DF36@eaglegenomics.com>
	<30B45DA8-AE8B-4BAF-9314-CE3B7D828F55@gmx.net>
	<1A91C34B-D61B-4152-A00E-9ADC61A764AD@eaglegenomics.com>
	<82601036-CB5E-4DD6-9AFF-DECA54F5A067@gmx.net>
	<5C474FE2-969A-4B8A-8B4B-1257107A5FD7@eaglegenomics.com>
Message-ID: <92DD5E74-5638-4CB8-B34A-3282AACF036A@gmx.net>


On Aug 15, 2009, at 4:00 PM, Richard Holland wrote:

> Ok, cool. So we can now rephrase the original question to...: How  
> should provenance information be stored in BioSQL?


Yes, and the answer is using a provenance ontology or controlled  
vocabulary and bioentry_qualifier_value and seqfeature_qualifier_value.
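
As a rough illustration of that answer, here is a minimal sketch using
Python's built-in sqlite3 module and a heavily simplified subset of the
BioSQL tables (the real tables have more columns and constraints). The
"Provenance" ontology, the "annotated_by" term, the "similarity_hit"
feature type and the ontology names are all made up for this example.
The source term (BLAST) goes into seqfeature.source_term_id, while who
produced the feature is stored as a tag/value pair in
seqfeature_qualifier_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified subset of BioSQL; the real tables have more columns.
CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT,
                   ontology_id INTEGER REFERENCES ontology(ontology_id));
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         type_term_id INTEGER REFERENCES term(term_id),
                         source_term_id INTEGER REFERENCES term(term_id));
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER REFERENCES seqfeature(seqfeature_id),
    term_id INTEGER REFERENCES term(term_id),
    rank INTEGER NOT NULL DEFAULT 0,
    value TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank));
""")

def add_term(ontology, name):
    """Insert a term (and its ontology if needed) and return the term_id."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (ontology,))
    (ont_id,) = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (ontology,)).fetchone()
    cur = conn.execute(
        "INSERT INTO term (name, ontology_id) VALUES (?, ?)", (name, ont_id))
    return cur.lastrowid

# All ontology and term names below are illustrative assumptions.
hit = add_term("SeqFeature Keys", "similarity_hit")   # feature type
blast = add_term("SeqFeature Sources", "BLAST")       # source: what provided the feature
who = add_term("Provenance", "annotated_by")          # provenance tag

cur = conn.execute(
    "INSERT INTO seqfeature (type_term_id, source_term_id) VALUES (?, ?)",
    (hit, blast))
feature_id = cur.lastrowid

# Provenance is recorded as an ontology-typed tag/value annotation.
conn.execute(
    "INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, value) "
    "VALUES (?, ?, ?)", (feature_id, who, "Colleague X"))

rows = conn.execute("""
    SELECT src.name, prov.name, q.value
    FROM seqfeature f
    JOIN term src ON src.term_id = f.source_term_id
    JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id
    JOIN term prov ON prov.term_id = q.term_id
""").fetchall()
print(rows)
```

The same pattern would apply at the sequence level with
bioentry_qualifier_value in place of seqfeature_qualifier_value.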

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Aug 26 10:53:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 11:53:40 +0100
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
Message-ID: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>

Hi BioSQL folks,

The BioSQL schema includes a few indexes on the location table
(e.g. quoting the MySQL schema, but it looks the same on pg too):

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_dbx   ON location(dbxref_id);
CREATE INDEX seqfeatureloc_trm   ON location(term_id);

Will these facilitate searches like this?:

"SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

Or, for this would it help to include:

CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

A motivational use case would be to pull out an operon, or a
region of a record as part of a genome browser.

Thanks,

Peter


From hlapp at gmx.net  Wed Aug 26 12:07:08 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 08:07:08 -0400
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
Message-ID: <B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>


On Aug 26, 2009, at 6:53 AM, Peter wrote:

> The BioSQL schema includes a few indexes on the location table
> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
> [...]
> Will these facilitate searches like this?:
>
> "SELECT ... WHERE 2000 <= location.start_pos
> AND location.end_pos <= 5000 AND ..."
>
> Or, for this would it help to include:
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos);
> CREATE INDEX seqfeatureloc_start ON location(end_pos);

With a decent RDBMS, having two indexes instead of a compound one will  
slow this query down. What the compound one won't help you with is if  
your query doesn't constrain the leading columns. For example, a  
compound index on (start_pos,end_pos) won't be used if you only  
constrain end_pos. If you want to do that, you need an index on  
(end_pos) too.
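
To see the leading-column rule in action, here is a minimal sketch that
uses Python's built-in sqlite3 module as a stand-in for MySQL or
PostgreSQL. That substitution is an assumption made purely to keep the
example self-contained: every query planner differs in detail, and some
can still scan a compound index when the leading column is
unconstrained. The location table is a cut-down version of the BioSQL
one. The script asks the planner how it would run three WHERE clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified location table; the real BioSQL table has more columns.
CREATE TABLE location (location_id INTEGER PRIMARY KEY,
                       seqfeature_id INTEGER,
                       start_pos INTEGER,
                       end_pos INTEGER);
CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
""")

def plan(where_clause):
    """Return the planner's description of how it would run the query."""
    sql = ("EXPLAIN QUERY PLAN SELECT seqfeature_id FROM location WHERE "
           + where_clause)
    # Each row is (id, parent, notused, detail); keep only the detail text.
    return [row[3] for row in conn.execute(sql)]

both = plan("2000 <= start_pos AND end_pos <= 5000")
start_only = plan("2000 <= start_pos")
end_only = plan("end_pos <= 5000")

for label, detail in (("both", both),
                      ("start only", start_only),
                      ("end only", end_only)):
    print(label, "->", detail)
```

In SQLite the first two plans are expected to be searches on
seqfeatureloc_start, while the query that constrains only end_pos falls
back to scanning the table.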

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Wed Aug 26 12:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 26 Aug 2009 13:29:56 +0100
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
	<B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
Message-ID: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>

On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp<hlapp at gmx.net> wrote:
>
>
> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>
>> The BioSQL schema includes a few indexes on the location table
>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>> [...]
>> Will these facilitate searches like this?:
>>
>> "SELECT ... WHERE 2000 <= location.start_pos
>> AND location.end_pos <= 5000 AND ..."
>>
>> Or, for this would it help to include:
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>
> With a decent RDBMS, having two indexes instead of a compound one will slow
> this query down. What the compound one won't help you with is if your query
> doesn't constrain the leading columns. For example, a compound index on
> (start_pos,end_pos) won't be used if you only constrain end_pos. If you want
> to do that, you need an index on (end_pos) too.

Thanks for your reply Hilmar. Just to make sure I understood, the current
BioSQL indexes are fine for this:

 "SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

but not so great for:

 "SELECT ... WHERE 2000 <= location.start_pos AND ..."

or,

 "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Nevertheless, that should cover most usage.

Having just two separated indexes on start_pos and end_pos would
speed up queries on just start or end, but would slow down queries
using both.

Presumably having three indexes as follows would cover all these
examples efficiently, but at the cost of two more indexes?:

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

If that is all accurate, the status quo is fine :)

Regards,

Peter


From haili at mpiz-koeln.mpg.de  Wed Aug 26 14:18:09 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Wed, 26 Aug 2009 16:18:09 +0200
Subject: [BioSQL-l] error with load_ontology
Message-ID: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>

Hi All,
I encountered an error when using load_ontology.pl to load the Gene Ontology into a BioSQL database. The command used was:

perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete

At the beginning, data could be loaded with warnings, but later an exception occurred and the loading was terminated. The warning and error messages are shown below:

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':

------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------

Could you please help me solve this problem? Thank you very much.
Best regards,
song


From hlapp at gmx.net  Wed Aug 26 15:50:35 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:50:35 -0400
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
Message-ID: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>

Song,

there should have been an error or warning that immediately preceded  
these errors. It is that one that's the root cause.
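
A minimal sketch of how one might dig that first error out of a long
log, assuming the output was captured as text. PostgreSQL's "current
transaction is aborted" lines are only follow-on messages, so the
helper skips them and returns the first other ERROR line. The function
name is made up, and the sample text is abbreviated (the "(...)"
parts) from warnings reported later in this thread.

```python
def first_root_error(log_text):
    """Return the first ERROR line that is not the PostgreSQL follow-on
    'current transaction is aborted' message, or None if there is none."""
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("ERROR:") and "current transaction is aborted" not in line:
            return line
    return None

# Abbreviated sample; the "(...)" parts stand for text omitted here.
sample = """\
--------------------- WARNING ---------------------
MSG: failed to store term synonym (...) with values (...)
ERROR: value too long for type character varying(255)
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: failed to store term synonym (...) with values (...)
ERROR: current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
"""
print(first_root_error(sample))
```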

Also, are you using by any chance the BioSQL version for PostgreSQL  
that has the RULEs removed? If yes, then at this point you cannot use  
any Bioperl-db scripts (or code) with it, unless you install the rules  
before you run such a script (and presumably remove them again  
afterwards).

	-hilmar

On Aug 26, 2009, at 10:18 AM, Song Haili wrote:

> Hi All,
> I encountered an error when using load_ontology.pl to load the Gene  
> Ontology into a BioSQL database. The command used was:
>
> perl load_ontology.pl --driver Pg --host pg-server --dbname dbname --dbuser dbsuer --dbpass dbpass --namespace "Gene Ontology" --format obo /home/data/haili_biosql/GO/gene_ontology.1_2.obo --noobsolete
>
> At the beginning, data could be loaded with warnings, but later an  
> exception occurred and the loading was terminated. The warning and  
> error messages are shown below:
>
> --------------------- WARNING ---------------------
> MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20447 to Bio::Ontology::OBOterm):
> ERROR:  current transaction is aborted, commands ignored until end of transaction block
> ---------------------------------------------------
> Could not store term GO:0050501, name 'hyaluronan synthase activity':
>
> ------------- EXCEPTION -------------
> MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
> STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
> STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
> STACK (eval) load_ontology.pl:812
> STACK main::persist_term load_ontology.pl:794
> STACK toplevel load_ontology.pl:617
> -------------------------------------
>
> Could you please help me solve this problem? Thank you very much.
> Best regards,
> song

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Aug 26 15:56:25 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 26 Aug 2009 11:56:25 -0400
Subject: [BioSQL-l] Indexing of (seqfeature) locations?
In-Reply-To: <320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
References: <320fb6e00908260353g1932f321i3d6d5bdc98b221cf@mail.gmail.com>
	<B7C53E8C-34DC-44D5-9322-8A8C690F202A@gmx.net>
	<320fb6e00908260529h76c39a25pca5e3e86f8a16992@mail.gmail.com>
Message-ID: <48B04E04-8561-45EB-9C64-8011665A74A2@gmx.net>


On Aug 26, 2009, at 8:29 AM, Peter wrote:

> On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp<hlapp at gmx.net> wrote:
>>
>>
>> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>>
>>> The BioSQL schema includes a few indexes on the location table
>>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>>> [...]
>>> Will these facilitate searches like this?:
>>>
>>> "SELECT ... WHERE 2000 <= location.start_pos
>>> AND location.end_pos <= 5000 AND ..."
>>>
>>> Or, for this would it help to include:
>>>
>>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>>
>> With a decent RDBMS, having two indexes instead of a compound one  
>> will slow
>> this query down. What the compound one won't help you with is if  
>> your query
>> doesn't constrain the leading columns. For example, a compound  
>> index on
>> (start_pos,end_pos) won't be used if you only constrain end_pos. If  
>> you want
>> to do that, you need an index on (end_pos) too.
>
> Thanks for your reply Hilmar. Just to make sure I understood, the  
> current
> BioSQL indexes are fine for this:
>
> "SELECT ... WHERE 2000 <= location.start_pos
> AND location.end_pos <= 5000 AND ..."
>
> but not so great for:
>
> "SELECT ... WHERE 2000 <= location.start_pos AND ..."

No, this one will work fine. (provided that start_pos comes first in  
the index)

>
> or,
>
> "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Yes.

> [...]
> Having just two separated indexes on start_pos and end_pos would
> speed up queries on just start or end, but would slow down queries
> using both.

Yes (though not necessarily much), and occupy more space.

>
> Presumably having three indexes as follows would cover all these
> examples efficiently, but at the cost of two more indexes?:
>
> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
> CREATE INDEX seqfeatureloc_start ON location(start_pos);
> CREATE INDEX seqfeatureloc_start ON location(end_pos);

With this set, the waste of space for the compound index probably far  
outweighs the performance gain you might see from it. If I need to be  
able to constrain by both independently, I create a compound index,  
and separate indexes for each column after the first in the index.  
I.e., for the purposes of querying by start_pos,

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);

are redundant.
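
The set described above (the compound index plus a separate index on
the second column) can be sketched with Python's built-in sqlite3
module standing in for a server RDBMS. That is an assumption made for
the sake of a self-contained example, and other planners may choose
differently. The index name seqfeatureloc_end is hypothetical and not
part of the BioSQL schema; the table is a cut-down version of location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified location table; the real BioSQL table has more columns.
CREATE TABLE location (location_id INTEGER PRIMARY KEY,
                       seqfeature_id INTEGER,
                       start_pos INTEGER,
                       end_pos INTEGER);
CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
-- Hypothetical extra index; this name is not part of BioSQL.
CREATE INDEX seqfeatureloc_end ON location(end_pos);
""")

def index_used(where_clause):
    """Return the name of the index the planner picks, or None."""
    sql = ("EXPLAIN QUERY PLAN SELECT seqfeature_id FROM location WHERE "
           + where_clause)
    # Column 3 of each plan row holds the human-readable detail text.
    details = " ".join(row[3] for row in conn.execute(sql))
    for name in ("seqfeatureloc_start", "seqfeatureloc_end"):
        if name in details:
            return name
    return None

plans = {
    "both": index_used("2000 <= start_pos AND end_pos <= 5000"),
    "start only": index_used("2000 <= start_pos"),
    "end only": index_used("end_pos <= 5000"),
}
for label, name in plans.items():
    print(label, "->", name)
```

No separate index on start_pos alone is created here, since the
compound index already serves queries that constrain only start_pos.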

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From haili at mpiz-koeln.mpg.de  Thu Aug 27 07:51:07 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 09:51:07 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
Message-ID: <fc26d0704793.4a96570b@mpiz-koeln.mpg.de>

Hi Hilmar,

I loaded the data again and found that the biological process GO terms were loaded, although with some warnings:
--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: DBLink exists in the dblink of _default
---------------------------------------------------

But when it started to load the molecular function GO terms, the process terminated with the following warnings and error message:

        Done with biological_process.
Loading ontology molecular_function:
        ... terms

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  value too long for type character varying(255)
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (HAS activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (seHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (spHAS RELATED EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
ERROR:  current transaction is aborted, commands ignored until end of transaction block
---------------------------------------------------
Could not store term GO:0050501, name 'hyaluronan synthase activity':

------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::DBLinkAdaptor::find_by_unique_key: ERROR:  current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:970
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:873
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:195
STACK Bio::DB::BioSQL::TermAdaptor::store_children /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/TermAdaptor.pm:306
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:227
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /perl/lib/site_perl/5.10.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:264
STACK Bio::DB::Persistent::PersistentObject::store /perl/lib/site_perl/5.10.0/Bio/DB/Persistent/PersistentObject.pm:284
STACK (eval) load_ontology.pl:812
STACK main::persist_term load_ontology.pl:794
STACK toplevel load_ontology.pl:617
-------------------------------------
 at load_ontology.pl line 824
        main::persist_term('-term', 'Bio::Ontology::OBOterm=HASH(0x96699d0)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0xd90620)', '-termfactory', undef, '-throw', 'CODE(0x76ab60)', '-mergeobs', ...) called at load_ontology.pl line 617

I am using biosql-1.0.0, downloaded directly from http://www.biosql.org/wiki/Downloads without any changes, so I am not sure whether the RULEs have been removed. By the way, before I ran into the above error, I was able to use the script load_seqdatabase.pl to load Swiss-Prot data, although with many warnings.

song




From biopython at maubp.freeserve.co.uk  Thu Aug 27 10:24:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Aug 2009 11:24:23 +0100
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
	<fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
Message-ID: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>

On Thu, Aug 27, 2009 at 8:51 AM, Song Haili wrote:

> --------------------- WARNING ---------------------
> MSG: failed to store term synonym (Bio::DB::BioSQL::TermAdaptor) with values (alternating UDP-alpha-N-acetyl-D-glucosamine:beta-D-glucuronosyl-(1->3)-nascent hyaluronan 4-N-acetyl-beta-D-glucosaminyltransferase and UDP-alpha-D-glucuronate:N-acetyl-beta-D-glucosaminyl-(1->4)-nascent hyaluronan 3-beta-D-glucuronosyltransferase activity EXACT EC:2.4.1.212) (FK 20401 to Bio::Ontology::OBOterm):
> ERROR:  value too long for type character varying(255)
> ---------------------------------------------------

Extending the relevant field in the schema might be one solution...
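
A minimal sketch of how one might check an OBO file for this up front,
assuming the field in question is the synonym column of term_synonym
(as the warning about storing a term synonym suggests) and that
synonyms appear as OBO 1.2 lines of the form
synonym: "text" SCOPE [dbxrefs]. The helper name and the over-long
sample synonym are made up. The ALTER statement is PostgreSQL syntax
and is only shown as a string, not executed or tested against a real
BioSQL instance.

```python
import re

# Matches OBO 1.2 lines such as: synonym: "HAS activity" EXACT [EC:2.4.1.212]
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')

def long_synonyms(obo_lines, limit=255):
    """Yield (term_id, synonym) for synonyms longer than `limit` characters."""
    term_id = None
    for line in obo_lines:
        line = line.rstrip("\n")
        if line.startswith("id:"):
            term_id = line[3:].strip()
        else:
            match = SYNONYM_RE.match(line)
            if match and len(match.group(1)) > limit:
                yield term_id, match.group(1)

sample = [
    "[Term]\n",
    "id: GO:0050501\n",
    "name: hyaluronan synthase activity\n",
    'synonym: "HAS activity" EXACT [EC:2.4.1.212]\n',
    'synonym: "' + "x" * 300 + '" EXACT []\n',   # made-up over-long synonym
]
offenders = list(long_synonyms(sample))
print([(tid, len(syn)) for tid, syn in offenders])

# PostgreSQL statement for widening the column (not executed here).
ALTER_SQL = "ALTER TABLE term_synonym ALTER COLUMN synonym TYPE text;"
print(ALTER_SQL)
```

For a real file, pass an open file handle instead of the sample list,
for example long_synonyms(open(path)) with path pointing at the OBO
file.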

> I am using biosql-1.0.0 downloaded directly from
> http://www.biosql.org/wiki/Downloads without any changes.
> So I am not sure if the RULEs have been removed. By the
> way, before I met the above error, I was able to use the script
> load_seqdatabase.pl to load swissprot data with many warnings.

BioSQL 1.0.0 is out of date; the latest release is 1.0.1.
Was that a typo?

Peter


From haili at mpiz-koeln.mpg.de  Thu Aug 27 14:55:12 2009
From: haili at mpiz-koeln.mpg.de (Song Haili)
Date: Thu, 27 Aug 2009 16:55:12 +0200
Subject: [BioSQL-l] error with load_ontology
In-Reply-To: <320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
References: <fc1dfce26079.4a956041@mpiz-koeln.mpg.de>
	<78F20C39-6169-4144-BE10-E8DFA8D72D2E@gmx.net>
	<fc26d0704793.4a96570b@mpiz-koeln.mpg.de>
	<320fb6e00908270324q1ab69624h47ff0adb41ec0288@mail.gmail.com>
Message-ID: <fc18b40150b.4a96ba70@mpiz-koeln.mpg.de>

Problem solved!
If the field type of the synonym column in the term_synonym table is changed from varchar(255) to text, the error no longer occurs. However, I have only confirmed this for biosql-1.0.0 (it may also work for the latest version, biosql-1.0.1, but I didn't test that much).
Thank you all for your help.
song
