From hlapp at gmx.net  Wed Sep  3 04:43:30 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Sep  3 08:27:43 2003
Subject: [BioSQL-l] Re: A problem in using load_seqdatabase.pl
In-Reply-To: <2F146949A49BB34DADB689AC286698FE40A118@exchange2k.vitagenomics.com>
Message-ID: <B2032500-DDEA-11D7-A9FB-000A959EB4C4@gmx.net>

Dennis,

which version of biosql do you use? Could you please post the entire 
error message. After how many entries does that happen? Is it 
reproducible? Would it always hit the same entry?

	-hilmar

BTW you should always post to the mailing list(s), biosql-l or 
bioperl-l in this case. Otherwise you may not reach the right people, 
or not the right email addresses.

On Thursday, August 28, 2003, at 12:12  AM, Dennis Chen wrote:

> Dear sir,
>
> I am a user of the load_seqdatabase.pl script you released earlier. I
> have tried several ways to work out a problem with the script, but I
> still cannot run it properly. I tried to load the
> decompressed SWISSPROT data (Release 41.13 of 21-Jun-2003,
> "sprot.dat") into an ORACLE database server (ver 9.2.0.1.0 in Linux 9.0)
> using the load_seqdatabase.pl script. Hardware environment: 2 AMD
> 2000+ CPU, 4 GB RAM, 80 GB HDD, 6 GB virtual swap space. Perl v.5.8,
> BioPerl v.1.22, DBI v.1.37, DBD::Oracle v.1.14 and the newest bioperl-db
> module were installed on the system. In addition, I made some
> modifications to the parameter settings in load_seqdatabase.pl:
> my $remove_flag = 1;
> my $lookup_flag = 0;
> my $no_update_flag = 0;
> my $safe_flag = 0;
>
> The other parameters were left at their defaults. Then I ran the script as: perl
> load_seqdatabase.pl sprot.dat.
>
> However, I always got the warning "Out of memory..."
> Could you please give me some advice on how to overcome this trouble? Thank
> you very much.
>
> Best regards,
>
> Dennis 08/29/2003
>
>
> __________________________________________________
> Bioinformatics
>
> Dennis, Kuang-DenChen
>
> Research Scientist
>
> Tel: +886-2-8976-9123 ext.7703
>
> Fax: +886-2-8976-9523
>
> Mobile: +886-916-992-455
>
> mailto:dennis.chen@vitagenomics.com
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From robert.roth at home.se  Fri Sep 12 04:47:03 2003
From: robert.roth at home.se (Robert Roth)
Date: Fri Sep 12 04:47:09 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
Message-ID: <1063356423.46604c80robert.roth@home.se>


Hi,

I am completely new to Biopython and BioSQL, so my problems might arise from something trivial that I have missed. After installing MySQL, Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the database and everything is in place. The machine is running WinXP and Python 2.3.

But when I try to follow the simple example in the documentation for using Biopython with BioSQL that is described in 
biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see below).

-----
>>> import MySQLdb
>>> from BioSQL import BioSeqDatabase
>>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
>>> db = server.new_database("cold")
>>> from Bio import GenBank
>>> parser = GenBank.FeatureParser()
>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
>>> db.load(iterator)

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in -toplevel-
    db.load(iterator)
  File "E:\Python23\lib\site-packages\BioSQL\BioSeqDatabase.py", line 337, in load
    db_loader.load_seqrecord(cur_record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 173, in _load_bioentry_table
    taxon_id = self._get_taxon_id(record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 107, in _get_taxon_id
    taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant))
  File "E:\Python23\lib\site-packages\BioSQL\BioSeqDatabase.py", line 236, in execute_and_fetchall
    self.cursor.execute(sql, args)
  File "E:\Python23\lib\site-packages\MySQLdb\cursors.py", line 95, in execute
    return self._execute(query, args)
  File "E:\Python23\lib\site-packages\MySQLdb\cursors.py", line 114, in _execute
    self.errorhandler(self, exc, value)
  File "E:\Python23\lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1054, "Unknown column 'binomial' in 'where clause'")

-----

From Loader.py
-----
        if binomial and variant:
            sql = "SELECT taxon_id FROM taxon WHERE binomial = %s" \
                  " AND variant = %s"
            taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant))
            if taxa:
                return taxa[0][0]
-----

Looking at Loader.py, there is a call to MySQL (snippet above).
But when I look at the ERD for BioSQL I can't find either binomial or variant in the taxon table. Am I completely off here (as I said, I'm a complete newbie) or is this the reason it's choking?
Any help on what is going wrong would be greatly appreciated.

Thanks in advance,
Robert
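
[Editorial sketch] The failure mode described above (a query naming columns that the loaded schema does not have) can be reproduced in isolation. The following is only a minimal sketch: it uses Python's built-in sqlite3 module in place of MySQL/MySQLdb, it is written for current Python rather than Python 2.3, and the two-column taxon table is a hypothetical stand-in, not the real BioSQL definition. It shows one way to compare the columns a query expects with the columns a table actually has.

```python
import sqlite3


def table_columns(conn, table):
    """Return the set of column names of `table` (SQLite-specific PRAGMA)."""
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return {row[1] for row in rows}  # row[1] holds the column name


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified taxon table that lacks binomial and variant.
conn.execute(
    "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, ncbi_taxon_id INTEGER)"
)

# Columns the Loader.py query quoted above expects to find.
missing = {"binomial", "variant"} - table_columns(conn, "taxon")

try:
    conn.execute(
        "SELECT taxon_id FROM taxon WHERE binomial = ? AND variant = ?",
        ("some species", "some variant"),  # placeholder values
    )
    error_text = None
except sqlite3.OperationalError as exc:
    error_text = str(exc)  # SQLite's counterpart of MySQL error 1054

print(sorted(missing), error_text)
```

If `missing` is non-empty, the code and the schema were written against different versions, which is the situation the traceback above points to.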


From Yves.Bastide at irisa.fr  Fri Sep 12 12:05:21 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Fri Sep 12 12:03:48 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
In-Reply-To: <1063356423.46604c80robert.roth@home.se>
References: <1063356423.46604c80robert.roth@home.se>
Message-ID: <3F61EEC1.60803@irisa.fr>

Robert Roth wrote:
> Hi,
> 
> I am completely new to Biopython and BioSQL, so my problems might arise from something trivial that I have missed. After installing MySQL, Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the database and everything is in place. The machine is running WinXP and Python 2.3.
> 
> But when I try to follow the simple example in the documentation for using Biopython with BioSQL that is described in 
> biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see below).
> 
> -----
> 
> >>> import MySQLdb
> >>> from BioSQL import BioSeqDatabase
> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
> >>> db = server.new_database("cold")
> >>> from Bio import GenBank
> >>> parser = GenBank.FeatureParser()
> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
> >>> db.load(iterator)
> 
> 

[snip]

> 
> Looking at Loader.py, there is a call to MySQL (snippet above).
> But when I look at the ERD for BioSQL I can't find either binomial or variant in the taxon table. Am I completely off here (as I said, I'm a complete newbie) or is this the reason it's choking?
> Any help on what is going wrong would be greatly appreciated.
> 
> Thanks in advance,

Biopython is still using an old version of the schema.  This should 
change in the not-too-far future...

> Robert

yves

From Jingwei.Ni at celera.com  Sun Sep 14 23:23:50 2003
From: Jingwei.Ni at celera.com (Ni, Jingwei)
Date: Sun Sep 14 23:19:17 2003
Subject: [BioSQL-l] Problem with BioSQL Oracle schema using
	load_seqdatabase.pl
Message-ID: <B97FA25EDA418049A146320ADFE6550623CBFE@celmrkv2.rkv.ad.celera.com>

Hi, I just subscribed to the biosql list. I am testing the Oracle BioSQL
schema using load_seqdatabase.pl. Everything works except when the
sequence size is <=4000, the script complains about inconsistent
datatype and the sequence cannot be loaded into the biosequence table,
but all other tables are loaded fine.

Am I doing anything wrong here?

Jingwei

From hlapp at gnf.org  Tue Sep 16 16:51:56 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Sep 16 16:49:59 2003
Subject: [BioSQL-l] Problem with BioSQL Oracle schema using
	load_seqdatabase.pl
In-Reply-To: <B97FA25EDA418049A146320ADFE6550623CBFE@celmrkv2.rkv.ad.celera.com>
Message-ID: <9BF548E2-E887-11D7-9780-000A959EB4C4@gnf.org>

Are you using the latest version of DBD::Oracle and bioperl-db?

I was having the same issue and solved it to the extent that it worked 
for me. I did have to upgrade DBD::Oracle.

I'll check into this once more and put a test into the suite that 
actually tests a large sequence to have this exposed right when you run 
the tests.

	-hilmar

On Sunday, September 14, 2003, at 08:23  PM, Ni, Jingwei wrote:

> Hi, I just subscribed to the biosql list. I am testing the Oracle 
> BioSQL
> schema using load_seqdatabase.pl. Everything works except when the
> sequence size is <=4000, the script complains about inconsistent
> datatype and the sequence cannot be loaded into the biosequence table,
> but all other tables are loaded fine.
>
> Am I doing anything wrong here?
>
> Jingwei
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Tue Sep 16 17:15:31 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Sep 16 17:13:37 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
In-Reply-To: <3F61EEC1.60803@irisa.fr>
Message-ID: <E76C984D-E88A-11D7-9780-000A959EB4C4@gnf.org>

Jeff/Brad or anybody else who can comment: is there anything more 
precise that we can tell people inquiring about biopython supporting 
the singapore version of biosql?

	-hilmar

On Friday, September 12, 2003, at 09:05  AM, Yves Bastide wrote:

> Robert Roth wrote:
>> Hi,
>> I am completely new to Biopython and BioSQL, so my problems might arise
>> from something trivial that I have missed. After installing MySQL,
>> Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the
>> database and everything is in place. The machine is running WinXP and
>> Python 2.3.
>> But when I try to follow the simple example in the documentation for 
>> using Biopython with BioSQL that is described in 
>> biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see 
>> below).
>> -----
>> >>> import MySQLdb
>> >>> from BioSQL import BioSeqDatabase
>> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
>> >>> db = server.new_database("cold")
>> >>> from Bio import GenBank
>> >>> parser = GenBank.FeatureParser()
>> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
>> >>> db.load(iterator)
>
> [snip]
>
>> Looking at Loader.py, there is a call to MySQL (snippet above).
>> But when I look at the ERD for BioSQL I can't find either binomial or
>> variant in the taxon table. Am I completely off here (as I said, I'm a
>> complete newbie) or is this the reason it's choking?
>> Any help on what is going wrong would be greatly appreciated.
>> Thanks in advance,
>
> Biopython is still using an old version of the schema.  This should 
> change in the not-too-far future...
>
>> Robert
>
> yves
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From chapmanb at uga.edu  Tue Sep 16 17:36:01 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Sep 16 17:39:31 2003
Subject: [BioPython] Re: [BioSQL-l] Problem with example in
	python_biosql_basic.txt
In-Reply-To: <E76C984D-E88A-11D7-9780-000A959EB4C4@gnf.org>
References: <3F61EEC1.60803@irisa.fr>
	<E76C984D-E88A-11D7-9780-000A959EB4C4@gnf.org>
Message-ID: <20030916213601.GA24804@evostick.agtec.uga.edu>

Hilmar and Robert;

> Jeff/Brad or anybody else who can comment: is there anything more 
> precise that we can tell people inquiring about biopython supporting 
> the singapore version of biosql?

Yves Bastide kindly sent updates to the Biopython BioSQL code to the
dev list last week:

http://www.biopython.org/pipermail/biopython-dev/2003-September/001485.html

This should bring it up to date with the current SQL. I haven't had
a chance to integrate this yet but was hoping to Thursday night.
Hopefully it will then make it into the new release that Jeff has
planned for real-soon-now.

If things need to be up and running sooner than that, the SQL that
the Biopython code works with can be found in the Tests/BioSQL
directory.

Sorry to have been slack on this. I have been feelin' really bad
about not having time to get it in, if that is any consolation for
anyone :-). But Thursday, Thursday...

Brad

> On Friday, September 12, 2003, at 09:05  AM, Yves Bastide wrote:
> 
> >Robert Roth wrote:
> >>Hi,
> >>I am completely new to Biopython and BioSQL, so my problems might arise
> >>from something trivial that I have missed. After installing MySQL,
> >>Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the
> >>database and everything is in place. The machine is running WinXP and
> >>Python 2.3.
> >>But when I try to follow the simple example in the documentation for 
> >>using Biopython with BioSQL that is described in 
> >>biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see 
> >>below).
> >>-----
> >> >>> import MySQLdb
> >> >>> from BioSQL import BioSeqDatabase
> >> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
> >> >>> db = server.new_database("cold")
> >> >>> from Bio import GenBank
> >> >>> parser = GenBank.FeatureParser()
> >> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
> >> >>> db.load(iterator)
> >
> >[snip]
> >
> >>Looking at Loader.py, there is a call to MySQL (snippet above).
> >>But when I look at the ERD for BioSQL I can't find either binomial or
> >>variant in the taxon table. Am I completely off here (as I said, I'm a
> >>complete newbie) or is this the reason it's choking?
> >>Any help on what is going wrong would be greatly appreciated.
> >>Thanks in advance,
> >
> >Biopython is still using an old version of the schema.  This should 
> >change in the not-too-far future...
> >
> >>Robert
> >
> >yves
> >
From daniel.lang at biologie.uni-freiburg.de  Wed Sep 17 07:03:59 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Sep 17 07:02:19 2003
Subject: [BioSQL-l] gene ontology questions revisited
Message-ID: <200309171304.05382.daniel.lang@biologie.uni-freiburg.de>

Hi,
In June there was a discussion about redundant GO terms in the GO flat files and
the related problems when loading them into the database (see Re: gene ontology
questions (bug), Tue Jun 3 15:01:54 EDT 2003).
I think I'm confronted with the same problem...
I wanted to load my biosql instance with the current GO flat files using
load_ontology.pl like this:

perl ../load_ontology.pl --dbuser biosql --dbpass 'xxx' --dbname bioseqdb 
--driver Pg --namespace "Gene Ontology" --format goflat --fmtargs 
"-defs_file,GO.defs" --testonly function.ontology process.ontology 
component.ontology
Parsing input ...
Loading ontology Gene Ontology:
        ... terms
Could not store GO:0001529 (elastin):

------------- EXCEPTION  -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by 
unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) ../load_ontology.pl:489
STACK toplevel ../load_ontology.pl:471

--------------------------------------

By running safe mode, it is obvious that there are multiple 
erroneous/redundant entries...

I also tried former releases back to 2003-05-01, and encountered the same 
difficulties.

Is this the same problem or am I having other problems?
If not, has anyone contacted the GO people about this issue yet?

Thanks in advance,
Daniel
 
-- 
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage:  http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de

#################################################
>REALITY.SYS corrupted: Reboot universe? (Y/N/A)
#################################################


From Raphael.Bauer at informatik.hu-berlin.de  Thu Sep 18 08:36:13 2003
From: Raphael.Bauer at informatik.hu-berlin.de (Raphael A. Bauer)
Date: Thu Sep 18 08:34:17 2003
Subject: [BioSQL-l] Re: gene ontology questions (bug)
Message-ID: <3F69A6BD.2080209@informatik.hu-berlin.de>

Hi...
I've got the same problems as Marc, and I wonder if there is a solution yet.

Command is:
perl load_ontology.pl --host localhost --dbname bioseqdbspgo --dbuser rb 
--driver Pg --namespace "Gene Ontology" --format goflat --fmtargs 
"-defs_file,GO.defs" function.ontology process.ontology component.ontology

Output is:
Parsing input ...
Loading ontology Gene Ontology:
         ... terms
Could not store GO:0001529 (elastin):

------------- EXCEPTION  -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be 
found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store 
/usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) load_ontology.pl:489
STACK toplevel load_ontology.pl:471

--------------------------------------


Quite strange...
My Bio* things are all the latest releases (BioPerl 1.2.2).
For GO I use the files released September 16, 2003.

I think the problem is the GO.defs file:

term: elastin
goid: GO:0001528
definition: OBSOLETE. A major structural protein of mammalian connective 
tissues; composed of one third glycine, and also rich in proline, 
alanine, and valine. Chains are cross-linked together via lysine residues.
definition_reference: ISBN:0198506732
comment: This term was made obsolete because it represents a gene 
product. To update annotations, use the molecular function term 
'extracellular matrix constituent conferring elasticity activity ; 
GO:0030023'.

term: elastin
goid: GO:0001529
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GO:mah
comment: This term was made obsolete because it represents a gene 
product. To update annotations, use the molecular function term 
'extracellular matrix constituent conferring elasticity activity ; 
GO:0030023'.

with "elastin" appearing twice (it seems that many terms share
the same term name; it also happens for collagen and so on),

and the definition of the term table, which forbids the same name twice within an ontology (unique):

Indexes: term_pkey primary key btree (term_id),
          term_identifier_key unique btree (identifier),
          term_name_key unique btree (name, ontology_id),
          term_ont btree (ontology_id)

(Marc already mentioned this...)

A dirty workaround would be to rename the terms in GO.defs wherever
two names are identical (one "elastin" and the other "elastin CHANGED"
or so),
but is there any recommendation on how to handle the problem safely?
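
[Editorial sketch] Before renaming anything by hand, it may help to list which term names are shared by several GO ids. The following is a minimal sketch, assuming only the "term:" / "goid:" stanza layout visible in the GO.defs excerpt above; the function name and the third sample stanza are made up for illustration, and the definitions are shortened.

```python
from collections import defaultdict


def duplicate_term_names(defs_text):
    """Map each term name used by more than one GO id to the list of its ids."""
    ids_by_name = defaultdict(list)
    name = None
    for line in defs_text.splitlines():
        if line.startswith("term:"):
            name = line[len("term:"):].strip()
        elif line.startswith("goid:") and name is not None:
            ids_by_name[name].append(line[len("goid:"):].strip())
            name = None  # wait for the next "term:" line
    return {n: ids for n, ids in ids_by_name.items() if len(ids) > 1}


# The two elastin stanzas from above (shortened), plus one invented stanza.
sample = """term: elastin
goid: GO:0001528
definition: OBSOLETE. A major structural protein of mammalian connective tissues.

term: elastin
goid: GO:0001529
definition: OBSOLETE (was not defined before being made obsolete).

term: made-up example term
goid: GO:0000000
definition: Hypothetical stanza used only in this sketch.
"""

print(duplicate_term_names(sample))
```

Every name reported this way would collide with the unique key on (name, ontology_id) if all of its stanzas were loaded into the same ontology.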

Thanks a lot...

Raphael

From hlapp at gnf.org  Thu Sep 18 23:08:58 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Sep 18 23:06:54 2003
Subject: [BioSQL-l] Re: gene ontology questions (bug)
In-Reply-To: <3F69A6BD.2080209@informatik.hu-berlin.de>
Message-ID: <BB8FC15A.1760%hlapp@gnf.org>

On 9/18/03 5:36 AM, "Raphael A. Bauer"
<Raphael.Bauer@informatik.hu-berlin.de> wrote:

> with "elastin" appearing twice (it seems that many terms share
> the same term name; it also happens for collagen and so on),
>
> and the definition of the term table, which forbids the same name twice within an ontology (unique):

Correct. There is a UK constraint on term that a name is to be unique within
an ontology. Terms are also looked up utilizing this constraint.

When you first load an ontology the best strategy is to ignore obsoleted
terms, using the option --noobsolete (check the POD of load_ontology.pl, or
use --help).

The following probably doesn't apply to your use case, but for completeness
let me note that the real problem is when you update an ontology and a term
has been obsoleted because it was merged with another term that then gets
the same name. If you use the otherwise recommendable --updobsolete switch,
the obsoleted term would be properly obsoleted in the database, but
inserting the successor fails with a UK violation. Using --delobsolete would
take care of the problem, but you'd lose annotations to the obsoleted term.
Like it or not, but LL and other DBs do contain GO associations to obsoleted
terms, so just aggressively deleting them yields undesirable effects.

To solve this, I actually resorted to extending the constraint to
(name,ontology_id,is_obsolete) in my Oracle version of biosql. Just
extending the constraint isn't really advisable though, because then the
lookup mechanism in the TermAdaptor needs to be adjusted too. I'll probably
end up doing that.
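
[Editorial sketch] The effect of widening the unique key can be tried out on a toy table. This is only a sketch: it uses Python's built-in sqlite3 rather than Oracle, the table is a hypothetical cut-down version of term with just the columns named above, and is_obsolete is stored as 0/1 purely for the illustration.

```python
import sqlite3


def load_terms(unique_cols, rows):
    """Create a toy term table with the given unique key and insert rows.

    Returns True if every row was stored, False on a unique-key violation.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "ontology_id INTEGER NOT NULL, is_obsolete INTEGER NOT NULL, "
        "UNIQUE (%s))" % unique_cols
    )
    try:
        conn.executemany(
            "INSERT INTO term (name, ontology_id, is_obsolete) VALUES (?, ?, ?)",
            rows,
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# An obsoleted term and its live successor carrying the same name.
obsolete_and_successor = [("merged term", 1, 1), ("merged term", 1, 0)]
# Two obsolete terms with the same name, like the two elastin entries.
two_obsolete = [("elastin", 1, 1), ("elastin", 1, 1)]

print(load_terms("name, ontology_id", obsolete_and_successor))
print(load_terms("name, ontology_id, is_obsolete", obsolete_and_successor))
print(load_terms("name, ontology_id, is_obsolete", two_obsolete))
```

The wider key lets an obsoleted term coexist with a same-named successor, but two obsolete terms sharing a name would still collide, so it does not replace --noobsolete for the initial load.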

To get back to your concrete problem though, --noobsolete probably does what
you want. 


    -hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Thu Sep 18 23:22:57 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Sep 18 23:20:51 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To: <200309171304.05382.daniel.lang@biologie.uni-freiburg.de>
Message-ID: <BB8FC4A1.1763%hlapp@gnf.org>

Let me know if the response I just sent for Raphael's posting doesn't answer
or doesn't apply to your problem.

    -hilmar

BTW is the Atlantic still standing? And the Crash?

On 9/17/03 4:03 AM, "Daniel Lang" <daniel.lang@biologie.uni-freiburg.de>
wrote:

> Hi,
> In June there was a discussion about redundant GO terms in the GO flat files and
> the related problems when loading them into the database (see Re: gene ontology
> questions (bug), Tue Jun 3 15:01:54 EDT 2003).
> I think I'm confronted with the same problem...
> I wanted to load my biosql instance with the current GO flat files using
> load_ontology.pl like this:
> 
> perl ../load_ontology.pl --dbuser biosql --dbpass 'xxx' --dbname bioseqdb
> --driver Pg --namespace "Gene Ontology" --format goflat --fmtargs
> "-defs_file,GO.defs" --testonly function.ontology process.ontology
> component.ontology
> Parsing input ...
> Loading ontology Gene Ontology:
>       ... terms
> Could not store GO:0001529 (elastin):
> 
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by
> unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) ../load_ontology.pl:489
> STACK toplevel ../load_ontology.pl:471
> 
> --------------------------------------
> 
> By running safe mode, it is obvious that there are multiple
> erroneous/redundant entries...
> 
> I also tried former releases back to 2003-05-01, and encountered the same
> difficulties.
> 
> Is this the same problem or am I having other problems?
> If not, has anyone contacted the GO people about this issue yet?
> 
> Thanks in advance,
> Daniel
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From daniel.lang at biologie.uni-freiburg.de  Fri Sep 19 08:51:25 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Sep 19 08:49:40 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To: <BB8FC4A1.1763%hlapp@gnf.org>
References: <BB8FC4A1.1763%hlapp@gnf.org>
Message-ID: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>

On Friday 19 September 2003 05:22, you wrote:
> Let me know if the response I just sent for Raphael's posting doesn't
> answer or doesn't apply to your problem.
Problem solved, thanks!
But another one occurred while loading the data:
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were 
("MetaCyc","2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN","0") FKs ()
ERROR:  value too long for type character varying(40)
---------------------------------------------------
Could not store term relationship (2-pyrone-4,6-dicarboxylate lactonase 
activity,IS_A,carboxylic ester hydrolase activity):

------------- EXCEPTION  -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found 
by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::TermAdaptor::store_children 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/TermAdaptor.pm:290
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:215
STACK Bio::DB::Persistent::PersistentObject::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:243
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
STACK Bio::DB::Persistent::PersistentObject::create 
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:243
STACK (eval) ../load_ontology.pl:516
STACK toplevel ../load_ontology.pl:515

--------------------------------------

DBD::Pg::st execute failed: ERROR:  value too long for type character 
varying(40) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm 
line 1001, <GEN3> line 2377.

It seems that some entries for term.name are longer than the expected 40 chars
because of a backslash used to escape a comma in the molecule name :(

As I'm not familiar with the GO flat format, I also had a look at the files
(and those from May), and it seems that the escaping is always done in this
field.

A quick and dirty solution would be to eliminate the backslashes in the files,
but can I update the database that easily with load_ontology.pl?
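
[Editorial sketch] The arithmetic behind the error can be checked directly on the value from the warning above. This is only a sketch of the length check, not of anything load_ontology.pl does; the helper name and the constant are invented for the illustration.

```python
MAX_LEN = 40  # taken from the "character varying(40)" error above


def unescape_commas(value):
    """Turn the flat-file escape sequence backslash-comma into a plain comma."""
    return value.replace("\\,", ",")


# The value exactly as it appears in the failed insert (with the backslash).
raw = "2-PYRONE-4\\,6-DICARBOXYLATE-LACTONASE-RXN"
clean = unescape_commas(raw)

print(len(raw), len(raw) <= MAX_LEN)      # escaped form: one character too long
print(len(clean), len(clean) <= MAX_LEN)  # unescaped form fits
```

So for this entry the single escaping backslash is exactly what pushes the value past the column width.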

Daniel 

-- 
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage:  http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de

#################################################
>REALITY.SYS corrupted: Reboot universe? (Y/N/A)
#################################################


From daniel.lang at biologie.uni-freiburg.de  Fri Sep 19 11:04:02 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Sep 19 11:02:20 2003
Subject: [BioSQL-l] gene ontology questions re-revisited
In-Reply-To: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
References: <BB8FC4A1.1763%hlapp@gnf.org>
	<200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
Message-ID: <200309191704.08378.daniel.lang@biologie.uni-freiburg.de>

Uhm,...
> It seems that some entries for term.name are longer than the expected 40 chars
> because of a backslash used to escape a comma in the molecule name :(
The corresponding field should of course be "dbxref.accession"...
But why are they escaping anyway?  
And what to do about it?
Thanks in advance,
Daniel

-- 
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage:  http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de

#################################################
>REALITY.SYS corrupted: Reboot universe? (Y/N/A)
#################################################

From hlapp at gnf.org  Fri Sep 19 13:37:28 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Sep 19 13:35:28 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
Message-ID: <BB908CE8.177D%hlapp@gnf.org>

On 9/19/03 5:51 AM, "Daniel Lang" <daniel.lang@biologie.uni-freiburg.de>
wrote:

> But another one occurred while loading the data:
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> ("MetaCyc","2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN","0") FKs ()
> ERROR:  value too long for type character varying(40)
> ---------------------------------------------------

The problem here is that the references for GO terms are modeled as DBXrefs
with dbname and accession. This sometimes applies quite well, but often the
reference in the GO.defs file is used in a far wider sense. In the example
above for instance, the reference is in fact to a term in another ontology
(MetaCyc), so should be a term relationship rather than a reference.

So, what you're seeing is the result of deficiencies in the flat file
representation (term references can be any of lit.reference, dbxref, and
ontology term) and consequently in the parser (which doesn't try to be smarter
than the flat file representation).

Unfortunately that assessment doesn't help you much. What I did locally (I
obviously ran into the same problem) was to widen the accession column in
dbxref to 64 chars, which I thought was a somewhat reasonable compromise. (You
don't want to open it up completely and water down the relational model just
because a certain flat file format is deficient in its expressivity.) This
doesn't fix the problem that something ends up as a dbxref when it should
rather be a term relationship.

Anyone else got a good idea here? I'm cc'ing the bioperl list since this is
rather an issue of the object-space representation than one of the schema.

    -hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Sun Sep 21 00:41:18 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Sep 21 00:39:16 2003
Subject: [BioSQL-l] slides of persistent bioperl bosc03 talk
Message-ID: <D74864EA-EBED-11D7-92C5-000A959EB4C4@gmx.net>

I offered the slides a while ago and then got dragged away by other 
things before being able to follow through. I've posted them now:

http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

I also wrote a news entry which I guess needs a while to propagate.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Gerben.Menschaert at devgen.com  Wed Sep 24 05:49:05 2003
From: Gerben.Menschaert at devgen.com (Gerben Menschaert)
Date: Wed Sep 24 05:47:08 2003
Subject: [BioSQL-l] error using load_seqdatabase.pl
Message-ID: <BEE28BF86078B6429D6C780635718E2103CA15@morelia.be.devgen.com>

Hello,

I'm trying to load a genbank file into biosql:
perl load_seqdatabase.pl --driver Oracle --dbuser biosql --dbpass biosql --dbname sfr01  --lookup --noupdate --safe  /data/lazy/gbinv1.seq

Every genbank entry load fails with the following error:

DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined (DBD ERROR: OCIStmtExecute/Describe)
[for statement ``SELECT taxon_name.tax_oid, NULL, NULL, taxon_name.tax_oid, taxon_name.name, NULL
FROM taxon, taxon_name WHERE taxon.oid = taxon_name.tax_oid AND name_class = ? AND tax_oid = ?''])
at /usr/local/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm line 214, <GEN0> line 222129.
This error is expected, since the tax_oid in the WHERE clause is indeed ambiguous (it is missing the "taxon_name." prefix).
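
The ambiguity can be reproduced in miniature. The sketch below uses SQLite
as a stand-in for Oracle and a toy schema that is an assumption, not the
real BioSQL Oracle schema: both tables are given a tax_oid column so that an
unqualified reference cannot be resolved. SQLite reports its own "ambiguous
column name" error where Oracle raises ORA-00918.

```python
import sqlite3

# Toy schema (assumption): NOT the real BioSQL tables. Both tables carry a
# tax_oid column purely so that an unqualified reference is ambiguous.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon (oid INTEGER PRIMARY KEY, tax_oid INTEGER);
    CREATE TABLE taxon_name (tax_oid INTEGER, name TEXT, name_class TEXT);
    INSERT INTO taxon VALUES (1, 1);
    INSERT INTO taxon_name VALUES (1, 'Homo sapiens', 'scientific name');
""")

SELECT = ("SELECT taxon_name.name FROM taxon, taxon_name "
          "WHERE taxon.oid = taxon_name.tax_oid AND name_class = ? AND ")

def run(last_condition):
    # Return the rows, or the error text if the statement cannot be prepared.
    try:
        cursor = conn.execute(SELECT + last_condition, ("scientific name", 1))
        return cursor.fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)

unqualified = run("tax_oid = ?")            # column exists in both tables
qualified = run("taxon_name.tax_oid = ?")   # table prefix removes the ambiguity

print(unqualified)
print(qualified)
```

In this toy setup only the statement with the table prefix returns a row;
whether the same one-line change is the right fix in SpeciesAdaptorDriver.pm
depends on the actual schema.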

I'm running biosql on Oracle, bioperl-1.2.2 is installed and I'm using the main branch of bioperl-db. I recently changed from bioperl-1.2.1 to 1.2.2.

Any ideas?

Gerben

From hlapp at gnf.org  Wed Sep 24 15:34:21 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Sep 24 15:32:23 2003
Subject: [BioSQL-l] error using load_seqdatabase.pl
In-Reply-To: <BEE28BF86078B6429D6C780635718E2103CA15@morelia.be.devgen.com>
Message-ID: <BB973FCD.191B%hlapp@gnf.org>

This clearly looks like a bug since the generated statement is incorrect.
What puzzles me is that I update RefSeq (which is in GenBank format) on a
daily basis on an Oracle instance and I'm not seeing this error. Also, I
thought there was a test for GenBank; I need to check that there is.

Did you pre-load the NCBI taxon database? If not, consider doing so, as it
will likely spare you some trouble down the road with species that
aren't parsed correctly by flat file parsers.

    -hilmar
 
On 9/24/03 2:49 AM, "Gerben Menschaert" <Gerben.Menschaert@devgen.com>
wrote:

> Hello,
> 
> I'm trying to load a genbank file into biosql:
> perl load_seqdatabase.pl --driver Oracle --dbuser biosql --dbpass biosql
> --dbname sfr01  --lookup --noupdate --safe  /data/lazy/gbinv1.seq
> 
> Every genbank entry load fails with the following error:
> 
> DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined (DBD
> ERROR: OCIStmtExecute/Describe) [for statement ``SELECT taxon_name.tax_oid,
> NULL, NULL, taxon_name.tax_oid, taxon_name.name, NULL FROM taxon, taxon_name
> WHERE taxon.oid = taxon_name.tax_oid AND name_class = ? AND tax_oid = ?''])
> at /usr/local/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm
> line 214, <GEN0> line 222129.
> This error is expected, since the tax_oid in the WHERE clause is indeed
> ambiguous (it is missing the "taxon_name." prefix).
> 
> I'm running biosql on Oracle, bioperl-1.2.2 is installed and I'm using the
> main branch of bioperl-db. I recently changed from bioperl-1.2.1 to 1.2.2.
> 
> Any ideas?
> 
> Gerben
> 
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------