From hlapp at gmx.net Wed Sep 3 04:43:30 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Sep 3 08:27:43 2003
Subject: [BioSQL-l] Re: A problem in using load_seqdatabase.pl
In-Reply-To: <2F146949A49BB34DADB689AC286698FE40A118@exchange2k.vitagenomics.com>
Message-ID:

Dennis, which version of biosql do you use? Could you please post the entire error message? After how many entries does that happen? Is it reproducible? Does it always hit the same entry?

-hilmar

BTW you should always post to the mailing list(s), biosql-l or bioperl-l in this case. Otherwise you may not reach the right people, or the right email addresses.

On Thursday, August 28, 2003, at 12:12 AM, Dennis Chen wrote:

> Dear sir,
>
> I am a user of the load_seqdatabase.pl script you released. I tried
> several ways to figure out a problem in using the script, but I
> still cannot run it properly. I tried to load decompressed SWISSPROT
> data (Release 41.13 of 21-Jun-2003, "sprot.dat") into an ORACLE
> database server (ver 9.2.0.1.0 on Linux 9.0) using the
> load_seqdatabase.pl script. Hardware environment: 2 AMD 2000+ CPUs,
> 4 GB RAM, 80 GB HDD, 6 GB virtual swap space. Perl v.5.8,
> BioPerl v.1.22, DBI v.1.37, DBD::Oracle v.1.14 and the newest
> bioperl-db module were installed on the system. In addition, I
> modified the parameter settings in load_seqdatabase.pl:
> my $remove_flag = 1;
> my $lookup_flag = 0;
> my $no_update_flag = 0;
> my $safe_flag = 0;
>
> Other parameters were left at their defaults. Then I ran the script as:
> perl load_seqdatabase.pl sorot.dat.
>
> However, I always got the warning "Out of memory..."
> Could you please give me some advice to overcome this trouble? Thank
> you very much.
>
> Best regards,
>
> Dennis 08/29/2003
>
> __________________________________________________
> Bioinformatics
> Dennis, Kuang-Den Chen
> Research Scientist
> Tel: +886-2-8976-9123 ext.7703
> Fax: +886-2-8976-9523
> Mobile: +886-916-992-455
> mailto:dennis.chen@vitagenomics.com

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From robert.roth at home.se Fri Sep 12 04:47:03 2003
From: robert.roth at home.se (Robert Roth)
Date: Fri Sep 12 04:47:09 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
Message-ID: <1063356423.46604c80robert.roth@home.se>

Hi,

I am completely new to Biopython and BioSQL so my problems might arise from something trivial that I have missed. After installing MySQL, Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the database and everything is in place.
The machine is running WinXP and Python 2.3.

But when I try to follow the simple example in the documentation for using Biopython with BioSQL that is described in biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see below).

-----
>>> import MySQLdb
>>> from BioSQL import BioSeqDatabase
>>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
>>> db = server.new_database("cold")
>>> from Bio import GenBank
>>> parser = GenBank.FeatureParser()
>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
>>> db.load(iterator)
Traceback (most recent call last):
  File "", line 1, in -toplevel-
    db.load(iterator)
  File "E:\Python23\lib\site-packages\BioSQL\BioSeqDatabase.py", line 337, in load
    db_loader.load_seqrecord(cur_record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 173, in _load_bioentry_table
    taxon_id = self._get_taxon_id(record)
  File "E:\Python23\lib\site-packages\BioSQL\Loader.py", line 107, in _get_taxon_id
    taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant))
  File "E:\Python23\lib\site-packages\BioSQL\BioSeqDatabase.py", line 236, in execute_and_fetchall
    self.cursor.execute(sql, args)
  File "E:\Python23\lib\site-packages\MySQLdb\cursors.py", line 95, in execute
    return self._execute(query, args)
  File "E:\Python23\lib\site-packages\MySQLdb\cursors.py", line 114, in _execute
    self.errorhandler(self, exc, value)
  File "E:\Python23\lib\site-packages\MySQLdb\connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1054, "Unknown column 'binomial' in 'where clause'")
-----

>From loader.py
-----
        if binomial and variant:
            sql = "SELECT taxon_id FROM taxon WHERE binomial = %s" \
                  " AND variant = %s"
            taxa = self.adaptor.execute_and_fetchall(sql, (binomial, variant))
            if taxa:
                return taxa[0][0]
-----

When looking at Loader.py there is a call to MySQL (snippet above).
But when I look at the ERD for BioSQL I can't find either binomial or variant in the taxon table. Am I completely off here (as I said I'm a complete newbie) or is this the reason it's choking?
Any help on what is going wrong would be greatly appreciated.

Thanks in advance,
Robert

From Yves.Bastide at irisa.fr Fri Sep 12 12:05:21 2003
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Fri Sep 12 12:03:48 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
In-Reply-To: <1063356423.46604c80robert.roth@home.se>
References: <1063356423.46604c80robert.roth@home.se>
Message-ID: <3F61EEC1.60803@irisa.fr>

Robert Roth wrote:
> Hi,
>
> I am completely new to Biopython and BioSQL so my problems might arise
> from something trivial that I have missed. After installing MySQL,
> Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the
> database and everything is in place. The machine is running WinXP and
> Python 2.3.
>
> But when I try to follow the simple example in the documentation for
> using Biopython with BioSQL that is described in
> biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see below).
>
> -----
> >>> import MySQLdb
> >>> from BioSQL import BioSeqDatabase
> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
> >>> db = server.new_database("cold")
> >>> from Bio import GenBank
> >>> parser = GenBank.FeatureParser()
> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
> >>> db.load(iterator)
>
> [snip]
>
> When looking at Loader.py there is a call to MySQL (snippet above).
> But when I look at the ERD for BioSQL I can't find either binomial or
> variant in the taxon table. Am I completely off here (as I said I'm a
> complete newbie) or is this the reason it's choking?
> Any help on what is going wrong would be greatly appreciated.
>
> Thanks in advance,

Biopython is still using an old version of the schema. This should change in the not-too-far future...

> Robert

yves

From Jingwei.Ni at celera.com Sun Sep 14 23:23:50 2003
From: Jingwei.Ni at celera.com (Ni, Jingwei)
Date: Sun Sep 14 23:19:17 2003
Subject: [BioSQL-l] Problem with BioSQL Oracle schema using load_seqdatabase.pl
Message-ID:

Hi, I just subscribed to the biosql list. I am testing the Oracle BioSQL schema using load_seqdatabase.pl. Everything works except when the sequence size is <=4000: the script complains about an inconsistent datatype and the sequence cannot be loaded into the biosequence table, but all other tables are loaded fine.

Am I doing anything wrong here?

Jingwei

From hlapp at gnf.org Tue Sep 16 16:51:56 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Sep 16 16:49:59 2003
Subject: [BioSQL-l] Problem with BioSQL Oracle schema using load_seqdatabase.pl
In-Reply-To:
Message-ID: <9BF548E2-E887-11D7-9780-000A959EB4C4@gnf.org>

Are you using the latest version of DBD::Oracle and bioperl-db? I was having the same issue and solved it to the extent that it worked for me. I did have to upgrade DBD::Oracle.

I'll check into this once more and put a test into the suite that actually tests a large sequence, so that this is exposed right when you run the tests.

-hilmar

On Sunday, September 14, 2003, at 08:23 PM, Ni, Jingwei wrote:

> Hi, I just subscribed to the biosql list. I am testing the Oracle
> BioSQL schema using load_seqdatabase.pl. Everything works except when
> the sequence size is <=4000: the script complains about an inconsistent
> datatype and the sequence cannot be loaded into the biosequence table,
> but all other tables are loaded fine.
>
> Am I doing anything wrong here?
>
> Jingwei

From hlapp at gnf.org Tue Sep 16 17:01:37 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Sep 16 16:59:41 2003
Subject: [BioSQL-l] Problem with example in python_biosql_basic.txt
In-Reply-To: <3F61EEC1.60803@irisa.fr>
Message-ID:

Jeff/Brad or anybody else who can comment: is there anything more precise that we can tell people inquiring about biopython supporting the Singapore version of biosql?

-hilmar

On Friday, September 12, 2003, at 09:05 AM, Yves Bastide wrote:

> Robert Roth wrote:
>> Hi,
>> I am completely new to Biopython and BioSQL so my problems might arise
>> from something trivial that I have missed. After installing MySQL,
>> Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the
>> database and everything is in place. The machine is running WinXP and
>> Python 2.3.
>> But when I try to follow the simple example in the documentation for
>> using Biopython with BioSQL that is described in
>> biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see
>> below).
>> -----
>> >>> import MySQLdb
>> >>> from BioSQL import BioSeqDatabase
>> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
>> >>> db = server.new_database("cold")
>> >>> from Bio import GenBank
>> >>> parser = GenBank.FeatureParser()
>> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
>> >>> db.load(iterator)
>
> [snip]
>
>> When looking at Loader.py there is a call to MySQL (snippet above).
>> But when I look at the ERD for BioSQL I can't find either binomial or
>> variant in the taxon table.
>> Am I completely off here (as I said I'm a
>> complete newbie) or is this the reason it's choking?
>> Any help on what is going wrong would be greatly appreciated.
>> Thanks in advance,
>
> Biopython is still using an old version of the schema. This should
> change in the not-too-far future...
>
>> Robert
>
> yves

From chapmanb at uga.edu Tue Sep 16 17:36:01 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Tue Sep 16 17:39:31 2003
Subject: [BioPython] Re: [BioSQL-l] Problem with example in python_biosql_basic.txt
In-Reply-To:
References: <3F61EEC1.60803@irisa.fr>
Message-ID: <20030916213601.GA24804@evostick.agtec.uga.edu>

Hilmar and Robert;

> Jeff/Brad or anybody else who can comment: is there anything more
> precise that we can tell people inquiring about biopython supporting
> the singapore version of biosql?

Yves Bastide kindly sent updates to the Biopython BioSQL code to the dev list last week:

http://www.biopython.org/pipermail/biopython-dev/2003-September/001485.html

This should bring it up to date with the current SQL.
I haven't had a chance to integrate this yet but was hoping to on Thursday night. Hopefully it will then make it into the new release that Jeff has planned for real-soon-now. If things need to be up and running sooner than that, the SQL that the Biopython code works with can be found in the Tests/BioSQL directory.

Sorry to have been slack on this. I have been feelin' really bad about not having time to get it in, if that is any consolation for anyone :-). But Thursday, Thursday...

Brad

> On Friday, September 12, 2003, at 09:05 AM, Yves Bastide wrote:
>
> >Robert Roth wrote:
> >>Hi,
> >>I am completely new to Biopython and BioSQL so my problems might arise
> >>from something trivial that I have missed. After installing MySQL,
> >>Biopython, MySQLdb and BioSQL I loaded the schema for BioSQL into the
> >>database and everything is in place. The machine is running WinXP and
> >>Python 2.3.
> >>But when I try to follow the simple example in the documentation for
> >>using Biopython with BioSQL that is described in
> >>biosql/biosql-schema/doc/python_biosql_basic.txt it chokes (see
> >>below).
> >>-----
> >> >>> import MySQLdb
> >> >>> from BioSQL import BioSeqDatabase
> >> >>> server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "test", passwd = "biopython", host = "localhost", db = "bioseqdb")
> >> >>> db = server.new_database("cold")
> >> >>> from Bio import GenBank
> >> >>> parser = GenBank.FeatureParser()
> >> >>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
> >> >>> db.load(iterator)
> >
> >[snip]
> >
> >>When looking at Loader.py there is a call to MySQL (snippet above).
> >>But when I look at the ERD for BioSQL I can't find either binomial or
> >>variant in the taxon table. Am I completely off here (as I said I'm a
> >>complete newbie) or is this the reason it's choking?
> >>Any help on what is going wrong would be greatly appreciated.
> >>Thanks in advance,
> >
> >Biopython is still using an old version of the schema.
> >This should change in the not-too-far future...
> >
> >>Robert
> >
> >yves

From daniel.lang at biologie.uni-freiburg.de Wed Sep 17 07:03:59 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Sep 17 07:02:19 2003
Subject: [BioSQL-l] gene ontology questions revisited
Message-ID: <200309171304.05382.daniel.lang@biologie.uni-freiburg.de>

Hi,

In June there was a discussion about redundant GO terms in the GO flat files and the related problems when loading them into the database (see "Re: gene ontology questions (bug)", Tue Jun 3 15:01:54 EDT 2003).
I think I'm confronted with the same problem...
I wanted to load my biosql instance with the current GO flat files using load_ontology.pl like this:

perl ../load_ontology.pl --dbuser biosql --dbpass 'xxx' --dbname bioseqdb --driver Pg --namespace "Gene Ontology" --format goflat --fmtargs "-defs_file,GO.defs" --testonly function.ontology process.ontology component.ontology
Parsing input ...
Loading ontology Gene Ontology:
	... terms
Could not store GO:0001529 (elastin):

------------- EXCEPTION -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) ../load_ontology.pl:489
STACK toplevel ../load_ontology.pl:471
--------------------------------------

Running in safe mode makes it obvious that there are multiple erroneous/redundant entries...

I also tried earlier releases back to 2003-05-01, and encountered the same difficulties.

Is this the same problem or am I having other problems?
If not, has anyone contacted the GO people about this issue yet?

Thanks in advance,
Daniel

-- 
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage: http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de
#################################################
>REALITY.SYS corrupted: Reboot universe? (Y/N/A)
#################################################

From Raphael.Bauer at informatik.hu-berlin.de Thu Sep 18 08:36:13 2003
From: Raphael.Bauer at informatik.hu-berlin.de (Raphael A. Bauer)
Date: Thu Sep 18 08:34:17 2003
Subject: [BioSQL-l] Re: gene ontology questions (bug)
Message-ID: <3F69A6BD.2080209@informatik.hu-berlin.de>

Hi...

I've got the same problems as Marc, and I wonder if there is a solution yet.

Command is:
perl load_ontology.pl --host localhost --dbname bioseqdbspgo --dbuser rb --driver Pg --namespace "Gene Ontology" --format goflat --fmtargs "-defs_file,GO.defs" function.ontology process.ontology component.ontology

Output is:
Parsing input ...
Loading ontology Gene Ontology:
	... terms
Could not store GO:0001529 (elastin):

------------- EXCEPTION -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.0/Bio/DB/Persistent/PersistentObject.pm:270
STACK (eval) load_ontology.pl:489
STACK toplevel load_ontology.pl:471
--------------------------------------

Quite strange... My Bio* things are all the latest releases (BioPerl 1.2.2).
For GO I use the files released September 16, 2003....

.. I think the problem is the GO.defs file:

term: elastin
goid: GO:0001528
definition: OBSOLETE. A major structural protein of mammalian connective tissues; composed of one third glycine, and also rich in proline, alanine, and valine. Chains are cross-linked together via lysine residues.
definition_reference: ISBN:0198506732
comment: This term was made obsolete because it represents a gene product. To update annotations, use the molecular function term 'extracellular matrix constituent conferring elasticity activity ; GO:0030023'.

term: elastin
goid: GO:0001529
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GO:mah
comment: This term was made obsolete because it represents a gene product. To update annotations, use the molecular function term 'extracellular matrix constituent conferring elasticity activity ; GO:0030023'.

with two times "elastin".. (it seems that there are many terms that have the same term name.. also seen in term collagen and so on...)
and the definition of table term, which forbids the same name twice (unique):

Indexes: term_pkey primary key btree (term_id),
         term_identifier_key unique btree (identifier),
         term_name_key unique btree (name, ontology_id),
         term_ont btree (ontology_id)

(Marc already mentioned this...)

A dirty workaround would be to rename the terms in GO.defs wherever two names are identical (one "elastin" and the other "elastin CHANGED" or so..)
.. but is there any recommendation on how to handle the problem safely?

Thanks a lot...

Raphael

From hlapp at gnf.org Thu Sep 18 23:08:58 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Sep 18 23:06:54 2003
Subject: [BioSQL-l] Re: gene ontology questions (bug)
In-Reply-To: <3F69A6BD.2080209@informatik.hu-berlin.de>
Message-ID:

On 9/18/03 5:36 AM, "Raphael A. Bauer" wrote:

> with two times "elastin".. (it seems that there are many terms that have
> the same term name.. also seen in term collagen and so on...)
>
> and the definition of table term that forbids 2 times the same name(unique):

Correct. There is a UK (unique key) constraint on term requiring a name to be unique within an ontology. Terms are also looked up using this constraint.

When you first load an ontology the best strategy is to ignore obsoleted terms, using the option --noobsolete (check the POD of load_ontology.pl, or use --help).

The following probably doesn't apply to your use case, but for completeness let me note that the real problem arises when you update an ontology and a term has been obsoleted because it was merged with another term that then gets the same name. If you use the otherwise recommendable --updobsolete switch, the obsoleted term would be properly obsoleted in the database, but inserting the successor fails with a UK violation. Using --delobsolete would take care of the problem, but you'd lose annotations to the obsoleted term.
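
The unique-key failure described above is easy to reproduce outside BioSQL. What follows is a minimal sketch in Python with an in-memory SQLite database. The table is a simplified stand-in for the BioSQL term table (an assumption for illustration, not the real DDL), and the skip_obsolete flag only imitates what --noobsolete is described as doing; it is not how load_ontology.pl is implemented.

```python
import sqlite3

# Hypothetical, simplified stand-in for the BioSQL "term" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE term (
        term_id     INTEGER PRIMARY KEY,
        identifier  TEXT UNIQUE,
        name        TEXT NOT NULL,
        ontology_id INTEGER NOT NULL,
        is_obsolete INTEGER NOT NULL DEFAULT 0,
        UNIQUE (name, ontology_id)
    )
    """
)

# The two "elastin" entries quoted from GO.defs; both are marked OBSOLETE.
go_terms = [
    ("GO:0001528", "elastin", 1, True),
    ("GO:0001529", "elastin", 1, True),
]


def load_terms(terms, skip_obsolete):
    """Insert terms and return the identifiers rejected by a unique key."""
    failed = []
    for identifier, name, ontology_id, obsolete in terms:
        if skip_obsolete and obsolete:
            continue  # imitates the effect described for --noobsolete
        try:
            conn.execute(
                "INSERT INTO term (identifier, name, ontology_id, is_obsolete)"
                " VALUES (?, ?, ?, ?)",
                (identifier, name, ontology_id, int(obsolete)),
            )
        except sqlite3.IntegrityError:
            failed.append(identifier)
    return failed


failed_all = load_terms(go_terms, skip_obsolete=False)
conn.execute("DELETE FROM term")  # start again from an empty table
failed_skip = load_terms(go_terms, skip_obsolete=True)

print(failed_all)
print(failed_skip)
```

With skip_obsolete=False the second elastin entry is rejected, which matches the GO:0001529 failure reported earlier in the thread. With skip_obsolete=True neither obsolete term is inserted, so nothing collides.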
Like it or not, LL and other DBs do contain GO associations to obsoleted terms, so just aggressively deleting them yields undesirable effects.

To solve this, I actually resorted to extending the constraint to (name,ontology_id,is_obsolete) in my Oracle version of biosql. Just extending the constraint isn't really advisable though, because then the lookup mechanism in the TermAdaptor needs to be adjusted too. I'll probably end up doing that.

To get back to your concrete problem though, --noobsolete probably does what you want.

-hilmar

From hlapp at gnf.org Thu Sep 18 23:22:57 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Sep 18 23:20:51 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To: <200309171304.05382.daniel.lang@biologie.uni-freiburg.de>
Message-ID:

Let me know if the response I just sent for Raphael's posting doesn't answer or doesn't apply to your problem.

-hilmar

BTW is the Atlantic still standing? And the Crash?

On 9/17/03 4:03 AM, "Daniel Lang" wrote:

> Hi,
> In June there was a discussion about redundant GO terms in the GO flat
> files and the related problems when loading them into the database (see
> "Re: gene ontology questions (bug)", Tue Jun 3 15:01:54 EDT 2003).
> I think I'm confronted with the same problem...
> I wanted to load my biosql instance with the current GO flat files using
> load_ontology.pl like this:
>
> perl ../load_ontology.pl --dbuser biosql --dbpass 'xxx' --dbname bioseqdb
> --driver Pg --namespace "Gene Ontology" --format goflat --fmtargs
> "-defs_file,GO.defs" --testonly function.ontology process.ontology
> component.ontology
> Parsing input ...
> Loading ontology Gene Ontology:
> ... terms
> Could not store GO:0001529 (elastin):
>
> ------------- EXCEPTION -------------
> MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by
> unique key
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) ../load_ontology.pl:489
> STACK toplevel ../load_ontology.pl:471
>
> --------------------------------------
>
> Running in safe mode makes it obvious that there are multiple
> erroneous/redundant entries...
>
> I also tried earlier releases back to 2003-05-01, and encountered the same
> difficulties.
>
> Is this the same problem or am I having other problems?
> If not, has anyone contacted the GO people about this issue yet?
>
> Thanks in advance,
> Daniel

From daniel.lang at biologie.uni-freiburg.de Fri Sep 19 08:51:25 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Sep 19 08:49:40 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To:
References:
Message-ID: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>

On Friday 19 September 2003 05:22, you wrote:
> Let me know if the response I just sent for Raphael's posting doesn't
> answer or doesn't apply to your problem.

Problem solved, thanks!
But another one occurred while loading the data:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("MetaCyc","2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN","0") FKs ()
ERROR: value too long for type character varying(40)
---------------------------------------------------
Could not store term relationship (2-pyrone-4,6-dicarboxylate lactonase activity,IS_A,carboxylic ester hydrolase activity):

------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:207
STACK Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/TermAdaptor.pm:290
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:215
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:243
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:170
STACK Bio::DB::Persistent::PersistentObject::create /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:243
STACK (eval) ../load_ontology.pl:516
STACK toplevel ../load_ontology.pl:515
--------------------------------------

DBD::Pg::st execute failed: ERROR: value too long for type character varying(40) at /usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BaseDriver.pm line 1001, line 2377.

It seems that some entries for term.name are longer than the expected 40 chars because of a backslash used to escape a comma in the molecule name :(
As I'm not familiar with the GOflat format, I also had a look at the files (and those from May), and it seems that the escaping is always done in this field.
A quick'n'dirty solution would be to eliminate the backslashes in the files, but can I update the database so easily with load_ontology.pl?

Daniel

From daniel.lang at biologie.uni-freiburg.de Fri Sep 19 11:04:02 2003
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Sep 19 11:02:20 2003
Subject: [BioSQL-l] gene ontology questions re-revisited
In-Reply-To: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
References: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
Message-ID: <200309191704.08378.daniel.lang@biologie.uni-freiburg.de>

Uhm,...

> Seems that some entries for term.name are longer as the expected 40 chars
> because of a backslash used to escape a comma in the molecule name:(

The corresponding field should of course be "dbxref.accession"...

But why are they escaping anyway?
And what to do about it?

Thanks in advance,
Daniel

From hlapp at gnf.org Fri Sep 19 13:37:28 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Sep 19 13:35:28 2003
Subject: [BioSQL-l] gene ontology questions revisited
In-Reply-To: <200309191451.29091.daniel.lang@biologie.uni-freiburg.de>
Message-ID:

On 9/19/03 5:51 AM, "Daniel Lang" wrote:

> But another one occurred while loading the data:
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> ("MetaCyc","2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN","0") FKs ()
> ERROR: value too long for type character varying(40)
> ---------------------------------------------------

The problem here is that the references for GO terms are modeled as DBXrefs with dbname and accession. This sometimes applies quite well, but often the reference in the GO.defs file is used in a far wider sense. In the example above, for instance, the reference is in fact to a term in another ontology (MetaCyc), so it should be a term relationship rather than a reference.

So, what you're seeing is the result of deficiencies in the flat file representation (term references can be any of lit.reference, dbxref, and ontology term) and consequently in the parser (which doesn't try to be smarter than the flat file representation).

Unfortunately that assessment doesn't help you much. What I did locally (I obviously ran into the same problem) is widen the accession column in dbxref to 64 chars, which I thought was a somewhat reasonable compromise. (You don't want to open it up completely and water down the relational model just because a certain flat file format is deficient in its expressivity.) This doesn't fix the problem that something ends up as a dbxref when it should rather be a term relationship.

Anyone else got a good idea here? I'm cc'ing the bioperl list since this is an issue of the object-space representation rather than one of the schema.
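
The numbers in this exchange can be checked with a few lines of Python. This is only a sketch: unescape_goflat is a hypothetical helper (not part of bioperl or any GO tool), and it assumes the backslash-comma sequence printed in the warning is what actually reaches the accession column.

```python
# Accession as printed in the failed insert quoted above.
# A raw string keeps the backslash that escapes the comma.
accession = r"2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN"


def unescape_goflat(value):
    """Hypothetical helper: drop the backslash in front of an escaped comma."""
    return value.replace("\\,", ",")


def fits(value, width):
    """Return True if value fits a VARCHAR column of the given width."""
    return len(value) <= width


cleaned = unescape_goflat(accession)

# Escaped form: length, fits VARCHAR(40)?, fits VARCHAR(64)?
print(len(accession), fits(accession, 40), fits(accession, 64))
# Unescaped form: length, fits VARCHAR(40)?
print(len(cleaned), fits(cleaned, 40))
```

The escaped accession is 41 characters long and the unescaped one is exactly 40, so the backslash alone pushes this value past the column limit. Either stripping the escape or widening the column to 64 would let this particular value through, although neither addresses the modeling issue raised above.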

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net  Sun Sep 21 00:41:18 2003
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Sep 21 00:39:16 2003
Subject: [BioSQL-l] slides of persistent bioperl bosc03 talk
Message-ID:

I offered the slides a while ago and then got dragged away by other
things before being able to follow through. I've posted them now:

http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

I also wrote a news entry which I guess needs a while to propagate.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Gerben.Menschaert at devgen.com  Wed Sep 24 05:49:05 2003
From: Gerben.Menschaert at devgen.com (Gerben Menschaert)
Date: Wed Sep 24 05:47:08 2003
Subject: [BioSQL-l] error using load_seqdatabase.pl
Message-ID:

Hello,

I'm trying to load a genbank file into biosql:

  perl load_seqdatabase.pl --driver Oracle --dbuser biosql \
      --dbpass biosql --dbname sfr01 --lookup --noupdate --safe \
      /data/lazy/gbinv1.seq

Every genbank entry load fails with the following error:

  DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined
  (DBD ERROR: OCIStmtExecute/Describe) [for statement ``SELECT
  taxon_name.tax_oid, NULL, NULL, taxon_name.tax_oid, taxon_name.name,
  NULL FROM taxon, taxon_name WHERE taxon.oid = taxon_name.tax_oid AND
  name_class = ? AND tax_oid = ?'']) at
  /usr/local/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm
  line 214, line 222129.

This error is to be expected, since the tax_oid in the where clause is
indeed ambiguously defined (it is missing the prefix "tax_name.").
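
The failure can be reproduced in miniature. The sketch below uses
Python's sqlite3 module purely for illustration, with a toy schema that
assumes both tables expose a tax_oid column (an assumption made for the
demo; the real Oracle schema is not shown in this thread). The
shortened statements and the run helper are hypothetical and are not
the code generated by SpeciesAdaptorDriver.pm.

```python
import sqlite3

# Toy schema: ASSUMES both tables have a tax_oid column, which is what
# makes the unqualified reference ambiguous. Not the real BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (oid INTEGER PRIMARY KEY, tax_oid INTEGER);
CREATE TABLE taxon_name (tax_oid INTEGER, name TEXT, name_class TEXT);
INSERT INTO taxon VALUES (1, NULL);
INSERT INTO taxon_name VALUES (1, 'Homo sapiens', 'scientific name');
""")

# Shortened form of the failing statement: tax_oid has no table prefix.
AMBIGUOUS = """
SELECT taxon_name.tax_oid, taxon_name.name
FROM taxon, taxon_name
WHERE taxon.oid = taxon_name.tax_oid
  AND name_class = ? AND tax_oid = ?
"""

# Same statement with the column qualified by its table name.
QUALIFIED = """
SELECT taxon_name.tax_oid, taxon_name.name
FROM taxon, taxon_name
WHERE taxon.oid = taxon_name.tax_oid
  AND name_class = ? AND taxon_name.tax_oid = ?
"""


def run(sql, params):
    """Return the rows, or the database error message as a string."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        return "error: %s" % exc


params = ("scientific name", 1)
print(run(AMBIGUOUS, params))
print(run(QUALIFIED, params))
```

Under that assumption the unqualified statement is rejected when it is
prepared, and the qualified one returns the matching row, so qualifying
the column with its table name is the kind of change the generated SQL
would need.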

I'm running biosql on Oracle, bioperl-1.2.2 is installed and I'm using
the main branch of bioperl-db. I recently changed from bioperl-1.2.1 to
1.2.2.

Any ideas?

Gerben

From hlapp at gnf.org  Wed Sep 24 15:34:21 2003
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Sep 24 15:32:23 2003
Subject: [BioSQL-l] error using load_seqdatabase.pl
In-Reply-To:
Message-ID:

This clearly looks like a bug, since the generated statement is
incorrect. What puzzles me is that I update RefSeq (which is in Genbank
format) on a daily basis on an Oracle instance and I'm not seeing this
error. Also, I thought there was a test for genbank. I need to check
that there is.

Did you pre-load the NCBI taxon database? If not, consider doing so, as
it will likely spare you some trouble down the road with species that
aren't parsed correctly by flat file parsers.

	-hilmar

On 9/24/03 2:49 AM, "Gerben Menschaert" wrote:

> Hello,
>
> I'm trying to load a genbank file into biosql:
> perl load_seqdatabase.pl --driver Oracle --dbuser biosql --dbpass biosql
> --dbname sfr01 --lookup --noupdate --safe /data/lazy/gbinv1.seq
>
> Every genbank entry load fails with the following error:
>
> DBD::Oracle::db prepare failed: ORA-00918: column ambiguously defined
> (DBD ERROR: OCIStmtExecute/Describe) [for statement ``SELECT
> taxon_name.tax_oid, NULL, NULL, taxon_name.tax_oid, taxon_name.name,
> NULL FROM taxon, taxon_name WHERE taxon.oid = taxon_name.tax_oid AND
> name_class = ? AND tax_oid = ?'']) at
> /usr/local/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/Oracle/SpeciesAdaptorDriver.pm
> line 214, line 222129.
> This error is normal since the tax_oid in the where clause is indeed
> ambiguously defined (it missed the prefix "tax_name.").
>
> I'm running biosql on Oracle, bioperl-1.2.2 is installed and I'm using the
> main branch of bioperl-db. I recently changed from bioperl-1.2.1 to 1.2.2.
>
> Any ideas?
>
> Gerben
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------