From hlapp at gnf.org Wed Jun 1 01:48:10 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jun 1 01:42:30 2005
Subject: [BioSQL-l] RE: [Biojava-l] Change Proposal regarding References
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601B94799@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601B94799@BIONIC.biopolis.one-north.com>
Message-ID:

On May 31, 2005, at 8:42 PM, Richard HOLLAND wrote:

> I should also point out that we should be using the
> 'bioentry_reference'
> and 'reference' tables, and not 'bioentry_dbxref' as I mistakenly
> mentioned in the original post.
>

Right - so you've corrected this already.

Note that reference has a foreign key to dbxref to store the PUBMED or
MEDLINE id. The foreign key is identifying; i.e., there's also a unique
key constraint on that foreign key, meaning only one reference can point
to a particular PUBMED id.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From boehme at mpiib-berlin.mpg.de Thu Jun 2 08:17:42 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Thu Jun 2 08:16:17 2005
Subject: [BioSQL-l] How to add a feature?
Message-ID: <429EF8E6.6030309@mpiib-berlin.mpg.de>

I'm wondering how to add a feature to a given sequence?
I know, I can use createFeature, but that changes nothing in the
database, that does addSequence. So is the proper way to retrieve the
seq., get all its features, copy it to new seq and add a feature, delete
the seq in the database and store the new one?
There must be a simpler way? BioJava In Anger is rather sparse on things
like that, I could do with a lot more examples ..

Martina

From Marc.Logghe at devgen.com Thu Jun 2 08:42:56 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jun 2 08:35:26 2005
Subject: [BioSQL-l] How to add a feature?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>

Hi Martina,
I don't know how it goes in BioJava but in BioPerl the flow looks like
this:
1) create your feature
2) make it persistent
3) add it to your (persistent) sequence object
4) store the sequence object in the database
5) commit if necessary

HTH,
Marc

> I'm wondering how to add a feature to a given sequence?
> I know, I can use createFeature, but that changes nothing in
> the database, that does addSequence. So is the proper way to
> retrieve the seq., get all its features, copy it to new seq
> and add a feature, delete the seq in the database and store
> the new one?
> There must be a simpler way? BioJava In Anger is rather
> sparse on things like that, I could do with a lot more examples ..
>
> Martina
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

From boehme at mpiib-berlin.mpg.de Thu Jun 2 09:03:30 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Thu Jun 2 08:55:32 2005
Subject: [BioSQL-l] How to add a feature?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
Message-ID: <429F03A2.1090208@mpiib-berlin.mpg.de>

Thanks Marc,
but I don't know how to make a feature persistent in Biojava. Maybe
someone from the bioJava list can help me?

Martina

Marc Logghe wrote:

> Hi Martina,
> I don't know how it goes in BioJava but in BioPerl the flow looks like
> this:
> 1) create your feature
> 2) make it persistent
> 3) add it to your (persistent) sequence object
> 4) store the sequence object in the database
> 5) commit if necessary
>
> HTH,
> Marc
>
>
>>I'm wondering how to add a feature to a given sequence?
>>I know, I can use createFeature, but that changes nothing in
>>the database, that does addSequence.
So is the proper way to
>>retrieve the seq., get all its features, copy it to new seq
>>and add a feature, delete the seq in the database and store
>>the new one?
>>There must be a simpler way? BioJava In Anger is rather
>>sparse on things like that, I could do with a lot more examples ..
>>
>>Martina
>>_______________________________________________
>>BioSQL-l mailing list
>>BioSQL-l@open-bio.org
>>http://open-bio.org/mailman/listinfo/biosql-l
>
>

From simon.foote at nrc-cnrc.gc.ca Thu Jun 2 09:34:30 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Thu Jun 2 09:41:18 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
In-Reply-To: <429F03A2.1090208@mpiib-berlin.mpg.de>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com> <429F03A2.1090208@mpiib-berlin.mpg.de>
Message-ID: <429F0AE6.6020806@nrc-cnrc.gc.ca>

Hi Martina,

To add a feature to a sequence stored in a BioSQL database, all you have
to do is retrieve the sequence and then add a feature to it. The
following simplified code shows you the steps:

// Retrieve the sequence from BioSQLSequenceDB
Sequence seq = bsd.getSequence(id);
// Create new stranded feature
StrandedFeature.Template templ = new StrandedFeature.Template();
templ.location = ...
templ.strand = ...
templ.type = ...
templ.source = ...
templ.annotation = [A created SimpleAnnotation object]
// Add feature to sequence
seq.createFeature(templ);
// Note: adding the feature like this will automatically persist the
// feature, so you don't have to worry about doing that.

Cheers,
Simon Foote

--
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561
[F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

Martina wrote:

> Thanks Marc,
> but I don't know how to make a feature persistent in Biojava. Maybe
> someone from the bioJava list can help me?
>
> Martina
>
> Marc Logghe wrote:
>
>> Hi Martina,
>> I don't know how it goes in BioJava but in BioPerl the flow looks like
>> this:
>> 1) create your feature
>> 2) make it persistent
>> 3) add it to your (persistent) sequence object
>> 4) store the sequence object in the database
>> 5) commit if necessary
>>
>> HTH,
>> Marc
>>
>>
>>> I'm wondering how to add a feature to a given sequence?
>>> I know, I can use createFeature, but that changes nothing in the
>>> database, that does addSequence. So is the proper way to retrieve
>>> the seq., get all its features, copy it to new seq and add a
>>> feature, delete the seq in the database and store the new one?
>>> There must be a simpler way? BioJava In Anger is rather sparse on
>>> things like that, I could do with a lot more examples ..
>>>
>>> Martina
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>>
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

From hlapp at gnf.org Thu Jun 2 12:39:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jun 2 12:34:25 2005
Subject: [BioSQL-l] How to add a feature?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
Message-ID:

On Jun 2, 2005, at 5:42 AM, Marc Logghe wrote:

> Hi Martina,
> I don't know how it goes in BioJava but in BioPerl the flow looks like
> this:
> 1) create your feature
> 2) make it persistent

Just as a note, you don't need to make the feature persistent before
adding it. Just add it to the persistent sequence object and then call
$pseq->store().
-hilmar > 3) add it to your (persistent) sequence object > 4) store the sequence object in the databse > 5) commit if necessary -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From mark.schreiber at novartis.com Thu Jun 2 21:02:57 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Jun 2 20:55:04 2005 Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature? Message-ID: >There must be a simpler way? BioJava In Anger is rather >sparse on things like that, I could do with a lot more examples .. > All donations of examples are gratefully received. As you say it could do with more examples but hey, I'm only one man, with a day job that is rapidly turning into a night job too : ) - Mark From boehme at mpiib-berlin.mpg.de Mon Jun 6 05:34:50 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 6 05:26:52 2005 Subject: Bio Java (was: Re: [Biojava-l] Re: [BioSQL-l] How to add a feature?) In-Reply-To: References: Message-ID: <42A418BA.8090407@mpiib-berlin.mpg.de> Sorry - I didn't mean you personally! Because it is quite hard for me to figure out how things are working just from the api and the sources, I assumed it would be similar for others starting with BioJava/BioSQL. There must be some working code around somewhere which could be donated? Please do :-) It would increase the popularity of BioJava/BioSQL, which it deserved, I would think. Martina mark.schreiber@novartis.com wrote: >>There must be a simpler way? BioJava In Anger is rather >>sparse on things like that, I could do with a lot more examples .. >> > > > All donations of examples are gratefully received. 
As you say it could do > with more examples but hey, I'm only one man, with a day job that is > rapidly turning into a night job too : ) > > - Mark > > From boehme at mpiib-berlin.mpg.de Mon Jun 6 10:18:54 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 6 10:12:43 2005 Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature? In-Reply-To: <429F0AE6.6020806@nrc-cnrc.gc.ca> References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com> <429F03A2.1090208@mpiib-berlin.mpg.de> <429F0AE6.6020806@nrc-cnrc.gc.ca> Message-ID: <42A45B4E.5070906@mpiib-berlin.mpg.de> Thanks - I knew it would be quite simple, as always with BioJava (once I've figuered out how to, that is)! Martina Simon Foote wrote: > Hi Martina, > > To add a feature to a sequence stored in a BioSQL database, all you > have to do is retrieve the sequence and then add a feature to it. The > following simplified code shows you the steps: > > // Retrieve the sequence from BioSQLSequenceDB > Sequence seq = bsd.getSequence(id); > // Create new stranded feature > StrandedFeature.Template templ = new StrandedFeature.Template(); > templ.location = ... > templ.strand = ... > templ.type = ... > templ.source = ... > templ.annotation = [A created SimpleAnnotation object] > // Add feature to sequence > seq.createFeature(templ); > // Note: adding the feature like this will automatically persist the > feature, so you don't have to worry about doing that. > > Cheers, > Simon Foote > From hlapp at gmx.net Wed Jun 8 22:20:14 2005 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Jun 8 22:13:52 2005 Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with load_seqdatabase.pl In-Reply-To: <20050608114341.29861.qmail@web40728.mail.yahoo.com> References: <20050608114341.29861.qmail@web40728.mail.yahoo.com> Message-ID: <9994082cb32d76711db846757e47ad22@gmx.net> What OS are you running this on? 
How much memory have you got on the machine on which you run the script, and on the machine on which you run the database? Are these the same or not? Which version of DBI and DBD::Pg? This hasn't been reported by anyone else really so I suspect it's either due to too limited memory, or a problem in the DBD driver or in the DBI compiled code. Can you watch the process (using, e.g., top) and see how fast it increases in memory consumption? Since you can continue when you restart it's not something specific to one sequence that would trigger the problem; rather it appears whenever you have run through a certain number of entries the process dies. -hilmar On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri wrote: > Hi, > > I've used the bioperl script load_seqdatabase.pl (came > with the biosql' scripts) to load the bacterial > sequence in genbank format(*.gbk) into PostgreSQL 8.0 > database on Linux machine as: > > $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk & > > Where under the /export/Bacteria/ path are the > Bacteria's name path e.g. Acinetobacter_sp_ADP1 and > the file name are like NC_006824.gbk. > > Previously it used to load some sequences in to some > tables in biosql database (count from table bioentry) > > bioseq=# select count(*) from bioentry; > count > ------- > 33 > (1 row) > > > However, after a while it then stopped with the the > error: > > [1]+ Segmentation fault perl load_seqdatabase.pl > /export/Bacteria/*/*.gbk & > > I then checked and removed the *.gbk file that have > already been loaded in to the table, leaving only the > unloaded ones and ran the scripted again. It > continued to work for some times and stopped again. I > repeated the process several times until 173 sequences > were loaded into the table: > > bioseq=# select count(*) from bioentry; > count > ------- > 173 > (1 row) > > The program then stopped again and this time it > wouldn't run anymore even I tried with only on file. 
> The error is still the same like: > > $ perl load_seqdatabase.pl > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk > Segmentation fault > $ > > Now I couldn't load the rest of my sequences into the > database anymore. I would be very apprecialed if any > one knows how to solve the "Segmentation fault" > problem? > > Regards, > > Davina > > > > __________________________________ > Discover Yahoo! > Have fun online with music videos, cool games, IM and more. Check it > out! > http://discover.yahoo.com/ > online.html_______________________________________ > ________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From dbastar at yahoo.com Wed Jun 8 23:39:24 2005 From: dbastar at yahoo.com (Duangdaow Kanhasiri) Date: Wed Jun 8 23:34:30 2005 Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with load_seqdatabase.pl In-Reply-To: <9994082cb32d76711db846757e47ad22@gmx.net> Message-ID: <20050609033924.11682.qmail@web40708.mail.yahoo.com> The OS: Rocks Cluster v 3.3 Total Memory: 2 GB DBD::Pg version: 1.42 DBI version: 1.48 --- Hilmar Lapp wrote: > What OS are you running this on? How much memory > have you got on the > machine on which you run the script, and on the > machine on which you > run the database? Are these the same or not? Which > version of DBI and > DBD::Pg? > > This hasn't been reported by anyone else really so I > suspect it's > either due to too limited memory, or a problem in > the DBD driver or in > the DBI compiled code. Can you watch the process > (using, e.g., top) and > see how fast it increases in memory consumption? 
> Since you can continue > when you restart it's not something specific to one > sequence that would > trigger the problem; rather it appears whenever you > have run through a > certain number of entries the process dies. > > -hilmar > > On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri > wrote: > > > Hi, > > > > I've used the bioperl script load_seqdatabase.pl > (came > > with the biosql' scripts) to load the bacterial > > sequence in genbank format(*.gbk) into PostgreSQL > 8.0 > > database on Linux machine as: > > > > $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk > & > > > > Where under the /export/Bacteria/ path are the > > Bacteria's name path e.g. Acinetobacter_sp_ADP1 > and > > the file name are like NC_006824.gbk. > > > > Previously it used to load some sequences in to > some > > tables in biosql database (count from table > bioentry) > > > > bioseq=# select count(*) from bioentry; > > count > > ------- > > 33 > > (1 row) > > > > > > However, after a while it then stopped with the > the > > error: > > > > [1]+ Segmentation fault perl > load_seqdatabase.pl > > /export/Bacteria/*/*.gbk & > > > > I then checked and removed the *.gbk file that > have > > already been loaded in to the table, leaving only > the > > unloaded ones and ran the scripted again. It > > continued to work for some times and stopped > again. I > > repeated the process several times until 173 > sequences > > were loaded into the table: > > > > bioseq=# select count(*) from bioentry; > > count > > ------- > > 173 > > (1 row) > > > > The program then stopped again and this time it > > wouldn't run anymore even I tried with only on > file. > > The error is still the same like: > > > > $ perl load_seqdatabase.pl > > > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk > > Segmentation fault > > $ > > > > Now I couldn't load the rest of my sequences into > the > > database anymore. 
I would be very apprecialed if any
> > one knows how to solve the "Segmentation fault"
> > problem?
> >
> > Regards,
> >
> > Davina
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>

From jana.bauckmann at informatik.hu-berlin.de Tue Jun 14 05:52:29 2005
From: jana.bauckmann at informatik.hu-berlin.de (Jana Bauckmann)
Date: Tue Jun 14 05:44:16 2005
Subject: [BioSQL-l] memory error while loading SwissProt into Oracle using bioperl-db
Message-ID:

Hi,

I would like to load SwissProt data into my Oracle 9.2 database with
BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got two
problems:

1) I get many (about 1300) warnings stating integrity constraint errors:

ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
key not found (DBD ERROR: OCIStmtExecute)

ORA-01400: cannot insert NULL into ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
(DBD ERROR: OCIStmtExecute)

2) The script stops after 2 hours (34500 tuples in table BioEntry) with
message: Out of memory!

I guess problem 1 causes problem 2. Is this reasonable or do I have two
separate problems?

I run Oracle and the load script on the same machine with:
Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM
perl 5.8.1, built for i586-linux-thread-multi
bioperl 1.4
bioperl-db 0.1
DBI 1.48
DBD::Oracle 1.16
Oracle 9.2
BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on 6th
June 2005)

Thanks for any suggestions,
Jana

From hollandr at gis.a-star.edu.sg Tue Jun 14 06:01:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Jun 14 05:54:34 2005
Subject: [BioSQL-l] memory error while loading SwissProt into Oracle usingbioperl-db
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA91F@BIONIC.biopolis.one-north.com>

These are two separate problems.

(1) is caused by bad data in your SwissProt file - some of the records in
the file refer to journal articles but have not stated any authors. The
associated reference objects then do not get created, and neither do
their dbxrefs, causing integrity constraint errors elsewhere.

(2) means what it says, it's run out of memory! Your script appears to be
creating objects, persisting them to the database, but then keeping them
in memory afterwards either in the BioPerl-db cache or by keeping its own
references somewhere? (I'm not sure of the exact workings of BioPerl-db
here, Hilmar could you enlighten us?). How much memory is your Oracle
instance and other software using on that server? How much is left for
BioPerl?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
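
[Editor's sketch] Richard's point (1), that some SwissProt references state
no authors, can be checked on the flat file before loading. The following
is a minimal sketch and not part of bioperl-db or BioJava: the class name
SwissProtRefScan and its method are hypothetical, and it assumes the usual
SwissProt line codes (AC for accessions, RN to open a reference, RA for
authors, // to close an entry). It only reports candidates; it does not
fix the data or change how the loader behaves.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (assumption, not an existing BioJava/BioPerl class):
// lists SwissProt reference blocks that carry no RA (authors) line.
class SwissProtRefScan {

    /** Returns "accession reference-number" for each reference lacking an RA line. */
    static List<String> findReferencesWithoutAuthors(Iterable<String> lines) {
        List<String> hits = new ArrayList<>();
        String accession = "?";    // first accession of the current entry
        String currentRef = null;  // text of the open RN line, if any
        boolean sawAuthors = false;

        for (String line : lines) {
            if (line.startsWith("RN")) {
                // A new reference starts: close the previous one first.
                if (currentRef != null && !sawAuthors) {
                    hits.add(accession + " " + currentRef);
                }
                currentRef = line.substring(2).trim();
                sawAuthors = false;
            } else if (line.startsWith("RA")) {
                sawAuthors = true;
            } else if (line.startsWith("AC") && accession.equals("?")) {
                // Keep only the first accession on the first AC line.
                accession = line.substring(2).trim().split(";")[0].trim();
            } else if (line.startsWith("//")) {
                // End of entry: close the last reference and reset state.
                if (currentRef != null && !sawAuthors) {
                    hits.add(accession + " " + currentRef);
                }
                accession = "?";
                currentRef = null;
                sawAuthors = false;
            }
        }
        // Handle a file that ends without a closing "//".
        if (currentRef != null && !sawAuthors) {
            hits.add(accession + " " + currentRef);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SwissProtRefScan <swissprot-flat-file>");
            return;
        }
        // Stream the file so a full SwissProt release need not fit in memory.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            for (String hit : findReferencesWithoutAuthors(lines::iterator)) {
                System.out.println(hit);
            }
        }
    }
}
```

If the number of reported references is close to the roughly 1300 warnings
Jana saw, that would support the diagnosis; it says nothing about the
separate out-of-memory problem in (2).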
--------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > Jana Bauckmann > Sent: Tuesday, June 14, 2005 5:52 PM > To: biosql-l@open-bio.org > Subject: [BioSQL-l] memory error while loading SwissProt into > Oracle usingbioperl-db > > > Hi, > > I would like to load SwissProt data into my Oracle 9.2 database with > BioSQL as schema using load_seqdatabase.pl from bioperl-db. > I've got two > problems: > > 1) I get many (about 1300) warnings stating integrity > constraint errors: > > ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) > violated - parent > key not found (DBD ERROR: OCIStmtExecute) > > ORA-01400: cannot insert NULL into > ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS") > (DBD ERROR: OCIStmtExecute) > > 2) The script stops after 2 hours (34500 tuples in table > BioEntry) with > message: Out of memory! > > I guess problem 1 causes problem 2. Is this reasonable or do > I have two > separated problems? 
> > I run Oracle and the load script on the same machine with: > Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM > perl 5.8.1, built for i586-linux-thread-multi > bioperl 1.4 > bioperl-db 0.1 > DBI 1.48 > DBD::Oracle 1.16 > Oracle 9.2 > BioSQL schema for Oracle (downloaded from > http://cvs.open-bio.org/ on 6th > June 2005) > > Thanks for any suggestions, > Jana > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From dbastar at yahoo.com Wed Jun 8 07:43:41 2005 From: dbastar at yahoo.com (Duangdaow Kanhasiri) Date: Tue Jun 14 22:17:55 2005 Subject: [BioSQL-l] Error loading sequence with load_seqdatabase.pl Message-ID: <20050608114341.29861.qmail@web40728.mail.yahoo.com> Hi, I've used the bioperl script load_seqdatabase.pl (came with the biosql' scripts) to load the bacterial sequence in genbank format(*.gbk) into PostgreSQL 8.0 database on Linux machine as: $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk & Where under the /export/Bacteria/ path are the Bacteria's name path e.g. Acinetobacter_sp_ADP1 and the file name are like NC_006824.gbk. Previously it used to load some sequences in to some tables in biosql database (count from table bioentry) bioseq=# select count(*) from bioentry; count ------- 33 (1 row) However, after a while it then stopped with the the error: [1]+ Segmentation fault perl load_seqdatabase.pl /export/Bacteria/*/*.gbk & I then checked and removed the *.gbk file that have already been loaded in to the table, leaving only the unloaded ones and ran the scripted again. It continued to work for some times and stopped again. I repeated the process several times until 173 sequences were loaded into the table: bioseq=# select count(*) from bioentry; count ------- 173 (1 row) The program then stopped again and this time it wouldn't run anymore even I tried with only on file. 
The error is still the same like: $ perl load_seqdatabase.pl /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk Segmentation fault $ Now I couldn't load the rest of my sequences into the database anymore. I would be very apprecialed if any one knows how to solve the "Segmentation fault" problem? Regards, Davina __________________________________ Discover Yahoo! Have fun online with music videos, cool games, IM and more. Check it out! http://discover.yahoo.com/online.html -------------- next part -------------- A non-text attachment was scrubbed... Name: load_seqdatabase.pl Type: application/octet-stream Size: 22486 bytes Desc: 3434098052-load_seqdatabase.pl Url : http://open-bio.org/pipermail/biosql-l/attachments/20050608/1c6b46ab/load_seqdatabase-0001.obj From dbastar at yahoo.com Wed Jun 8 23:55:57 2005 From: dbastar at yahoo.com (Duangdaow Kanhasiri) Date: Tue Jun 14 22:17:55 2005 Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with load_seqdatabase.pl In-Reply-To: <9994082cb32d76711db846757e47ad22@gmx.net> Message-ID: <20050609035557.45275.qmail@web40727.mail.yahoo.com> The system I use hase following configs: CPU: 2 @ AthlonXP2000 OS: Rocks Cluster v 3.3 Total Memory: 2 GB DBD::Pg version: 1.42 DBI version: 1.48 I've attached the out put of the top command (top.txt) with this mail. Unfortunately that the script load_seqdatabase.pl wouldn't run anymore, no matter how many time I tried running it, therefore, I couldn't measure how much it consumes the resource (cpu, memory) on the machine. Regards, Davina --- Hilmar Lapp wrote: > What OS are you running this on? How much memory > have you got on the > machine on which you run the script, and on the > machine on which you > run the database? Are these the same or not? Which > version of DBI and > DBD::Pg? > > This hasn't been reported by anyone else really so I > suspect it's > either due to too limited memory, or a problem in > the DBD driver or in > the DBI compiled code. 
Can you watch the process > (using, e.g., top) and > see how fast it increases in memory consumption? > Since you can continue > when you restart it's not something specific to one > sequence that would > trigger the problem; rather it appears whenever you > have run through a > certain number of entries the process dies. > > -hilmar > > On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri > wrote: > > > Hi, > > > > I've used the bioperl script load_seqdatabase.pl > (came > > with the biosql' scripts) to load the bacterial > > sequence in genbank format(*.gbk) into PostgreSQL > 8.0 > > database on Linux machine as: > > > > $perl load_seqdatabase.pl /export/Bacteria/*/*.gbk > & > > > > Where under the /export/Bacteria/ path are the > > Bacteria's name path e.g. Acinetobacter_sp_ADP1 > and > > the file name are like NC_006824.gbk. > > > > Previously it used to load some sequences in to > some > > tables in biosql database (count from table > bioentry) > > > > bioseq=# select count(*) from bioentry; > > count > > ------- > > 33 > > (1 row) > > > > > > However, after a while it then stopped with the > the > > error: > > > > [1]+ Segmentation fault perl > load_seqdatabase.pl > > /export/Bacteria/*/*.gbk & > > > > I then checked and removed the *.gbk file that > have > > already been loaded in to the table, leaving only > the > > unloaded ones and ran the scripted again. It > > continued to work for some times and stopped > again. I > > repeated the process several times until 173 > sequences > > were loaded into the table: > > > > bioseq=# select count(*) from bioentry; > > count > > ------- > > 173 > > (1 row) > > > > The program then stopped again and this time it > > wouldn't run anymore even I tried with only on > file. 
> > The error is still the same like: > > > > $ perl load_seqdatabase.pl > > > /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk > > Segmentation fault > > $ > > > > Now I couldn't load the rest of my sequences into > the > > database anymore. I would be very apprecialed if > any > > one knows how to solve the "Segmentation fault" > > problem? > > > > Regards, > > > > Davina > > > > > > > > __________________________________ > > Discover Yahoo! > > Have fun online with music videos, cool games, IM > and more. Check it > > out! > > http://discover.yahoo.com/ > > > online.html_______________________________________ > > > ________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp > at gnf.org > GNF, San Diego, Ca. 92121 phone: > +1-858-812-1757 > ------------------------------------------------------------- > > > __________________________________ Discover Yahoo! Get on-the-go sports scores, stock quotes, news and more. Check it out! 
http://discover.yahoo.com/mobile.html
-------------- next part --------------
[root@biogenome root]# top

 10:31:14  up 27 days, 21:20,  5 users,  load average: 0.00, 0.02, 0.03
193 processes: 192 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    1.8%    0.0%    0.0%   0.0%     0.0%    0.0%  198.0%
           cpu00    1.9%    0.0%    0.0%   0.0%     0.0%    0.0%   98.0%
           cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
Mem:  2057220k av, 1556640k used,  500580k free,       0k shrd,  167096k buff
                   1101048k actv,  266692k in_d,   39936k in_c
Swap: 4192956k av,   91620k used, 4101336k free                 1196752k cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16683 root      23   0  1288  1288   844 R     1.9  0.0   0:00   0 top
    1 root      15   0   520   516   456 S     0.0  0.0   0:29   0 init
    2 root      RT   0     0     0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0     0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0     0     0 SW    0.0  0.0   0:00   1 keventd
    5 root      34  19     0     0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      34  19     0     0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0     0     0 SW    0.0  0.0   0:00   0 bdflush
    7 root      15   0     0     0     0 SW    0.0  0.0   0:37   0 kswapd
    8 root      15   0     0     0     0 SW    0.0  0.0   0:24   0 kscand
   10 root      15   0     0     0     0 SW    0.0  0.0   0:19   0 kupdated
   11 root      25   0     0     0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      25   0     0     0     0 SW    0.0  0.0   0:00   1 scsi_eh_0
   18 root      25   0     0     0     0 SW    0.0  0.0   0:00   1 aacraid
   20 root      25   0     0     0     0 SW    0.0  0.0   0:00   1 scsi_eh_0
   23 root      15   0     0     0     0 SW    0.0  0.0   1:29   1 kjournald
   70 root      25   0     0     0     0 SW    0.0  0.0   0:00   0 khubd
 1165 root      15   0     0     0     0 SW    0.0  0.0   0:49   0 kjournald
 1418 root      15   0     0     0     0 SW    0.0  0.0   0:00   1 eth0
 1543 root      15   0   620   608   524 S     0.0  0.0   0:59   0 syslogd
 1547 root      15   0   484   424   420 S     0.0  0.0   0:00   0 klogd
 1557 root      15   0   456   448   392 S     0.0  0.0   2:04   0 irqbalance
 1565 rpc       15   0   572   548   500 S     0.0  0.0   0:00   0 portmap
 1584 rpcuser   25   0   716   632   628 S     0.0  0.0   0:00   1 rpc.statd
 1595 root      15   0   404   388   344 S     0.0  0.0   0:06   0 mdadm
 1619 root      RT   0   556   456   424 S     0.0  0.0   0:16   1 auditd
 1629 nobody    15   0  1180  1016   724 S     0.0  0.0  24:57   1 gmetad
 1658 root      15   0   472   424   400 S     0.0  0.0   0:01   0 pvfsd

[root@biogenome DBD]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             5.8G  3.6G  1.9G  66% /
/dev/sda3             125G   24G   95G  21% /export
none                 1005M     0 1005M   0% /dev/shm
tmpfs                 503M  3.5M  499M   1% /var/lib/ganglia/rrds

From mark.schreiber at novartis.com Mon Jun 20 01:34:11 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 01:25:59 2005
Subject: [BioSQL-l] circular
Message-ID:

Hello -

When circular sequences (plasmids, bacterial genomes etc) are stored in
BioSQL how is their circularity indicated? Or, what should the convention
be?

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Mon Jun 20 02:45:42 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 02:37:26 2005
Subject: [BioSQL-l] bioentry-version vs sequence-version
Message-ID:

Hello -

Why do bioentry and sequence both have a version column? Sequence records
only exist in one to one relationships with their parent bioentry so
surely they would inherit their version number from their parent
bioentry?

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From shenyang_11 at 163.com Mon Jun 20 05:11:28 2005
From: shenyang_11 at 163.com (shenyang)
Date: Mon Jun 20 05:11:44 2005
Subject: [BioSQL-l] " Lost connection to MySQL server" when I via biosql by using "find_by_unique_key" method
Message-ID: <200506200911.j5K9BHgJ009578@portal.open-bio.org>

Hello-

I updated my mysql from "mysql-standard-4.0.20-sgi-irix6.5-mips" to
"mysql-max-4.1.12-sgi-irix6.5-mips". Then I failed to get richseq object
from my sequence database which is biosql schema.
My Perl script is:

$db = $db||Bio::DB::BioDB->new(-database => "biosql",
                               -printerror => 0,
                               -host => "localhost",
                               -dbname => $dbname,
                               -driver => "mysql",
                               -user => $dbuser,
                               -pass => $dbpass,
                               );
$seq->namespace($namespace);
$seq->version($version);
my $adp = $db->get_object_adaptor($seq);
my $seqfactor=Bio::Seq::SeqFactory->new(-type=>"Bio::Seq::RichSeq");
$lseq = $adp->find_by_unique_key(
                                 $seq,
                                 -obj_factory =>$seqfactor,
                                 );

The error message is:

------------- EXCEPTION -------------

MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: Lost connection to MySQL server during query
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:952
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856
STACK toplevel test_get_seq_embl_acc.pl:9

--------------------------------------

The MySQL log file indicates an InnoDB problem. Here are the MySQL logs:

"
050620 16:32:47 mysqld restarted
050620 16:32:49 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
050620 16:32:50 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 2237087117.
InnoDB: Doing recovery: scanned up to log sequence number 0 2237087117
InnoDB: Last MySQL binlog file position 0 79, file name ./biomed-bin.000022
050620 16:32:51 InnoDB: Flushing modified pages from the buffer pool...
050620 16:32:51 InnoDB: Started; log sequence number 0 2237087117
050620 16:32:51 [Warning] mysql.user table is not updated to new password format; Disabling new password usage until mysql_fix_privilege_tables is run
050620 16:32:51 [Warning] Can't open and lock time zone table: Table 'mysql.time_zone_leap_second' doesn't exist trying to live without them
/database/mysql/bin/mysqld: ready for connections.
Version: '4.1.12-max-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Edition - Experimental (GPL)"

Thanks for any suggestions,
Yang Shen

From Marc.Logghe at devgen.com Mon Jun 20 05:33:33 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jun 20 05:26:30 2005
Subject: [BioSQL-l] circular
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E86A@ANTARESIA.be.devgen.com>

Hi Mark,
As far as I am aware, there is currently no field available in the bioentry table to store that kind of flag. BioPerl does parse it out of GenBank files, though. It is taken from the GenBank LOCUS line, e.g.
"LOCUS BBPLAS 2687 bp DNA circular BCT 12-MAR-1999"
You can check the resulting Bio::Seq::RichSeq object by calling the is_circular() method from Bio::PrimarySeq.
A solution would be to write a Bio::Factory::SequenceProcessorI compliant processor and pass it as an option to your load_seqdatabase.pl script. In the processor itself, you can for instance do the following:
1) check for circularity using the is_circular() method
2) if circular, add a term to your sequence object (e.g. annotation term, gene ontology term 'is_circular') indicating it is circular

My 0.02$
Cheers,
Marc

> -----Original Message-----
> From: biosql-l-bounces@portal.open-bio.org
> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
> mark.schreiber@novartis.com
> Sent: Monday, June 20, 2005 7:34 AM
> To: biosql-l@open-bio.org
> Subject: [BioSQL-l] circular
>
> Hello -
>
> When circular sequences (plasmids, bacterial genomes etc) are
> stored in BioSQL how is their circularity indicated?
Or, what
> should the convention be?
>
> - Mark
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>

From boehme at mpiib-berlin.mpg.de Mon Jun 20 05:43:35 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 05:39:41 2005
Subject: [BioSQL-l] _removeSequence
Message-ID: <42B68FC7.3060102@mpiib-berlin.mpg.de>

Hi,

I'm trying to delete a sequence and, recursively, all its features.

So:

for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
    Sequence s = si.nextSequence();
    String name = s.getName();
    s = null;
    db.removeSequence(name);
}

But if I look in the database (MySQL 4.1.12) I can still see plenty of entries, and I have problems entering the same features again because of a duplicate key error. I would like to know whether _removeSequence(String) in BioSQLSequenceDB is supposed to remove features recursively, or just the features of the removed sequence. If it is the latter, what is the best way to delete the features of the features (and so on)? And how do I empty the db completely?

Martina

From mark.schreiber at novartis.com Mon Jun 20 05:56:40 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 05:48:21 2005
Subject: [BioSQL-l] _removeSequence
Message-ID:

BioJava doesn't attempt to recursively remove features by itself. It relies on cascading deletes in the database. I know Oracle can be set to do this (and it works very well). If MySQL has equivalent functionality you may need to turn it on. I'm pretty sure it does but you need to set it up.
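
A quick way to see whether cascading deletes are really in place on a MySQL instance is to inspect the table definitions. The following is a minimal sketch, not BioJava code: the class and method names are made up for illustration, and it assumes a JDBC connection to the BioSQL database. It uses MySQL's SHOW CREATE TABLE statement, whose second result column holds the CREATE TABLE text, and looks for a foreign key declared with ON DELETE CASCADE.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of BioJava or BioSQL. */
class CascadeCheck {

    /** True if the CREATE TABLE text declares a foreign key with ON DELETE CASCADE. */
    static boolean declaresCascade(String createTableSql) {
        // Normalise case and whitespace so the check does not depend on formatting.
        String sql = createTableSql.toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        return sql.contains("FOREIGN KEY") && sql.contains("ON DELETE CASCADE");
    }

    /** Fetches the definition of one table from MySQL and applies the check above. */
    static boolean tableCascades(Connection conn, String table) throws SQLException {
        // The table name is concatenated into the statement, so pass trusted names only.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW CREATE TABLE " + table)) {
            return rs.next() && declaresCascade(rs.getString(2));
        }
    }
}
```

If a call such as tableCascades(conn, "seqfeature") returns false, deleting a sequence will most likely leave its features behind, because the database has no foreign key to cascade along.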
- Mark Martina Sent by: biosql-l-bounces@portal.open-bio.org 06/20/2005 05:43 PM To: biosql-l@open-bio.org, BioJava cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [BioSQL-l] _removeSequence Hi, Im trying to delete a sequence and recursivly all its features. So: for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { Sequence s = si.nextSequence(); String name = s.getName(); s = null; db.removeSequence(name); } But if I look in the database (MySQL 4.1.12) I can still see plenty of entries and I have problems entering the same features again, because of dublicate key error. I would like to know if _removeSequence(String) in BioSQLSequenceDB is supposed to remove features recursivly or just the features of the removed sequence? If so - what is the best way do delete the features of the features (and so on)? And how to empty the db completly? Martina _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From mark.schreiber at novartis.com Mon Jun 20 06:01:41 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jun 20 05:53:15 2005 Subject: [BioSQL-l] circular Message-ID: So 'is_circular' should be the blessed term. It really needs to be a convention so that reading and writing is consistent between bio* projects. Would it be a good idea for the sequence table of BioSQL 1.1 to have a circular column? - Mark "Marc Logghe" 06/20/2005 05:33 PM To: Mark Schreiber/GP/Novartis@PH, cc: Subject: RE: [BioSQL-l] circular Hi Mark, As far as I am aware of, there is currently no field available in the bioentry table to store that kind of flag. It is parsed out from genbank files by BioPerl, though. It is taken from the genbank Locus line, eg. "LOCUS BBPLAS 2687 bp DNA circular BCT 12-MAR-1999" You can check the resulting Bio::Seq::RichSeq object by running the is_circular() method from Bio::PrimarySeq. 
A solution would be to make a Bio::Factory::SequenceProcessorI compliant processor and pass that as an option to your load_seqdatabase.pl script. In the procesor itself, you can for instance do the following: 1) check for circularity using the is_circular() method 2) if circular, add a term to your sequence object (eg. annotation term, gene ontology term 'is_circular') indicating it is circular My 0.02$ Cheers, Marc > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > mark.schreiber@novartis.com > Sent: Monday, June 20, 2005 7:34 AM > To: biosql-l@open-bio.org > Subject: [BioSQL-l] circular > > Hello - > > When circular sequences (plasmids, bacterial genomes etc) are > stored in BioSQL how is their circularity indicated? Or, what > should the convention be? > > - Mark > > Mark Schreiber > Principal Scientist (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From mark.schreiber at novartis.com Mon Jun 20 06:06:32 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jun 20 05:58:20 2005 Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence Message-ID: To remove the database completely (while still keeping the tables etc) you would again need to turn on cascading deletes and delete the appropriate biodatabase row from the biodatabase table (or all of them if you have more than one). You cannot currently do this using the biojava interface. You would need to code a JDBC statement to do it for you, or connect to the DB and issue the SQL statement yourself. 
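
The JDBC route could look roughly like the sketch below. It is not part of BioJava: the class and method names are hypothetical, and it assumes the standard BioSQL biodatabase table with its name column, plus foreign keys declared with ON DELETE CASCADE so that the database removes the dependent bioentry and feature rows itself.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of BioJava. */
class BiodatabaseZap {

    /** Removes one namespace; child rows disappear only if the foreign keys cascade. */
    static final String DELETE_ONE = "DELETE FROM biodatabase WHERE name = ?";

    /** Removes every namespace while keeping the tables themselves. */
    static final String DELETE_ALL = "DELETE FROM biodatabase";

    /** Deletes the named biodatabase row and returns the number of rows removed. */
    static int deleteNamespace(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_ONE)) {
            ps.setString(1, name);
            return ps.executeUpdate();
        }
    }
}
```

With a connection obtained from DriverManager.getConnection (URL and credentials depend on the installation), deleteNamespace(conn, "mynamespace") would remove that namespace, where "mynamespace" is a placeholder. On a schema without the foreign keys the same statement removes only the biodatabase row.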
- Mark Martina Sent by: biojava-l-bounces@portal.open-bio.org 06/20/2005 05:43 PM To: biosql-l@open-bio.org, BioJava cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] _removeSequence Hi, Im trying to delete a sequence and recursivly all its features. So: for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { Sequence s = si.nextSequence(); String name = s.getName(); s = null; db.removeSequence(name); } But if I look in the database (MySQL 4.1.12) I can still see plenty of entries and I have problems entering the same features again, because of dublicate key error. I would like to know if _removeSequence(String) in BioSQLSequenceDB is supposed to remove features recursivly or just the features of the removed sequence? If so - what is the best way do delete the features of the features (and so on)? And how to empty the db completly? Martina _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From hollandr at gis.a-star.edu.sg Mon Jun 20 06:10:29 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:03:36 2005 Subject: [BioSQL-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> To do cascading deletes in MySQL requires the tables to have been set up using the InnoDB table style (as opposed to the default MyISAM tables). In InnoDB, foreign keys are actually enforced and deletes will cascade, whereas in MyISAM it has no concept of foreign keys and so is unable to enforce data integrity. The people on the BioSQL-L mailing list will be able to help you there. The next version of BioJava's database interfaces after the 1.4 release will assume that the underlying database does have cascading deletes turned on. 
The existing version half-attempts to make up for the lack of cascading deletes in databases that don't support it, but it doesn't do it well at all, hence the problems you are seeing. After consulting with Hilmar last week we decided it was a fair assumption to make that all BioSQL instances are installed with cascading deletes enabled. BioPerl-db already makes this assumption. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > mark.schreiber@novartis.com > Sent: Monday, June 20, 2005 5:57 PM > To: Martina > Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: Re: [BioSQL-l] _removeSequence > > > Biojava doesn't attempt to recusivley remove features by > itself. It relies > on cascading deletes in the database. I know Oracle can be > set to do this > (and it works very well). If MySQL has equivalent > functionality you may > need to turn it on. I'm pretty sure it does but you need to set it up. > > - Mark > > > > > > Martina > Sent by: biosql-l-bounces@portal.open-bio.org > 06/20/2005 05:43 PM > > > To: biosql-l@open-bio.org, BioJava > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [BioSQL-l] _removeSequence > > > Hi, > > Im trying to delete a sequence and recursivly all its features. 
> > So: > > for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > Sequence s = si.nextSequence(); > String name = s.getName(); > s = null; > db.removeSequence(name); > } > > But if I look in the database (MySQL 4.1.12) I can still see plenty > of entries and I have problems entering the same features again, > because of dublicate key error. I would like to know if > _removeSequence(String) in BioSQLSequenceDB is supposed to remove > features recursivly or just the features of the removed sequence? > If so - what is the best way do delete the features of the features > (and so on)? And how to empty the db completly? > > Martina > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From hollandr at gis.a-star.edu.sg Mon Jun 20 06:11:57 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:04:53 2005 Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB7A@BIONIC.biopolis.one-north.com> There is also the BS-zap-all script in the BioSQL distribution which will wipe the whole lot for you in one go. :) Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. 
--------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > mark.schreiber@novartis.com > Sent: Monday, June 20, 2005 6:07 PM > To: Martina > Cc: biojava-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence > > > To remove the database completely (while still keeping the > tables etc) you > would again need to turn on cascading deletes and delete the > appropriate > biodatabase row from the biodatabase table (or all of them if > you have > more than one). > > You cannot currently do this using the biojava interface. You > would need > to code a JDBC statement to do it for you, or connect to the > DB and issue > the SQL statement yourself. > > - Mark > > > > > > Martina > Sent by: biojava-l-bounces@portal.open-bio.org > 06/20/2005 05:43 PM > > > To: biosql-l@open-bio.org, BioJava > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] _removeSequence > > > Hi, > > Im trying to delete a sequence and recursivly all its features. > > So: > > for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > Sequence s = si.nextSequence(); > String name = s.getName(); > s = null; > db.removeSequence(name); > } > > But if I look in the database (MySQL 4.1.12) I can still see plenty > of entries and I have problems entering the same features again, > because of dublicate key error. I would like to know if > _removeSequence(String) in BioSQLSequenceDB is supposed to remove > features recursivly or just the features of the removed sequence? > If so - what is the best way do delete the features of the features > (and so on)? And how to empty the db completly? 
> > Martina > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From boehme at mpiib-berlin.mpg.de Mon Jun 20 06:20:37 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 20 06:25:07 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> Message-ID: <42B69875.3050306@mpiib-berlin.mpg.de> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. Do I need to do anything else? Thanks, Martina Richard HOLLAND wrote: > To do cascading deletes in MySQL requires the tables to have been set up > using the InnoDB table style (as opposed to the default MyISAM tables). > In InnoDB, foreign keys are actually enforced and deletes will cascade, > whereas in MyISAM it has no concept of foreign keys and so is unable to > enforce data integrity. The people on the BioSQL-L mailing list will be > able to help you there. > > The next version of BioJava's database interfaces after the 1.4 release > will assume that the underlying database does have cascading deletes > turned on. The existing version half-attempts to make up for the lack of > cascading deletes in databases that don't support it, but it doesn't do > it well at all, hence the problems you are seeing. After consulting with > Hilmar last week we decided it was a fair assumption to make that all > BioSQL instances are installed with cascading deletes enabled. > BioPerl-db already makes this assumption. 
> > cheers, > Richard > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > > >>-----Original Message----- >>From: biosql-l-bounces@portal.open-bio.org >>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>mark.schreiber@novartis.com >>Sent: Monday, June 20, 2005 5:57 PM >>To: Martina >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>biosql-l@open-bio.org >>Subject: Re: [BioSQL-l] _removeSequence >> >> >>Biojava doesn't attempt to recusivley remove features by >>itself. It relies >>on cascading deletes in the database. I know Oracle can be >>set to do this >>(and it works very well). If MySQL has equivalent >>functionality you may >>need to turn it on. I'm pretty sure it does but you need to set it up. >> >>- Mark >> >> >> >> >> >>Martina >>Sent by: biosql-l-bounces@portal.open-bio.org >>06/20/2005 05:43 PM >> >> >> To: biosql-l@open-bio.org, BioJava >> cc: (bcc: Mark Schreiber/GP/Novartis) >> Subject: [BioSQL-l] _removeSequence >> >> >>Hi, >> >>Im trying to delete a sequence and recursivly all its features. >> >>So: >> >>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >> Sequence s = si.nextSequence(); >> String name = s.getName(); >> s = null; >> db.removeSequence(name); >>} >> >>But if I look in the database (MySQL 4.1.12) I can still see plenty >>of entries and I have problems entering the same features again, >>because of dublicate key error. I would like to know if >>_removeSequence(String) in BioSQLSequenceDB is supposed to remove >>features recursivly or just the features of the removed sequence? 
>>If so - what is the best way do delete the features of the features >>(and so on)? And how to empty the db completly? >> >>Martina >> >>_______________________________________________ >>BioSQL-l mailing list >>BioSQL-l@open-bio.org >>http://open-bio.org/mailman/listinfo/biosql-l >> >> >> >>_______________________________________________ >>BioSQL-l mailing list >>BioSQL-l@open-bio.org >>http://open-bio.org/mailman/listinfo/biosql-l > > From hollandr at gis.a-star.edu.sg Mon Jun 20 06:33:02 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:26:20 2005 Subject: [BioSQL-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> Well, technically that should work because BioJava simply issues a delete against the seqfeature table, and therefore all features related through foreign keys should automatically delete themselves as a result without any further intervention by BioJava... beats me why it doesn't! Unfortunately I don't currently use the MySQL implementation myself so I can't help much. I hope someone on BioSQL-L knows a little more? Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Martina [mailto:boehme@mpiib-berlin.mpg.de] > Sent: Monday, June 20, 2005 6:21 PM > To: Richard HOLLAND > Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: Re: [BioSQL-l] _removeSequence > > > My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 > 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. > Do I need to do anything else? 
> > Thanks, > Martina > > Richard HOLLAND wrote: > > > To do cascading deletes in MySQL requires the tables to > have been set up > > using the InnoDB table style (as opposed to the default > MyISAM tables). > > In InnoDB, foreign keys are actually enforced and deletes > will cascade, > > whereas in MyISAM it has no concept of foreign keys and so > is unable to > > enforce data integrity. The people on the BioSQL-L mailing > list will be > > able to help you there. > > > > The next version of BioJava's database interfaces after the > 1.4 release > > will assume that the underlying database does have cascading deletes > > turned on. The existing version half-attempts to make up > for the lack of > > cascading deletes in databases that don't support it, but > it doesn't do > > it well at all, hence the problems you are seeing. After > consulting with > > Hilmar last week we decided it was a fair assumption to > make that all > > BioSQL instances are installed with cascading deletes enabled. > > BioPerl-db already makes this assumption. > > > > cheers, > > Richard > > > > Richard Holland > > Bioinformatics Specialist > > GIS extension 8199 > > --------------------------------------------- > > This email is confidential and may be privileged. If you are not the > > intended recipient, please delete it and notify us > immediately. Please > > do not copy or use it for any purpose, or disclose its > content to any > > other person. Thank you. > > --------------------------------------------- > > > > > > > >>-----Original Message----- > >>From: biosql-l-bounces@portal.open-bio.org > >>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > >>mark.schreiber@novartis.com > >>Sent: Monday, June 20, 2005 5:57 PM > >>To: Martina > >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > >>biosql-l@open-bio.org > >>Subject: Re: [BioSQL-l] _removeSequence > >> > >> > >>Biojava doesn't attempt to recusivley remove features by > >>itself. 
It relies > >>on cascading deletes in the database. I know Oracle can be > >>set to do this > >>(and it works very well). If MySQL has equivalent > >>functionality you may > >>need to turn it on. I'm pretty sure it does but you need to > set it up. > >> > >>- Mark > >> > >> > >> > >> > >> > >>Martina > >>Sent by: biosql-l-bounces@portal.open-bio.org > >>06/20/2005 05:43 PM > >> > >> > >> To: biosql-l@open-bio.org, BioJava > > >> cc: (bcc: Mark Schreiber/GP/Novartis) > >> Subject: [BioSQL-l] _removeSequence > >> > >> > >>Hi, > >> > >>Im trying to delete a sequence and recursivly all its features. > >> > >>So: > >> > >>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > >> Sequence s = si.nextSequence(); > >> String name = s.getName(); > >> s = null; > >> db.removeSequence(name); > >>} > >> > >>But if I look in the database (MySQL 4.1.12) I can still > see plenty > >>of entries and I have problems entering the same features again, > >>because of dublicate key error. I would like to know if > >>_removeSequence(String) in BioSQLSequenceDB is supposed to remove > >>features recursivly or just the features of the removed sequence? > >>If so - what is the best way do delete the features of the features > >>(and so on)? And how to empty the db completly? 
> >>
> >>Martina
> >>
> >>_______________________________________________
> >>BioSQL-l mailing list
> >>BioSQL-l@open-bio.org
> >>http://open-bio.org/mailman/listinfo/biosql-l
> >>
> >>
> >>
> >>_______________________________________________
> >>BioSQL-l mailing list
> >>BioSQL-l@open-bio.org
> >>http://open-bio.org/mailman/listinfo/biosql-l
> >
> >
>

From boehme at mpiib-berlin.mpg.de Mon Jun 20 09:11:25 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 09:05:29 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
Message-ID: <42B6C07D.7000106@mpiib-berlin.mpg.de>

I dropped the db and ran the BioSQL script again, and it looks like it's working now! The first run must have stopped before the ALTER TABLE statements, so I didn't have the foreign keys, but I didn't know that they had to be there.

Thanks!

Richard HOLLAND wrote:

> Well, technically that should work because BioJava simply issues a
> delete against the seqfeature table, and therefore all features related
> through foreign keys should automatically delete themselves as a result
> without any further intervention by BioJava... beats me why it doesn't!
> Unfortunately I don't currently use the MySQL implementation myself so I
> can't help much. I hope someone on BioSQL-L knows a little more?
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> --------------------------------------------- > > > >>-----Original Message----- >>From: Martina [mailto:boehme@mpiib-berlin.mpg.de] >>Sent: Monday, June 20, 2005 6:21 PM >>To: Richard HOLLAND >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>biosql-l@open-bio.org >>Subject: Re: [BioSQL-l] _removeSequence >> >> >>My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 >>2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. >>Do I need to do anything else? >> >>Thanks, >>Martina >> >>Richard HOLLAND wrote: >> >> >>>To do cascading deletes in MySQL requires the tables to >> >>have been set up >> >>>using the InnoDB table style (as opposed to the default >> >>MyISAM tables). >> >>>In InnoDB, foreign keys are actually enforced and deletes >> >>will cascade, >> >>>whereas in MyISAM it has no concept of foreign keys and so >> >>is unable to >> >>>enforce data integrity. The people on the BioSQL-L mailing >> >>list will be >> >>>able to help you there. >>> >>>The next version of BioJava's database interfaces after the >> >>1.4 release >> >>>will assume that the underlying database does have cascading deletes >>>turned on. The existing version half-attempts to make up >> >>for the lack of >> >>>cascading deletes in databases that don't support it, but >> >>it doesn't do >> >>>it well at all, hence the problems you are seeing. After >> >>consulting with >> >>>Hilmar last week we decided it was a fair assumption to >> >>make that all >> >>>BioSQL instances are installed with cascading deletes enabled. >>>BioPerl-db already makes this assumption. >>> >>>cheers, >>>Richard >>> >>>Richard Holland >>>Bioinformatics Specialist >>>GIS extension 8199 >>>--------------------------------------------- >>>This email is confidential and may be privileged. If you are not the >>>intended recipient, please delete it and notify us >> >>immediately. Please >> >>>do not copy or use it for any purpose, or disclose its >> >>content to any >> >>>other person. 
Thank you. >>>--------------------------------------------- >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: biosql-l-bounces@portal.open-bio.org >>>>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>>>mark.schreiber@novartis.com >>>>Sent: Monday, June 20, 2005 5:57 PM >>>>To: Martina >>>>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>>>biosql-l@open-bio.org >>>>Subject: Re: [BioSQL-l] _removeSequence >>>> >>>> >>>>Biojava doesn't attempt to recusivley remove features by >>>>itself. It relies >>>>on cascading deletes in the database. I know Oracle can be >>>>set to do this >>>>(and it works very well). If MySQL has equivalent >>>>functionality you may >>>>need to turn it on. I'm pretty sure it does but you need to >> >>set it up. >> >>>>- Mark >>>> >>>> >>>> >>>> >>>> >>>>Martina >>>>Sent by: biosql-l-bounces@portal.open-bio.org >>>>06/20/2005 05:43 PM >>>> >>>> >>>> To: biosql-l@open-bio.org, BioJava >> >> >> >>>> cc: (bcc: Mark Schreiber/GP/Novartis) >>>> Subject: [BioSQL-l] _removeSequence >>>> >>>> >>>>Hi, >>>> >>>>Im trying to delete a sequence and recursivly all its features. >>>> >>>>So: >>>> >>>>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >>>> Sequence s = si.nextSequence(); >>>> String name = s.getName(); >>>> s = null; >>>> db.removeSequence(name); >>>>} >>>> >>>>But if I look in the database (MySQL 4.1.12) I can still >> >>see plenty >> >>>>of entries and I have problems entering the same features again, >>>>because of dublicate key error. I would like to know if >>>>_removeSequence(String) in BioSQLSequenceDB is supposed to remove >>>>features recursivly or just the features of the removed sequence? >>>>If so - what is the best way do delete the features of the features >>>>(and so on)? And how to empty the db completly? 
>>>>
>>>>Martina
>>>>
>>>>_______________________________________________
>>>>BioSQL-l mailing list
>>>>BioSQL-l@open-bio.org
>>>>http://open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>>
>>>>
>>>>_______________________________________________
>>>>BioSQL-l mailing list
>>>>BioSQL-l@open-bio.org
>>>>http://open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>

From boehme at mpiib-berlin.mpg.de Mon Jun 20 11:20:35 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 11:38:47 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
Message-ID: <42B6DEC3.9090807@mpiib-berlin.mpg.de>

Hi,

so I have this new database (still biosqldb-mysql.sql v 1.40 2004/11/04 01:49:41), and after removing all sequences I still have entries in term, term_relationship, term_relationship_term and ontology. And of course, in biodatabase. If I delete the entry in biodatabase too, nothing changes. Is that to be expected? I ask because I still have trouble with the duplicate key error, but that must be my code then.

Thanks
Martina

From hlapp at gnf.org Mon Jun 20 13:48:04 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 13:38:09 2005
Subject: [BioSQL-l] " Lost connection to MySQL server" when I via biosql by using "find_by_unique_key" method
In-Reply-To: <200506200911.j5K9BHgJ009578@portal.open-bio.org>
References: <200506200911.j5K9BHgJ009578@portal.open-bio.org>
Message-ID: <65541f3e2669ba1ffd9eccaa9dc21988@gnf.org>

Maybe there's a migration script that comes with MySQL and that you need to run? Have you checked the MySQL FAQs and possibly message boards/README/HOWTO for what you need to do when upgrading from 4.0.x to 4.1.x?

-hilmar

On Jun 20, 2004, at 1:51 AM, shenyang wrote:

> Hello-
> I updated my mysql from "mysql-standard-4.0.20-sgi-irix6.5-mips" to
> "mysql-max-4.1.12-sgi-irix6.5-mips".
> > Then I failed to get richseq object from my sequence database which is > biosql schema. > My perl scripte is " > > $db = $db||Bio::DB::BioDB->new(-database => "biosql", > -printerror => 0, > -host => "localhost", > -dbname => $dbname, > -driver => "mysql", > -user => $dbuser, > -pass => $dbpass, > ); > $seq->namespace($namespace); > $seq->version($version); > my $adp = $db->get_object_adaptor($seq); > my $seqfactor=Bio::Seq::SeqFactory->new(-type=>"Bio::Seq::RichSeq"); > $lseq = $adp->find_by_unique_key( > $seq, > -obj_factory =>$seqfactor, > ); > > The error message is " > > ------------- EXCEPTION ------------- > > MSG: error while executing statement in > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: Lost connection to > MySQL server during query > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:952 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856 > STACK toplevel test_get_seq_embl_acc.pl:9 > > -------------------------------------- > > and the mysql log file indicated it's a innodb's problem, here is the > mysql logs: > > " > 050620 16:32:47 mysqld restarted > 050620 16:32:49 InnoDB: Database was not shut down normally! > InnoDB: Starting crash recovery. > InnoDB: Reading tablespace information from the .ibd files... > InnoDB: Restoring possible half-written data pages from the doublewrite > InnoDB: buffer... > 050620 16:32:50 InnoDB: Starting log scan based on checkpoint at > InnoDB: log sequence number 0 2237087117. > InnoDB: Doing recovery: scanned up to log sequence number 0 2237087117 > InnoDB: Last MySQL binlog file position 0 79, file name > ./biomed-bin.000022 > 050620 16:32:51 InnoDB: Flushing modified pages from the buffer > pool... 
> 050620 16:32:51 InnoDB: Started; log sequence number 0 2237087117 > 050620 16:32:51 [Warning] mysql.user table is not updated to new > password format; Disabling new password usage until > mysql_fix_privilege_tables is run > 050620 16:32:51 [Warning] Can't open and lock time zone table: Table > 'mysql.time_zone_leap_second' doesn't exist trying to live without > them > /database/mysql/bin/mysqld: ready for connections. > Version: '4.1.12-max-log' socket: '/tmp/mysql.sock' port: 3306 > MySQL Community Edition - Experimental (GPL)" > > > Thanks for any suggestions, > Yang Shen > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Jun 20 15:19:04 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jun 20 15:10:24 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> Message-ID: <78e39420822012ffbf691b5edc233b4a@gnf.org> There's one thing that I'm unsure about in Martina's original email, namely whether she was referring to features related to a sequence (bioentry), or to features hierarchically related to each other through the seqfeature_relationship table. If the former, then the cascading delete should have taken care of removing the features when you remove the sequence (bioentry) to which they point through their foreign key (and recursively the locations etc). 
However, if the question was about hierarchical features, then deleting one feature in the hierarchy will never (and shouldn't ever) delete any other feature in the hierarchy (except if all of them reference the same bioentry and you deleted the bioentry). If you delete a seqfeature in a hierarchy of seqfeatures then by cascading delete this will also delete all rows in seqfeature_relationship that reference that seqfeature as either a subject or an object in a nesting relationship between features. I.e., looking at the hierarchy as a graph, removing a node will cascade to deleting all incoming and outgoing arcs for that node, but not other nodes. If your application wants to take down all nodes in the hierarchy when one node is deleted, you need to write code to do this. (Except if, as mentioned before, all features reference the same bioentry, in which case deleting the bioentry will delete the entire feature hierarchy.) -hilmar On Jun 20, 2005, at 3:33 AM, Richard HOLLAND wrote: > Well, technically that should work because BioJava simply issues a > delete against the seqfeature table, and therefore all features related > through foreign keys should automatically delete themselves as a result > without any further intervention by BioJava... beats me why it doesn't! > Unfortunately I don't currently use the MySQL implementation myself so > I > can't help much. I hope someone on BioSQL-L knows a little more? > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. 
>>>> >>>> Martina >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l@open-bio.org >>>> http://open-bio.org/mailman/listinfo/biosql-l >>>> >>>> >>>> >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l@open-bio.org >>>> http://open-bio.org/mailman/listinfo/biosql-l >>> >>> >> > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Jun 20 15:33:11 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jun 20 15:23:39 2005 Subject: [BioSQL-l] circular In-Reply-To: References: Message-ID: <06eb73cb04fc0adb0c8565ddae4e946b@gnf.org> Interesting question. I'd argue that the root question is whether the boolean property of circularity is best considered as an annotation of a bioentry with sequence, or as a core property of a biosequence. Annotation generally is something that's applicable to some but not to other entries. A core property is something that can be well defined for (almost) all rows, and/or is necessary to define uniqueness or operations on the object. Is_circular can certainly be defined for all biosequence rows. Also, in order to define operations like taking a subsequence of length 100 starting 50bp before the end, knowing whether the sequence is circular makes a critical difference. So, short-term you can store it as annotation (tag/value) on the bioentry, but long-term I think this needs to be added to the biosequence table as a column. -hilmar On Jun 20, 2005, at 3:01 AM, mark.schreiber@novartis.com wrote: > So 'is_circular' should be the blessed term. 
It really needs to be a > convention so that reading and writing is consistent between bio* > projects. > > Would it be a good idea for the sequence table of BioSQL 1.1 to have a > circular column? > > - Mark > > > > > > "Marc Logghe" > 06/20/2005 05:33 PM > > > To: Mark Schreiber/GP/Novartis@PH, > cc: > Subject: RE: [BioSQL-l] circular > > > Hi Mark, > As far as I am aware of, there is currently no field available in the > bioentry table to store that kind of flag. > It is parsed out from genbank files by BioPerl, though. > It is taken from the genbank Locus line, eg. > "LOCUS BBPLAS 2687 bp DNA circular BCT > 12-MAR-1999" > You can check the resulting Bio::Seq::RichSeq object by running the > is_circular() method from Bio::PrimarySeq. > A solution would be to make a Bio::Factory::SequenceProcessorI > compliant > processor and pass that as an option to your load_seqdatabase.pl > script. > In the procesor itself, you can for instance do the following: > 1) check for circularity using the is_circular() method > 2) if circular, add a term to your sequence object (eg. annotation > term, > gene ontology term 'is_circular') indicating it is circular > > My 0.02$ > > Cheers, > Marc > > >> -----Original Message----- >> From: biosql-l-bounces@portal.open-bio.org >> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >> mark.schreiber@novartis.com >> Sent: Monday, June 20, 2005 7:34 AM >> To: biosql-l@open-bio.org >> Subject: [BioSQL-l] circular >> >> Hello - >> >> When circular sequences (plasmids, bacterial genomes etc) are >> stored in BioSQL how is their circularity indicated? Or, what >> should the convention be? 
>> >> - Mark >> >> Mark Schreiber >> Principal Scientist (Bioinformatics) >> >> Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road >> #05-01 Chromos >> Singapore 138670 >> www.nitd.novartis.com >> >> phone +65 6722 2973 >> fax +65 6722 2910 >> >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Jun 20 15:39:24 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jun 20 15:29:49 2005 Subject: [BioSQL-l] bioentry-version vs sequence-version In-Reply-To: References: Message-ID: From the schema-overview.txt: Sequences may have their own version number, independent of its bioentry version information. This pretty much states it. Usually they will have the same version, but some data providers may choose to increment the version of the sequence whenever the sequence changes, and the version of the entry whenever the sequence or the annotation change. -hilmar On Jun 19, 2005, at 11:45 PM, mark.schreiber@novartis.com wrote: > Hello - > > Why do bioentry and sequence both have a version column? Sequence > records > only exist in one to one relationships with their parent bioentry so > surely they would inherit their version number from their parent > bioentry? 
> > - Mark > > Mark Schreiber > Principal Scientist (Bioinformatics) > > Novartis Institute for Tropical Diseases (NITD) > 10 Biopolis Road > #05-01 Chromos > Singapore 138670 > www.nitd.novartis.com > > phone +65 6722 2973 > fax +65 6722 2910 > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Jun 20 15:57:56 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jun 20 15:48:18 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <42B6DEC3.9090807@mpiib-berlin.mpg.de> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> Message-ID: On Jun 20, 2005, at 8:20 AM, Martina wrote: > Hi, > > so I have this new database (still biosqldb-mysql.sqlv 1.40 2004/11/04 > 01:49:41) and after removing all sequences, I do still have entries in > term, term_relationship,term_relationship_term and ontology. And of > course, in biodatabase. If I delete the entry in biodatabase too, > nothing changes. Is that what is to be expected? Yes. Deletes cascade through foreign key constraints and nothing else. Term has a n:n relationship with bioentry and therefore does not have a foreign key to bioentry. More generally, and provided cascading deletes are enabled, if you delete a row in a master table, all corresponding rows in detail tables are deleted that stand in a 1:n relationship to the master table (and therefore have a foreign key defined pointing to the master table). Rows in n:n related tables will not be deleted, but dissociated by deleting the corresponding rows from the association table. 
As examples, Comment and Biosequence are 1:n related to bioentry, whereas dbxref, reference, and term are n:n related, with bioentry_dbxref, bioentry_reference, and bioentry_qualifier_value being the association tables. If this is confusing to you, you should read a general textbook on relational databases and normalization which usually will explain this a lot better than I do. > Cause I still have trouble with the dublicate entry key, but that must > be my code then. Yes. When you insert a sequence you must be prepared that when inserting its ontology term or tag/value annotation the term may already be present because another bioentry uses it too. Similarly for Reference and Dbxref (although I believe Biojava doesn't use these - yet). -hilmar > > Thanks > Martina > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Mon Jun 20 16:05:26 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Mon Jun 20 15:55:30 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <42B69875.3050306@mpiib-berlin.mpg.de> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B69875.3050306@mpiib-berlin.mpg.de> Message-ID: <3b0bdefb15e41a8a9020e2ffdf3e1312@gnf.org> You should actually check whether InnoDB is enabled in your instance of Mysql. Mysql has the "nice" behaviour of silently converting the table manager to MyISAM if InnoDB has not been enabled in the instance. It will not throw an error. Up until at least 4.0.x InnoDB was disabled by default. You can check whether it is enabled by issuing mysql> show variables; and then look for the have_innodb variable. It needs to have the value of YES. 
The variables with innodb_ prefix will tell you where it creates its tablespaces etc. If it is not enabled, you need to edit Mysql's config file accordingly and restart the Mysql daemon. -hilmar On Jun 20, 2005, at 3:20 AM, Martina wrote: > My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 > 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. > Do I need to do anything else? > > Thanks, > Martina > > Richard HOLLAND wrote: > >> To do cascading deletes in MySQL requires the tables to have been set >> up >> using the InnoDB table style (as opposed to the default MyISAM >> tables). >> In InnoDB, foreign keys are actually enforced and deletes will >> cascade, >> whereas in MyISAM it has no concept of foreign keys and so is unable >> to >> enforce data integrity. The people on the BioSQL-L mailing list will >> be >> able to help you there. >> The next version of BioJava's database interfaces after the 1.4 >> release >> will assume that the underlying database does have cascading deletes >> turned on. The existing version half-attempts to make up for the lack >> of >> cascading deletes in databases that don't support it, but it doesn't >> do >> it well at all, hence the problems you are seeing. After consulting >> with >> Hilmar last week we decided it was a fair assumption to make that all >> BioSQL instances are installed with cascading deletes enabled. >> BioPerl-db already makes this assumption. >> cheers, >> Richard >> Richard Holland >> Bioinformatics Specialist >> GIS extension 8199 >> --------------------------------------------- >> This email is confidential and may be privileged. If you are not the >> intended recipient, please delete it and notify us immediately. Please >> do not copy or use it for any purpose, or disclose its content to any >> other person. Thank you. 
>> --------------------------------------------- >>> -----Original Message----- >>> From: biosql-l-bounces@portal.open-bio.org >>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>> mark.schreiber@novartis.com >>> Sent: Monday, June 20, 2005 5:57 PM >>> To: Martina >>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>> biosql-l@open-bio.org >>> Subject: Re: [BioSQL-l] _removeSequence >>> >>> >>> Biojava doesn't attempt to recusivley remove features by itself. It >>> relies on cascading deletes in the database. I know Oracle can be >>> set to do this (and it works very well). If MySQL has equivalent >>> functionality you may need to turn it on. I'm pretty sure it does >>> but you need to set it up. >>> >>> - Mark >>> >>> >>> >>> >>> >>> Martina >>> Sent by: biosql-l-bounces@portal.open-bio.org >>> 06/20/2005 05:43 PM >>> >>> To: biosql-l@open-bio.org, BioJava >>> cc: (bcc: Mark Schreiber/GP/Novartis) >>> Subject: [BioSQL-l] _removeSequence >>> >>> >>> Hi, >>> >>> Im trying to delete a sequence and recursivly all its features. >>> >>> So: >>> >>> for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >>> Sequence s = si.nextSequence(); >>> String name = s.getName(); >>> s = null; >>> db.removeSequence(name); >>> } >>> >>> But if I look in the database (MySQL 4.1.12) I can still see plenty >>> of entries and I have problems entering the same features again, >>> because of dublicate key error. I would like to know if >>> _removeSequence(String) in BioSQLSequenceDB is supposed to remove >>> features recursivly or just the features of the removed sequence? >>> If so - what is the best way do delete the features of the features >>> (and so on)? And how to empty the db completly? 
>>>
>>> Martina
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l@open-bio.org
>>> http://open-bio.org/mailman/listinfo/biosql-l
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From boehme at mpiib-berlin.mpg.de Tue Jun 21 05:46:22 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 05:38:04 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <78e39420822012ffbf691b5edc233b4a@gnf.org>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> <78e39420822012ffbf691b5edc233b4a@gnf.org>
Message-ID: <42B7E1EE.5090505@mpiib-berlin.mpg.de>

Hi Hilmar,

I wasn't aware of 2 different types of features. I'm making features as described in http://www.biojava.org/docs/bj_in_anger/feature.htm, and as far as I can tell from the results, it's the first type you describe.

The second type of feature is confusing me: as I understood the feature relationships, the graph is a tree, with only one parent for a given feature, and if that feature is deleted, all its children should get deleted too?

Martina

Hilmar Lapp wrote:

> There's one thing that I'm unsure about in Martina's original email,
> namely whether she was referring to features related to a sequence
> (bioentry), or to features hierarchically related to each other through
> the seqfeature_relationship table.
From boehme at mpiib-berlin.mpg.de Tue Jun 21 06:10:16 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 06:02:43 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To:
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de>
Message-ID: <42B7E788.3040205@mpiib-berlin.mpg.de>

> Yes. When you insert a sequence you must be prepared that when inserting
> its ontology term or tag/value annotation the term may already be
> present because another bioentry uses it too.

Ok, so the proper way is to catch the SQLException in BIOSQLFeature, test whether it is a duplicate key entry, get the identifier of the term (would that be the BioSQLfeatureId?)
and insert it in the term_relationship table? And there is no nice BioJava method for this, so I have to do it "manually", with conn.prepareStatement(..) and the like? BioJava spoiled me so!

Martina

From hlapp at gnf.org Tue Jun 21 06:17:42 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 06:10:43 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B7E1EE.5090505@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> <78e39420822012ffbf691b5edc233b4a@gnf.org> <42B7E1EE.5090505@mpiib-berlin.mpg.de>
Message-ID:

On Jun 21, 2005, at 2:46 AM, Martina wrote:

> Hi Hilmar,
>
> I wasn't aware of 2 different types of features.
> I'm making features as described in
> http://www.biojava.org/docs/bj_in_anger/feature.htm, and as far as I
> can tell from the results, it's the first type you describe.

No, these are not different types of features; the only difference is whether the features are nested or not.

> The second type of feature is confusing me: as I understood the
> feature relationships, the graph is a tree, with only one parent for a
> given feature

I'm not sure whether Biojava imposes this as a limitation, but Biosql certainly doesn't, since it assumes an n:n relationship. In reality, nested features compliant with SO/SOFA will be trees though, I believe.

> , and if that feature is deleted, all its children should get deleted
> too?

No, as I said below. To be more precise, not by the mechanism of cascading deletes (remember: cascading deletes only follow foreign key constraints - and a feature doesn't have a foreign key to another one). Your software or Biojava may implement it the way you suggested, but no RDBMS is going to do this for you.
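
To make the distinction concrete, here is a minimal in-memory Java sketch. It is not BioJava or BioSQL API: the names FeatureGraph, deleteFeature and deleteSubtree are made up for illustration, and treating the "object" side of a seqfeature_relationship row as the parent is an assumption. Features are modelled as nodes and relationship rows as arcs. deleteFeature mimics what the cascading delete does (the node plus its incoming and outgoing arcs, nothing else); deleteSubtree is the kind of code an application would have to write itself if it wants the children gone too.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory stand-in for the seqfeature and seqfeature_relationship tables.
// Hypothetical illustration only; none of these names exist in BioJava or BioSQL.
class FeatureGraph {

    // One arc models one seqfeature_relationship row
    // (assumption: the "object" feature is the parent, the "subject" the child).
    static final class Arc {
        final int parentId;
        final int childId;

        Arc(int parentId, int childId) {
            this.parentId = parentId;
            this.childId = childId;
        }
    }

    final Set<Integer> features = new HashSet<>();
    final List<Arc> arcs = new ArrayList<>();

    void addFeature(int id) {
        features.add(id);
    }

    void addRelationship(int parentId, int childId) {
        arcs.add(new Arc(parentId, childId));
    }

    // What a cascading delete amounts to: the node and every arc touching it
    // are removed, but no other node is.
    void deleteFeature(int id) {
        features.remove(id);
        arcs.removeIf(a -> a.parentId == id || a.childId == id);
    }

    // What the application has to do itself: delete all descendants first,
    // then the feature. Note that with n:n relationships a child that has a
    // second parent elsewhere would be deleted as well.
    void deleteSubtree(int id) {
        deleteSubtree(id, new HashSet<>());
    }

    private void deleteSubtree(int id, Set<Integer> visited) {
        if (!visited.add(id)) {
            return; // already handled; guards against cycles
        }
        // Collect the children before deleting anything, so the arc list
        // is not modified while it is being scanned.
        List<Integer> children = new ArrayList<>();
        for (Arc a : arcs) {
            if (a.parentId == id) {
                children.add(a.childId);
            }
        }
        for (int child : children) {
            deleteSubtree(child, visited);
        }
        deleteFeature(id);
    }
}
```

With features 1, 2, 3 chained as 1 -> 2 -> 3, deleteFeature(2) leaves 1 and 3 in place with no arcs between them, whereas deleteSubtree(1) removes all three. Against a real database the same walk would be done with queries on seqfeature_relationship, followed by deletes on seqfeature.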
-hilmar > > Martina > > > Hilmar Lapp wrote: > >> There's one thing that I'm unsure about in Martina's original email, >> namely whether she was referring to features related to a sequence >> (bioentry), or to features hierarchically related to each other >> through the seqfeature_relationship table. >> If the former, then the cascading delete should have taken care of >> removing the features when you remove the sequence (bioentry) to >> which they point through their foreign key (and recursively the >> locations etc). >> However, if the question was about hierarchical features, then >> deleting one feature in the hierarchy will never (and shouldn't ever) >> delete any other feature in the hierarchy (except if all of them >> reference the same bioentry and you deleted the bioentry). If you >> delete a seqfeature in a hierarchy of seqfeatures then by cascading >> delete this will also delete all rows in seqfeature_relationship that >> reference that seqfeature as either a subject or an object in a >> nesting relationship between features. I.e., looking at the hierarchy >> as a graph, removing a node will cascade to deleting all incoming and >> outgoing arcs for that node, but not other nodes. >> If your application wants to take down all nodes in the hierarchy >> when one node is deleted, you need to write code to do this. (Except >> if, as mentioned before, all features reference the same bioentry, in >> which case deleting the bioentry will delete the entire feature >> hierarchy.) >> -hilmar >> On Jun 20, 2005, at 3:33 AM, Richard HOLLAND wrote: >>> Well, technically that should work because BioJava simply issues a >>> delete against the seqfeature table, and therefore all features >>> related >>> through foreign keys should automatically delete themselves as a >>> result >>> without any further intervention by BioJava... beats me why it >>> doesn't! >>> Unfortunately I don't currently use the MySQL implementation myself >>> so I >>> can't help much. 
I hope someone on BioSQL-L knows a little more? >>> >>> Richard Holland >>> Bioinformatics Specialist >>> GIS extension 8199 >>> --------------------------------------------- >>> This email is confidential and may be privileged. If you are not the >>> intended recipient, please delete it and notify us immediately. >>> Please >>> do not copy or use it for any purpose, or disclose its content to any >>> other person. Thank you. >>> --------------------------------------------- >>> >>> >>>> -----Original Message----- >>>> From: Martina [mailto:boehme@mpiib-berlin.mpg.de] >>>> Sent: Monday, June 20, 2005 6:21 PM >>>> To: Richard HOLLAND >>>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>>> biosql-l@open-bio.org >>>> Subject: Re: [BioSQL-l] _removeSequence >>>> >>>> >>>> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v >>>> 1.40 >>>> 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. >>>> Do I need to do anything else? >>>> >>>> Thanks, >>>> Martina >>>> >>>> Richard HOLLAND wrote: >>>> >>>>> To do cascading deletes in MySQL requires the tables to >>>> >>>> have been set up >>>> >>>>> using the InnoDB table style (as opposed to the default >>>> >>>> MyISAM tables). >>>> >>>>> In InnoDB, foreign keys are actually enforced and deletes >>>> >>>> will cascade, >>>> >>>>> whereas in MyISAM it has no concept of foreign keys and so >>>> >>>> is unable to >>>> >>>>> enforce data integrity. The people on the BioSQL-L mailing >>>> >>>> list will be >>>> >>>>> able to help you there. >>>>> >>>>> The next version of BioJava's database interfaces after the >>>> >>>> 1.4 release >>>> >>>>> will assume that the underlying database does have cascading >>>>> deletes >>>>> turned on. The existing version half-attempts to make up >>>> >>>> for the lack of >>>> >>>>> cascading deletes in databases that don't support it, but >>>> >>>> it doesn't do >>>> >>>>> it well at all, hence the problems you are seeing. 
After >>>> >>>> consulting with >>>> >>>>> Hilmar last week we decided it was a fair assumption to >>>> >>>> make that all >>>> >>>>> BioSQL instances are installed with cascading deletes enabled. >>>>> BioPerl-db already makes this assumption. >>>>> >>>>> cheers, >>>>> Richard >>>>> >>>>> Richard Holland >>>>> Bioinformatics Specialist >>>>> GIS extension 8199 >>>>> --------------------------------------------- >>>>> This email is confidential and may be privileged. If you are not >>>>> the >>>>> intended recipient, please delete it and notify us >>>> >>>> immediately. Please >>>> >>>>> do not copy or use it for any purpose, or disclose its >>>> >>>> content to any >>>> >>>>> other person. Thank you. >>>>> --------------------------------------------- >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: biosql-l-bounces@portal.open-bio.org >>>>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>>>>> mark.schreiber@novartis.com >>>>>> Sent: Monday, June 20, 2005 5:57 PM >>>>>> To: Martina >>>>>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>>>>> biosql-l@open-bio.org >>>>>> Subject: Re: [BioSQL-l] _removeSequence >>>>>> >>>>>> >>>>>> Biojava doesn't attempt to recusivley remove features by >>>>>> itself. It relies >>>>>> on cascading deletes in the database. I know Oracle can be >>>>>> set to do this >>>>>> (and it works very well). If MySQL has equivalent >>>>>> functionality you may >>>>>> need to turn it on. I'm pretty sure it does but you need to >>>> >>>> set it up. >>>> >>>>>> >>>>>> - Mark >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Martina >>>>>> Sent by: biosql-l-bounces@portal.open-bio.org >>>>>> 06/20/2005 05:43 PM >>>>>> >>>>>> >>>>>> To: biosql-l@open-bio.org, BioJava >>>> >>>> >>>> >>>>>> cc: (bcc: Mark Schreiber/GP/Novartis) >>>>>> Subject: [BioSQL-l] _removeSequence >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> Im trying to delete a sequence and recursivly all its features. 
>>>>>> >>>>>> So: >>>>>> >>>>>> for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >>>>>> Sequence s = si.nextSequence(); >>>>>> String name = s.getName(); >>>>>> s = null; >>>>>> db.removeSequence(name); >>>>>> } >>>>>> >>>>>> But if I look in the database (MySQL 4.1.12) I can still >>>> >>>> see plenty >>>> >>>>>> of entries and I have problems entering the same features again, >>>>>> because of dublicate key error. I would like to know if >>>>>> _removeSequence(String) in BioSQLSequenceDB is supposed to remove >>>>>> features recursivly or just the features of the removed sequence? >>>>>> If so - what is the best way do delete the features of the >>>>>> features >>>>>> (and so on)? And how to empty the db completly? >>>>>> >>>>>> Martina >>>>>> >>>>>> _______________________________________________ >>>>>> BioSQL-l mailing list >>>>>> BioSQL-l@open-bio.org >>>>>> http://open-bio.org/mailman/listinfo/biosql-l >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> BioSQL-l mailing list >>>>>> BioSQL-l@open-bio.org >>>>>> http://open-bio.org/mailman/listinfo/biosql-l >>>>> >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l@open-bio.org >>> http://open-bio.org/mailman/listinfo/biosql-l >>> >>> -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hlapp at gnf.org Tue Jun 21 06:21:33 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jun 21 06:13:53 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <42B7E788.3040205@mpiib-berlin.mpg.de> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> Message-ID: <0be3992b92f6a14b6d06d5a06549555b@gnf.org> The Biojava people will respond to this. 
Note though that Term_Relationship is for storing
subject-predicate-object triples of terms, so I'm not sure why you want
to use it for storing or associating annotation. Maybe you meant
bioentry_qualifier_value?

-hilmar

On Jun 21, 2005, at 3:10 AM, Martina wrote:

>
>> Yes. When you insert a sequence you must be prepared that when
>> inserting its ontology term or tag/value annotation the term may
>> already be present because another bioentry uses it too.
>
> Ok, the proper way is to catch the SQLException in BIOSQLFeature, test
> if it is a duplicate key entry, get the identifier of the term (would
> that be the BioSQLfeatureId ?) and insert it in the term_relationship
> table? And there is no nice BioJava method for this, I have to do it
> "manually", like conn.prepareStatement(..) and stuff? BioJava spoiled
> me so!
>
> Martina
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jana.bauckmann at informatik.hu-berlin.de Tue Jun 21 08:15:01 2005
From: jana.bauckmann at informatik.hu-berlin.de (Jana Bauckmann)
Date: Tue Jun 21 08:08:49 2005
Subject: [BioSQL-l] Re: memory error while loading SwissProt into Oracle using bioperl-db
In-Reply-To: <3ba087a1f2d128f023b94d871b0366fa@gnf.org>
Message-ID:

Hi,

I solved my problems importing SwissProt. It turned out to be a mixture
of reasons -- so I thought it could be interesting for you:

1) An upgrade to BioPerl 1.5 solved my problems with integrity
constraint errors.

2) I got a memory leak with DBD::Oracle, Oracle 9.2 and multi-thread
enabled perl 5.8.1 -- as you assumed. (Memory use grew to 2GB while
inserting 30000 records.) I installed a multi-thread disabled build of
perl and everything worked fine.
Thank you very much, Jana On Tue, 14 Jun 2005, Hilmar Lapp wrote: > > On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote: > > > Hi, > > > > I would like to load SwissProt data into my Oracle 9.2 database with > > BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got > > two > > problems: > > > > 1) I get many (about 1300) warnings stating integrity constraint > > errors: > > > > ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent > > key not found (DBD ERROR: OCIStmtExecute) > > > > ORA-01400: cannot insert NULL into > > ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS") > > (DBD ERROR: OCIStmtExecute) > > If there is indeed no authors for the respective reference in the > respective SwissProt entries then this is expected because > Reference.Authors may not be NULL. > > You should, however, see more than just the error message above; > supposedly there is a warning message following or preceding it that > informs about not all foreign keys succeeded to insert, and the message > should give the primary key. This should be the primary key for the > bioentry that should have gotten the reference attached. Using SQL you > should then be able to identify which record it is and then you can > look it up on the Swissprot site or in your Swissprot source file. > > If the bioentry itself fails to load because of this problem then you > should see an error message to this effect, with full stack trace. > Otherwise the bioentry did load, just the reference didn't, and if you > don't really need this particular reference, you don't need to worry > about it. > > You may also want to consider trying to upgrade to a CVS snapshot from > either the 1.4 branch or the main trunk. There have been a few fixes to > modules that I believe include the swissprot parser. > > > > > 2) The script stops after 2 hours (34500 tuples in table BioEntry) with > > message: Out of memory! > > > > I guess problem 1 causes problem 2. 
Is this reasonable or do I have two > > separated problems? > > The one before may not even be a real problem, see above. It is > extremely unlikely that it causes the memory problem. > > Swissprot is is a large, very diverse, and richly annotated data > source, and because bioperl-db caches a lot of stuff like ontology > terms, references, and dbxrefs the loader process will eventually use > up anywhere between 500MB and 1.3GB of RAM. > > Given the amount of memory you have this shouldn't be a limitation > though at all, unless maybe if you gave all the memory to Oracle > running on the same machine. > > I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client > library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be > seeing a similar problem. Try watching the loader process in top and > see how fast the memory consumption grows. It will grow due to the > object cache filling up, but if you see it eating up more than 1GB > before 100,000 records loaded you're likely to have hit a memory leak. > > If that's the case you'll have to rebuild your own perl from source > with multi-threading disabled. > > -hilmar > > > > > I run Oracle and the load script on the same machine with: > > Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM > > perl 5.8.1, built for i586-linux-thread-multi > > bioperl 1.4 > > bioperl-db 0.1 > > BTW I'm assuming this is not correct; otherwise the latest BioSQL > schema wouldn't be supported, let alone the Oracle version of it. You > probably obtained a snapshot from CVS? 
> > > DBI 1.48 > > DBD::Oracle 1.16 > > Oracle 9.2 > > BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on > > 6th > > June 2005) > > > > Thanks for any suggestions, > > Jana > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l@open-bio.org > > http://open-bio.org/mailman/listinfo/biosql-l > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > From boehme at mpiib-berlin.mpg.de Tue Jun 21 09:55:15 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Tue Jun 21 09:51:10 2005 Subject: [BioSQL-l] _removeSequence In-Reply-To: <0be3992b92f6a14b6d06d5a06549555b@gnf.org> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> Message-ID: <42B81C43.9010404@mpiib-berlin.mpg.de> That means, that I can't have 2 features refering to the same bioentry with the same type (= type_term_id)and source (=source_term_id) but different parent features because of the composite key bioentry_id in the seqfeature table? Or what does "rank" in that table mean (its part of that key), how can I get different ranks? Martina Hilmar Lapp wrote: > The Biojava people will respond to this. Note though that > Term_Relationship is for storing subject-predicate-object triples of > terms, so I'm not sure why you want to use it for storing/associating > annotation. Maybe you meant bioentry_qualifier_value? > > -hilmar > > On Jun 21, 2005, at 3:10 AM, Martina wrote: > >> >>> Yes. When you insert a sequence you must be prepared that when >>> inserting its ontology term or tag/value annotation the term may >>> already be present because another bioentry uses it too. 
>> >> >> Ok, the proper way is to catch the SQLException in BIOSQLFeature, test >> if it is a Dublicate key entry, get the identifier of the term (would >> that be the BioSQLfeatureId ?) and insert it in the term_relationship >> table? And there is no nice BioJava method for this, I have to do it >> "manually", like conn.prepareStatement(..) and stuff? BioJava spoiled >> me so! >> >> Martina >> From hlapp at gnf.org Tue Jun 21 14:32:47 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jun 21 14:22:40 2005 Subject: [BioSQL-l] Re: memory error while loading SwissProt into Oracle using bioperl-db In-Reply-To: References: Message-ID: Good to know that the memory leak is not constrained to MacOSX. BTW aside from using a multi-threading disabled perl, I could also get rid of the memory leak by using the Instant Client from Oracle (which is 10g, but will connect fine to a 9i database). Again, that's on MacOSX but chances are it will have the same effect for you. -hilmar On Jun 21, 2005, at 5:15 AM, Jana Bauckmann wrote: > Hi, > > I solved my problems to import SwissProt. It turned out to be a > mixture of > reasons -- so I thought it could be interesting for you: > > 1) An upgrade to BioPerl 1.5 solved my problems with integrity > constraint > errors. > > 2) I got a memory leak with DBD::Oracle, Oracle 9.2 and multi-thread > enabled perl 5.8.1 -- as you assumed. (The used memory growed up to 2GB > while inserting 30000 records.) I installed perl as multi-thread > disabled > version and everything worked fine. > > Thank you very much, > Jana > > > On Tue, 14 Jun 2005, Hilmar Lapp wrote: > >> >> On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote: >> >>> Hi, >>> >>> I would like to load SwissProt data into my Oracle 9.2 database with >>> BioSQL as schema using load_seqdatabase.pl from bioperl-db. 
I've got >>> two >>> problems: >>> >>> 1) I get many (about 1300) warnings stating integrity constraint >>> errors: >>> >>> ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - >>> parent >>> key not found (DBD ERROR: OCIStmtExecute) >>> >>> ORA-01400: cannot insert NULL into >>> ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS") >>> (DBD ERROR: OCIStmtExecute) >> >> If there is indeed no authors for the respective reference in the >> respective SwissProt entries then this is expected because >> Reference.Authors may not be NULL. >> >> You should, however, see more than just the error message above; >> supposedly there is a warning message following or preceding it that >> informs about not all foreign keys succeeded to insert, and the >> message >> should give the primary key. This should be the primary key for the >> bioentry that should have gotten the reference attached. Using SQL you >> should then be able to identify which record it is and then you can >> look it up on the Swissprot site or in your Swissprot source file. >> >> If the bioentry itself fails to load because of this problem then you >> should see an error message to this effect, with full stack trace. >> Otherwise the bioentry did load, just the reference didn't, and if you >> don't really need this particular reference, you don't need to worry >> about it. >> >> You may also want to consider trying to upgrade to a CVS snapshot from >> either the 1.4 branch or the main trunk. There have been a few fixes >> to >> modules that I believe include the swissprot parser. >> >>> >>> 2) The script stops after 2 hours (34500 tuples in table BioEntry) >>> with >>> message: Out of memory! >>> >>> I guess problem 1 causes problem 2. Is this reasonable or do I have >>> two >>> separated problems? >> >> The one before may not even be a real problem, see above. It is >> extremely unlikely that it causes the memory problem. 
>> >> Swissprot is is a large, very diverse, and richly annotated data >> source, and because bioperl-db caches a lot of stuff like ontology >> terms, references, and dbxrefs the loader process will eventually use >> up anywhere between 500MB and 1.3GB of RAM. >> >> Given the amount of memory you have this shouldn't be a limitation >> though at all, unless maybe if you gave all the memory to Oracle >> running on the same machine. >> >> I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client >> library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be >> seeing a similar problem. Try watching the loader process in top and >> see how fast the memory consumption grows. It will grow due to the >> object cache filling up, but if you see it eating up more than 1GB >> before 100,000 records loaded you're likely to have hit a memory leak. >> >> If that's the case you'll have to rebuild your own perl from source >> with multi-threading disabled. >> >> -hilmar >> >>> >>> I run Oracle and the load script on the same machine with: >>> Suse Linux 9.0 (kernel 2.4.21-291-smp) with 12 GB RAM >>> perl 5.8.1, built for i586-linux-thread-multi >>> bioperl 1.4 >>> bioperl-db 0.1 >> >> BTW I'm assuming this is not correct; otherwise the latest BioSQL >> schema wouldn't be supported, let alone the Oracle version of it. You >> probably obtained a snapshot from CVS? >> >>> DBI 1.48 >>> DBD::Oracle 1.16 >>> Oracle 9.2 >>> BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on >>> 6th >>> June 2005) >>> >>> Thanks for any suggestions, >>> Jana >>> >>> _______________________________________________ >>> BioSQL-l mailing list >>> BioSQL-l@open-bio.org >>> http://open-bio.org/mailman/listinfo/biosql-l >>> >> -- >> ------------------------------------------------------------- >> Hilmar Lapp email: lapp at gnf.org >> GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Tue Jun 21 15:47:49 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 15:37:45 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B7EC3C.60100@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B7EC3C.60100@mpiib-berlin.mpg.de>
Message-ID: <69b3e884d800350b04e714de631e4d26@gnf.org>

As for documentation of the schema, there is an ERD (in PDF format) and
a schema-overview.txt in the biosql-schema/doc directory. I'll also be
adding a version of the Biojava-in-anger document revised by Richard
Holland, but that only deals with installation, so it won't help you
much once you're beyond that point.

As to which part of an entry from which data source goes into which
table in the biosql schema, that's a more involved question, for three
reasons: the data sources already differ from each other; Biojava and
Bioperl currently do things mostly differently and incompatibly; and
the exact mapping is not written down explicitly anywhere but is more
or less implicit in the way the bioperl SeqIO parsers work and in how
the bioperl RichSeq object (which is the object returned by most
bioperl parsers) stores attributes as annotation. Richard, Mark, and I
discussed this, and how the situation can be improved, in Singapore a
week ago.
-hilmar

On Jun 21, 2005, at 3:30 AM, Martina wrote:

> Well - I'm not familiar with the BioSQL structure because BioJava did
> it all for me. But if I have to, I'll look into it. The best
> documentation are the comments in the *.sql file? Or how do I find out
> where things go into?
>
> Martina
>
> Hilmar Lapp wrote:
>
>> Note though that Term_Relationship is for storing
>> subject-predicate-object triples of terms, so I'm not sure why you
>> want to use it for storing/associating annotation. Maybe you meant
>> bioentry_qualifier_value?
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From boehme at mpiib-berlin.mpg.de Wed Jun 22 05:24:08 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Jun 22 05:16:23 2005
Subject: [BioSQL-l] update seqfeature
In-Reply-To: <42B83D31.2000403@nrc-cnrc.gc.ca>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de> <42B83D31.2000403@nrc-cnrc.gc.ca>
Message-ID: <42B92E38.2020008@mpiib-berlin.mpg.de>

Hi Simon,

I'm changing the FeatureSource, and in setFeatureSource an update on
the source_term_id happens. If the combination is already there, I get
an Exception. Would the proper way to deal with that be to get the
seqfeature_id of the entry already there and use that, or to try
updating the rank until the combination is unique? Or should I rather
not mess with BioJava, and delete that entry and insert it as new to
let BioJava handle the rank increase?

Thanks for any advice

Martina

Simon Foote wrote:
> Hi Martina,
>
> In fact you can, as rank is the field that allows this to happen.
In > Biojava, currently it's just a linearily incremented number such that > you can have the same type and source IDs for a given bioentry. > > For example, adding a Genbank entry with 10 CDS features for 1 bioentry > will give you identical keys for bioentry_id, type_term_id and > source_term_id, but will have a rank of 1 - 10 for each. > > Simon > From boehme at mpiib-berlin.mpg.de Wed Jun 22 09:05:44 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Wed Jun 22 08:57:26 2005 Subject: [BioSQL-l] Re: update seqfeature In-Reply-To: <42B95EBF.7050403@nrc-cnrc.gc.ca> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de> <42B83D31.2000403@nrc-cnrc.gc.ca> <42B92E38.2020008@mpiib-berlin.mpg.de> <42B95EBF.7050403@nrc-cnrc.gc.ca> Message-ID: <42B96228.4020100@mpiib-berlin.mpg.de> Hi Simon, sorry, I might haven't made that clear enough: The problem only exists with changing a feature source (or type, but I didn't try that) because of the composite unique index in biosql seqfeature table, it doesn't check if the location is the same or not, but the combination of type, source, bioentry id and rank has to be unique. So if I insert a new feature, the rank gets increased by BioJava somehow and all is well, but if I update an existing features source and hit by accident the same combination as anothers fetures type, source, .. I get the exception and the source doesn't change. At least that is what I suppose is happening. My question was how to handle this situation? Martina Simon Foote wrote: > Hi Martina, > > Biojava should handle that correctly. I haven't done it by changing a > feature source, but I have with changing a feature's location and > strand. For changing a location: > > // Get the Feature you wish to edit > StrandedFeature sf = ex. 
use a feature filter to grab the feature by > it's ID > Location loc = new Location(100, 1100); > sf.setLocation(loc); > > Since you have already retrieved the feature to edit, biojava will > automatically do this as an update and not an insert. Or it should in > all cases where you are modifying a pre-existing feature. > From reneehalbrook74 at yahoo.com Wed Jun 22 11:24:13 2005 From: reneehalbrook74 at yahoo.com (Renee Halbrook) Date: Wed Jun 22 11:15:38 2005 Subject: [BioSQL-l] very new to biosql--sequence loading question Message-ID: <20050622152413.27457.qmail@web40506.mail.yahoo.com> Hi, I am very new to biosql. I have designed a mysql schema to represent cyanobacteria, pulled from genbank files. It is not identical to the biosql schema, but it is similar. My specific issue is in loading large sequences into a sequence table, (essentially identical to the biosequence table) using perl dbi. I keep running into a 'max_allowed_packet' issue, even though I have bumped it up to a 1 gig in the my.cnf file. I would like to see how other people have implemented this. Could someone please point me in the direction of the documentation for loading sequences using perl, from flat genbank files, into a mysql database ? Thanks in advance for any help. Regards, Renee ____________________________________________________ Yahoo! Sports Rekindle the Rivalries. Sign up for Fantasy Football http://football.fantasysports.yahoo.com From simon.foote at nrc-cnrc.gc.ca Tue Jun 21 08:47:08 2005 From: simon.foote at nrc-cnrc.gc.ca (Simon Foote) Date: Wed Jun 22 13:06:35 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence In-Reply-To: <42B6DEC3.9090807@mpiib-berlin.mpg.de> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> Message-ID: <42B80C4C.7060204@nrc-cnrc.gc.ca> Hi Martina, That would be correct as the on delete cascade doesn't touch the term tables as they are always referenced by any sequence. 
There aren't any foreign key constraints put on those 4 tables, hence
they don't get deleted.

Simon

Martina wrote:

> Hi,
>
> so I have this new database (still biosqldb-mysql.sql v 1.40 2004/11/04
> 01:49:41) and after removing all sequences, I do still have entries in
> term, term_relationship, term_relationship_term and ontology. And of
> course, in biodatabase. If I delete the entry in biodatabase too,
> nothing changes. Is that what is to be expected?
> Cause I still have trouble with the duplicate entry key, but that must
> be my code then.
>
> Thanks
> Martina
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561
[F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From simon.foote at nrc-cnrc.gc.ca Tue Jun 21 12:15:45 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:42 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To: <42B81C43.9010404@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de>
Message-ID: <42B83D31.2000403@nrc-cnrc.gc.ca>

Hi Martina,

In fact you can, as rank is the field that allows this to happen. In
Biojava, currently it's just a linearly incremented number, so that
several features can have the same type and source IDs for a given
bioentry.

For example, adding a Genbank entry with 10 CDS features for 1 bioentry
will give you identical values for bioentry_id, type_term_id and
source_term_id, but ranks 1 to 10, one per feature.
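The role of rank in that composite key can be mimicked in a few lines of plain Java. This is only a toy model of the uniqueness rule discussed in this thread (bioentry, type term, source term and rank together must be unique); the class name SeqfeatureKeys and its insert method are invented for the illustration and have nothing to do with the BioJava API or with how the database enforces the key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the unique key on seqfeature; hypothetical, for illustration only.
final class SeqfeatureKeys {

    // Each entry is one (bioentry_id, type_term_id, source_term_id, rank) tuple.
    private final Set<List<Integer>> used = new HashSet<>();

    /**
     * Returns true if a row with this key could be inserted,
     * false if it would collide with a key that is already present.
     */
    boolean insert(int bioentryId, int typeTermId, int sourceTermId, int rank) {
        return used.add(Arrays.asList(bioentryId, typeTermId, sourceTermId, rank));
    }
}
```

Ten CDS features on the same bioentry with ranks 1 to 10 are all accepted by this model, whereas reusing one of those ranks for the same bioentry, type and source is rejected, which corresponds to the duplicate key error.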
Simon Martina wrote: > That means, that I can't have 2 features refering to the same bioentry > with the same type (= type_term_id)and source (=source_term_id) but > different parent features because of the composite key bioentry_id in > the seqfeature table? Or what does "rank" in that table mean (its part > of that key), how can I get different ranks? > > Martina > > Hilmar Lapp wrote: > >> The Biojava people will respond to this. Note though that >> Term_Relationship is for storing subject-predicate-object triples of >> terms, so I'm not sure why you want to use it for storing/associating >> annotation. Maybe you meant bioentry_qualifier_value? >> >> -hilmar >> >> On Jun 21, 2005, at 3:10 AM, Martina wrote: >> >>> >>>> Yes. When you insert a sequence you must be prepared that when >>>> inserting its ontology term or tag/value annotation the term may >>>> already be present because another bioentry uses it too. >>> >>> >>> >>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, >>> test if it is a Dublicate key entry, get the identifier of the term >>> (would that be the BioSQLfeatureId ?) and insert it in the >>> term_relationship table? And there is no nice BioJava method for >>> this, I have to do it "manually", like conn.prepareStatement(..) and >>> stuff? BioJava spoiled me so! 
>>>
>>> Martina
>>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561
[F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From simon.foote at nrc-cnrc.gc.ca Wed Jun 22 08:51:11 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:43 2005
Subject: [BioSQL-l] Re: update seqfeature
In-Reply-To: <42B92E38.2020008@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de> <42B83D31.2000403@nrc-cnrc.gc.ca> <42B92E38.2020008@mpiib-berlin.mpg.de>
Message-ID: <42B95EBF.7050403@nrc-cnrc.gc.ca>

Hi Martina,

Biojava should handle that correctly. I haven't done it by changing a
feature source, but I have with changing a feature's location and
strand. For changing a location:

    // Get the feature you wish to edit, e.g. use a feature filter to
    // grab the feature by its ID
    StrandedFeature sf = ...;
    // Location is an interface, so use a concrete class such as
    // org.biojava.bio.symbol.RangeLocation
    Location loc = new RangeLocation(100, 1100);
    sf.setLocation(loc);

Since you have already retrieved the feature to edit, biojava will
automatically do this as an update and not an insert. Or it should, in
all cases where you are modifying a pre-existing feature.

Simon

Martina wrote:

> Hi Simon,
>
> I'm changing the FeatureSource and in setFeatureSource an update on
> the source_term_id happens. In the case the combination is already
> there, I get an Exception. The proper way to deal with that would be
> to get the seqfeature_id of the entry already there and use that, or
> try to update the rank until the combination is unique?
Or should I > rather not mess with the BioJava and delete that entry and insert it > as new to let BioJava handle the rank increase? > > Thanks for any advise > > Martina > > Simon Foote wrote: > >> Hi Martina, >> >> In fact you can, as rank is the field that allows this to happen. In >> Biojava, currently it's just a linearily incremented number such that >> you can have the same type and source IDs for a given bioentry. >> >> For example, adding a Genbank entry with 10 CDS features for 1 >> bioentry will give you identical keys for bioentry_id, type_term_id >> and source_term_id, but will have a rank of 1 - 10 for each. >> >> Simon >> -- Bioinformatics Programmer Pathogen Genomics Institute for Biological Sciences National Research Council of Canada [T] 613-990-0561 [F] 613-952-9092 simon.foote@nrc-cnrc.gc.ca From simon.foote at nrc-cnrc.gc.ca Wed Jun 22 09:15:54 2005 From: simon.foote at nrc-cnrc.gc.ca (Simon Foote) Date: Wed Jun 22 13:06:45 2005 Subject: [BioSQL-l] Re: update seqfeature In-Reply-To: <42B96228.4020100@mpiib-berlin.mpg.de> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de> <42B83D31.2000403@nrc-cnrc.gc.ca> <42B92E38.2020008@mpiib-berlin.mpg.de> <42B95EBF.7050403@nrc-cnrc.gc.ca> <42B96228.4020100@mpiib-berlin.mpg.de> Message-ID: <42B9648A.5040001@nrc-cnrc.gc.ca> I get the problem now, that would then be a bug in biojava. It should do an internal check to see if a source/type term change will cause a non-unique exception and if so, then also update the rank to the next available one. One solution would be to catch the exception then do a select for the max(rank) for the given bioentry_id, source_term_id, type_term_id and then increment it by one. 
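The workaround described above (catch the exception, select max(rank) for the key combination, increment by one) can be sketched roughly as follows. This is an illustration only, written in Python against an in-memory SQLite table that is cut down to the primary key plus the four columns of the unique key; the real BioSQL seqfeature table has more columns, and `next_rank` and `set_source` are hypothetical helper names, not BioJava methods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a reduced stand-in for the BioSQL seqfeature table.
conn.execute(
    """
    CREATE TABLE seqfeature (
        seqfeature_id  INTEGER PRIMARY KEY,
        bioentry_id    INTEGER NOT NULL,
        type_term_id   INTEGER NOT NULL,
        source_term_id INTEGER NOT NULL,
        rank           INTEGER NOT NULL DEFAULT 0,
        UNIQUE (bioentry_id, type_term_id, source_term_id, rank)
    )
    """
)
# Two features of the same type on one bioentry, from different sources.
conn.executemany(
    "INSERT INTO seqfeature VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 100, 200, 1), (2, 10, 100, 201, 1)],
)


def next_rank(conn, bioentry_id, type_term_id, source_term_id):
    """Return max(rank) + 1 for the given key combination (1 if none exist)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(rank), 0) + 1 FROM seqfeature "
        "WHERE bioentry_id = ? AND type_term_id = ? AND source_term_id = ?",
        (bioentry_id, type_term_id, source_term_id),
    ).fetchone()
    return row[0]


def set_source(conn, seqfeature_id, new_source_term_id):
    """Change a feature's source; on a unique-key clash, retry with a new rank."""
    try:
        conn.execute(
            "UPDATE seqfeature SET source_term_id = ? WHERE seqfeature_id = ?",
            (new_source_term_id, seqfeature_id),
        )
    except sqlite3.IntegrityError:
        bioentry_id, type_term_id = conn.execute(
            "SELECT bioentry_id, type_term_id FROM seqfeature "
            "WHERE seqfeature_id = ?",
            (seqfeature_id,),
        ).fetchone()
        rank = next_rank(conn, bioentry_id, type_term_id, new_source_term_id)
        conn.execute(
            "UPDATE seqfeature SET source_term_id = ?, rank = ? "
            "WHERE seqfeature_id = ?",
            (new_source_term_id, rank, seqfeature_id),
        )


# Feature 2 takes the same source as feature 1, which clashes at rank 1.
set_source(conn, 2, 200)
rows = conn.execute(
    "SELECT seqfeature_id, source_term_id, rank FROM seqfeature "
    "ORDER BY seqfeature_id"
).fetchall()
print(rows)
```

In this toy run the second feature ends up with the next free rank for the new combination instead of failing. Against a real database the select and the update would need to run in one transaction to avoid two writers picking the same rank.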
In fact, it would probably be wise to always update the rank when changing either the source or type term, so that the ranks stay incrementally consistent, if that really matters. Simon Martina wrote: > Hi Simon, > > sorry, I might haven't made that clear enough: > The problem only exists with changing a feature source (or type, but I > didn't try that) because of the composite unique index in biosql > seqfeature table, it doesn't check if the location is the same or not, > but the combination of type, source, bioentry id and rank has to be > unique. So if I insert a new feature, the rank gets increased by > BioJava somehow and all is well, but if I update an existing features > source and hit by accident the same combination as anothers fetures > type, source, .. I get the exception and the source doesn't change. > At least that is what I suppose is happening. > > My question was how to handle this situation? > > Martina > > > Simon Foote wrote: > >> Hi Martina, >> >> Biojava should handle that correctly. I haven't done it by changing >> a feature source, but I have with changing a feature's location and >> strand. For changing a location: >> >> // Get the Feature you wish to edit >> StrandedFeature sf = ex. use a feature filter to grab the feature by >> it's ID >> Location loc = new Location(100, 1100); >> sf.setLocation(loc); >> >> Since you have already retrieved the feature to edit, biojava will >> automatically do this as an update and not an insert. Or it should >> in all cases where you are modifying a pre-existing feature. 
>> -- Bioinformatics Programmer Pathogen Genomics Institute for Biological Sciences National Research Council of Canada [T] 613-990-0561 [F] 613-952-9092 simon.foote@nrc-cnrc.gc.ca From reneehalbrook74 at yahoo.com Thu Jun 23 14:00:02 2005 From: reneehalbrook74 at yahoo.com (Renee Halbrook) Date: Thu Jun 23 13:51:28 2005 Subject: [BioSQL-l] load_taxonomy.pl question Message-ID: <20050623180004.70399.qmail@web40511.mail.yahoo.com> Hi, Is it possible alter the load_taxonomy.pl script to load data for only a certain subtree? For example ,to grab the taxonomy structure starting with CyanoBacteria (id =1117) as the root ? Thanks for any feedback, Renee ____________________________________________________ Yahoo! Sports Rekindle the Rivalries. Sign up for Fantasy Football http://football.fantasysports.yahoo.com From hlapp at gnf.org Thu Jun 23 22:16:06 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Thu Jun 23 22:08:17 2005 Subject: [BioSQL-l] load_taxonomy.pl question In-Reply-To: <20050623180004.70399.qmail@web40511.mail.yahoo.com> References: <20050623180004.70399.qmail@web40511.mail.yahoo.com> Message-ID: I guess there could be a way, but it's got to be very complicated, because now you're trying to do something in perl for which perl's not made. Why not just load up everything? It's not that much of diskspace. Also, if you're really eager to keep only the subtree, you could delete the rest using SQL. -hilmar On Jun 23, 2005, at 2:00 PM, Renee Halbrook wrote: > Hi, > Is it possible alter the load_taxonomy.pl script to > load data for only a certain subtree? For example ,to > grab the taxonomy structure starting with > CyanoBacteria (id =1117) as the root ? > > Thanks for any feedback, > Renee > > > > ____________________________________________________ > Yahoo! Sports > Rekindle the Rivalries. 
Sign up for Fantasy Football > http://football.fantasysports.yahoo.com > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From amackey at pcbi.upenn.edu Fri Jun 24 09:09:17 2005 From: amackey at pcbi.upenn.edu (Aaron J. Mackey) Date: Fri Jun 24 09:00:59 2005 Subject: [BioSQL-l] load_taxonomy.pl question In-Reply-To: References: <20050623180004.70399.qmail@web40511.mail.yahoo.com> Message-ID: <6E07CCDD-5C11-488A-A757-9E5D1210B70C@pcbi.upenn.edu> I agree that it would be mildly complicated, but it's not Perl's fault at all. It would be mildly complicated in any language. The complication stems from the fact that the data we load from is a tab-delimited flat file of "taxon parent-taxon" tuples, so as we load we cannot know (without some additional upfront work) whether any given row is desirable. If we could know that, the solution would be trivial. One way to know that is to read the whole file into an in-memory representation of the tree (keeping only node IDs to conserve memory), and then keep only the desired subtree (purging the rest); then, as we process the input files, act only on those lines that apply to members of the subtree (probably flattened to a hash to make lookup quicker). No big deal really, and something Perl can do just as well as any other programming language.
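A minimal sketch of the two-pass approach described above, written in Python purely for illustration (load_taxonomy.pl is Perl and does not do this as far as the thread indicates). It assumes the input has already been reduced to (taxon id, parent id) pairs; the function name `subtree_ids` and every ID other than 1117 (the Cyanobacteria root from the question) are made up for the example.

```python
from collections import defaultdict, deque


def subtree_ids(pairs, root_id):
    """Return the set of node IDs in the subtree rooted at root_id.

    pairs is an iterable of (taxon_id, parent_id) tuples; only the IDs
    are held in memory.
    """
    children = defaultdict(list)
    for taxon_id, parent_id in pairs:
        if taxon_id != parent_id:  # ignore a row that is its own parent
            children[parent_id].append(taxon_id)

    keep = {root_id}
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in keep:
                keep.add(child)
                queue.append(child)
    return keep


# Toy data: IDs other than 1117 are hypothetical.
pairs = [(1, 1), (2, 1), (1117, 2), (5000, 1117), (5001, 5000), (9999, 2)]

# First pass: work out which nodes belong to the desired subtree.
wanted = subtree_ids(pairs, 1117)

# Second pass: act only on rows whose taxon is a member of the subtree.
kept_rows = [row for row in pairs if row[0] in wanted]
print(sorted(wanted))
```

The set plays the role of the flattened hash for quick lookup. Note that the kept root row still points at a parent outside the subtree, so a loader would have to decide how to store that parent reference.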
I leave the implementation as an exercise for the reader, however, as I agree that deleting everything but the desired subtree via SQL would also work nicely, though not save any processing time ;) -Aaron On Jun 23, 2005, at 10:16 PM, Hilmar Lapp wrote: > I guess there could be a way, but it's got to be very complicated, > because now you're trying to do something in perl for which perl's > not made. > > Why not just load up everything? It's not that much of diskspace. > Also, if you're really eager to keep only the subtree, you could > delete the rest using SQL. > > -hilmar > > On Jun 23, 2005, at 2:00 PM, Renee Halbrook wrote: > > >> Hi, >> Is it possible alter the load_taxonomy.pl script to >> load data for only a certain subtree? For example ,to >> grab the taxonomy structure starting with >> CyanoBacteria (id =1117) as the root ? >> >> Thanks for any feedback, >> Renee >> >> >> >> ____________________________________________________ >> Yahoo! Sports >> Rekindle the Rivalries. Sign up for Fantasy Football >> http://football.fantasysports.yahoo.com >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l@open-bio.org >> http://open-bio.org/mailman/listinfo/biosql-l >> >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- Aaron J. Mackey, Ph.D. Project Manager, ApiDB Bioinformatics Resource Center Penn Genomics Institute, University of Pennsylvania email: amackey@pcbi.upenn.edu office: 215-898-1205 fax: 215-746-6697 postal: Penn Genomics Institute Goddard Labs 212 415 S. 
University Avenue Philadelphia, PA 19104-6017 From hollandr at gis.a-star.edu.sg Sun Jun 26 11:06:40 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Sun Jun 26 10:59:26 2005 Subject: [BioSQL-l] update seqfeature Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B5@BIONIC.biopolis.one-north.com> Actually, BioJava is not that clever. Yet. Martina's original observation is right, in that the correct way to do this would be to check the database to see if the altered seqfeature already existed, and if it did, to refer to that one instead. But this is not the way BioJava does things at present. A fix for this will probably end up being built in to the replacement BioJava/BioSQL classes currently in progress, but for now, to delete/create the feature is probably the best workaround. cheers, Richard -----Original Message----- From: biosql-l-bounces@portal.open-bio.org on behalf of Martina Sent: Wed 6/22/2005 5:24 PM To: simon.foote@nrc-cnrc.gc.ca Cc: biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org Subject: [BioSQL-l] update seqfeature Hi Simon, I'm changing the FeatureSource and in setFeatureSource an update on the source_term_id happens. In the case the combination is already there, I get an Exception. The proper way to deal with that would be to get the seqfeature_id of the entry already there and use that, or try to update the rank unless its a unique combination? Or should I rather not mess with the BioJava and delete that entry and insert it as new to let BioJava handle the rank increase? Thanks for any advise Martina Simon Foote wrote: > Hi Martina, > > In fact you can, as rank is the field that allows this to happen. In > Biojava, currently it's just a linearily incremented number such that > you can have the same type and source IDs for a given bioentry. 
> > For example, adding a Genbank entry with 10 CDS features for 1 bioentry > will give you identical keys for bioentry_id, type_term_id and > source_term_id, but will have a rank of 1 - 10 for each. > > Simon > _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From hollandr at gis.a-star.edu.sg Sun Jun 26 11:11:30 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Sun Jun 26 11:04:12 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B6@BIONIC.biopolis.one-north.com> The revamped BioJava/BioSQL classes will expose the rank to the user for all tables which have ranks. cheers, Richard -----Original Message----- From: biosql-l-bounces@portal.open-bio.org on behalf of Simon Foote Sent: Wed 6/22/2005 12:15 AM To: Martina Cc: Hilmar Lapp; biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org Subject: Re: [Biojava-l] Re: [BioSQL-l] _removeSequence Hi Martina, In fact you can, as rank is the field that allows this to happen. In Biojava, currently it's just a linearily incremented number such that you can have the same type and source IDs for a given bioentry. For example, adding a Genbank entry with 10 CDS features for 1 bioentry will give you identical keys for bioentry_id, type_term_id and source_term_id, but will have a rank of 1 - 10 for each. Simon Martina wrote: > That means, that I can't have 2 features refering to the same bioentry > with the same type (= type_term_id)and source (=source_term_id) but > different parent features because of the composite key bioentry_id in > the seqfeature table? Or what does "rank" in that table mean (its part > of that key), how can I get different ranks? > > Martina > > Hilmar Lapp wrote: > >> The Biojava people will respond to this. 
Note though that >> Term_Relationship is for storing subject-predicate-object triples of >> terms, so I'm not sure why you want to use it for storing/associating >> annotation. Maybe you meant bioentry_qualifier_value? >> >> -hilmar >> >> On Jun 21, 2005, at 3:10 AM, Martina wrote: >> >>> >>>> Yes. When you insert a sequence you must be prepared that when >>>> inserting its ontology term or tag/value annotation the term may >>>> already be present because another bioentry uses it too. >>> >>> >>> >>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, >>> test if it is a Dublicate key entry, get the identifier of the term >>> (would that be the BioSQLfeatureId ?) and insert it in the >>> term_relationship table? And there is no nice BioJava method for >>> this, I have to do it "manually", like conn.prepareStatement(..) and >>> stuff? BioJava spoiled me so! >>> >>> Martina >>> > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l -- Bioinformatics Programmer Pathogen Genomics Institute for Biological Sciences National Research Council of Canada [T] 613-990-0561 [F] 613-952-9092 simon.foote@nrc-cnrc.gc.ca _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From hollandr at gis.a-star.edu.sg Sun Jun 26 11:16:46 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Sun Jun 26 11:09:11 2005 Subject: [BioSQL-l] very new to biosql--sequence loading question Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B8@BIONIC.biopolis.one-north.com> Hi, I can't answer the MySQL error question as I don't know anything about it, but I'm curious as to the differences between your db and BioSQL. What was it that BioSQL could not do that you had to reimplement in a different way? Maybe some suggestions could be made for improvements to BioSQL? 
Or maybe BioSQL can actually help with the problem but in a way that wasn't immediately obvious? cheers, Richard -----Original Message----- From: biosql-l-bounces@portal.open-bio.org on behalf of Renee Halbrook Sent: Wed 6/22/2005 11:24 PM To: biosql-l@open-bio.org Cc: Subject: [BioSQL-l] very new to biosql--sequence loading question Hi, I am very new to biosql. I have designed a mysql schema to represent cyanobacteria, pulled from genbank files. It is not identical to the biosql schema, but it is similar. My specific issue is in loading large sequences into a sequence table, (essentially identical to the biosequence table) using perl dbi. I keep running into a 'max_allowed_packet' issue, even though I have bumped it up to a 1 gig in the my.cnf file. I would like to see how other people have implemented this. Could someone please point me in the direction of the documentation for loading sequences using perl, from flat genbank files, into a mysql database ? Thanks in advance for any help. Regards, Renee From astew at wam.umd.edu Mon Jun 27 16:03:38 2005 From: astew at wam.umd.edu (Andrew Stewart) Date: Mon Jun 27 16:10:23 2005 Subject: [BioSQL-l] Strain support? Message-ID: <42C05B9A.1000807@wam.umd.edu> I don't see any strain support in the taxonomy for BioSQL. Am I mistaken, or has anyone developed support for this, or is there a plan to in the future? -Andrew Stewart BDRD From hlapp at gnf.org Tue Jun 28 10:53:32 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jun 28 10:44:51 2005 Subject: [BioSQL-l] Strain support?
In-Reply-To: <42C05B9A.1000807@wam.umd.edu> References: <42C05B9A.1000807@wam.umd.edu> Message-ID: <39b80934b7196c6dc16971036c5b1fd9@gnf.org> BioSQL supports as much strain information as the NCBI taxonomy database download does, because the two taxonomy tables mirror the NCBI taxonomy tables. The recommendation is to populate them in advance from the NCBI taxonomy download, so that species are properly resolved by NCBI taxon ID (which distinguishes strains). What in particular did you find unsupported? -hilmar On Jun 27, 2005, at 4:03 PM, Andrew Stewart wrote: > I don't see any strain support in the taxonomy for BioSQL. > > Am I mistaken, or has anyone developed support for this, or is there a > plan to in the future? > > > -Andrew Stewart > BDRD > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From Teemu.Kivioja at vtt.fi Tue Jun 28 11:19:44 2005 From: Teemu.Kivioja at vtt.fi (Teemu Kivioja) Date: Tue Jun 28 11:10:57 2005 Subject: [BioSQL-l] Loading long strings to Oracle Message-ID: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi> Hi, I have a couple of possibly related problems with loading into the Oracle database. 1.) When trying to load yeast proteins from SGD, I get: perl load_seqdatabase.pl --host sboracle1.ad.vtt.fi --driver Oracle --testonly --dbname BfxDB --format swiss --printerror test.swiss Loading test.swiss ...
DBD::Oracle::st execute failed: ORA-01461: can bind a LONG value only for insert into a LONG column (DBD ERROR: OCIStmtExecute) [for statement ``UPDATE biosequence SET version = NVL(?,version), length = NVL(?,length), alphabet = NVL(?,alphabet), crc = NVL(?,crc), seq = NVL(?,seq), ent_oid = NVL(?,ent_oid) WHERE ent_oid = ?'' with params: :p5='MAKQRQTTKSSKRYRYSSFKARIDDLKIEPARNLEKRVHDYVESSHFLASFDQWKEINLSAKFTEFAAEIEHDVQTLPQILYHDKKIFNSLVSFINFHDEFSLQPLLDLLAQFCHDLGPDFLKFYEEAIKTLINLLDAAIEFESSNVFEWGFNCLAYIFKYLSKFLVKKLVLTCDLLIPLLSHSKEYLSRFSAEALSFLVRKCPVSNLREFVRSVFEKLEGDDEQTNLYEGLLILFTESMTSTQETLHSKAKAIMSVLLHEALTKSSPERSVSLLSDIWMNISKYASIESLLPVYEVMYQDFNDSLDATNIDRILKVLTTIVFSESGRKIPDWNKITILIERIMSQSENCASLSQDKVAFLFALFIRNSDVKTLTLFHQKLFNYALTNISDCFLE...', :p3='protein', :p6='14404', :p1=undef, :p7='14404', :p4='F6ED4E3E9AE0F468', :p2=2493]) at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BaseDriver.pm line 1115, line 51. The file test.swiss only includes the record: ID YBL004W STANDARD; PRT; 2494 AA. It seems that I can get rid of this error message by explicitly telling that the type of sequence field is CLOB by adding the code if ($slots[$i] eq "seq") { $self->bind_param($sth, $j, $slotvals->[$i], { ora_type => ORA_CLOB }); } else { $self->bind_param($sth, $j, $slotvals->[$i]); } 2.) When trying insert InterPro (interpro 10.0 ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get: perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi --namespace interpro --driver Oracle --dbname BfxDB --testonly --fmtargs "ontology_engine,simple" interpro.xml ... 11900 Loading ontology InterPro: ... terms -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("IPR000911","Ribosomal protein L11","Ribosomes are ... 
ORA-01461: can bind a LONG value only for insert into a LONG column (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid) VALUES (:p1, :p2, :p3, :p4, :p5)') --------------------------------------------------- Could not store term IPR000911, name 'Ribosomal protein L11': ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or to be found by unique key Again, the annotation is >2000 characters long but well under the 4000 character limit. 3.) As others have already reported, the memory usage can be high, the above load_ontology process takes about 2.5GB of memory. I guess the fact that the problems 1 and 2 already arise with strings that are <4000 chars long might be related to the local character coding. The code: my $hash_ref = $dbh->ora_nls_parameters(); my $database_charset = $hash_ref->{NLS_CHARACTERSET}; my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET}; print "database charset: $database_charset\n"; print "national charset: $national_charset\n"; gives database charset: WE8ISO8859P1 national charset: AL16UTF16 and $ locale LC_CTYPE | head upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3 toupper;tolower;totitle 16 6 UTF-8 70 84 1 0 1 Some details of the system: Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit) Oracle 10g, version 10.1.0.3.0 - 64bit Perl, v5.8.0 built for x86_64-linux-thread-multi Bioperl 1.4 bioperl-db 0.1 DBD::Oracle 1.16 Biosql-schema downloaded on May 10 What would be the best way to solve these problems? Best regards, Teemu Kivioja ------------------------------------------------------------------ Teemu Kivioja, Research Scientist VTT Biotechnology P.O. 
Box 1500, FIN-02044 VTT, Finland (Street address: Tietotie 2, Espoo, Otaniemi) Email: Teemu.Kivioja@vtt.fi Phone: +358 20 722 7111 Fax: +358 20 722 7071 From hollandr at gis.a-star.edu.sg Tue Jun 28 11:33:27 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Tue Jun 28 11:25:48 2005 Subject: [BioSQL-l] Loading long strings to Oracle Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601E87172@BIONIC.biopolis.one-north.com> We had similar trouble in BioJava when Oracle 9 and 10 suddenly stopped supporting the use of setString and getString on CLOB columns. Special code was required to force BioJava to detect the database and use the special Oracle CLOB-specific accession methods, just like your 'quick fix' of setting ora_type does below. Hilmar is the best guy to talk to here as he uses BioPerl with BioSQL and Oracle in his production db at work. Your annotation fails because you are using UTF16 as the character set in the database. This means that each character is stored as 16 bits or 2 bytes. As the limit in Oracle is 4000 bytes (note bytes not characters) this means that you can only store strings up to 2000 chars long with this encoding. cheers, Richard -----Original Message----- From: biosql-l-bounces@portal.open-bio.org on behalf of Teemu Kivioja Sent: Tue 6/28/2005 11:19 PM To: biosql-l@open-bio.org Cc: Subject: [BioSQL-l] Loading long strings to Oracle Hi, I have couple of possibly related problems with loading to the Oracle database. 1.) When trying to load yeast proteins from SGD, I get: perl load_seqdatabase.pl --host sboracle1.ad.vtt.fi --driver Oracle --testonly --dbname BfxDB --format swiss --printerror test.swiss Loading test.swiss ... 
DBD::Oracle::st execute failed: ORA-01461: can bind a LONG value only for insert into a LONG column (DBD ERROR: OCIStmtExecute) [for statement ``UPDATE biosequence SET version = NVL(?,version), length = NVL(?,length), alphabet = NVL(?,alphabet), crc = NVL(?,crc), seq = NVL(?,seq), ent_oid = NVL(?,ent_oid) WHERE ent_oid = ?'' with params: :p5='MAKQRQTTKSSKRYRYSSFKARIDDLKIEPARNLEKRVHDYVESSHFLASFDQWKEINLSAKFTEFAAEIEHDVQTLPQILYHDKKIFNSLVSFINFHDEFSLQPLLDLLAQFCHDLGPDFLKFYEEAIKTLINLLDAAIEFESSNVFEWGFNCLAYIFKYLSKFLVKKLVLTCDLLIPLLSHSKEYLSRFSAEALSFLVRKCPVSNLREFVRSVFEKLEGDDEQTNLYEGLLILFTESMTSTQETLHSKAKAIMSVLLHEALTKSSPERSVSLLSDIWMNISKYASIESLLPVYEVMYQDFNDSLDATNIDRILKVLTTIVFSESGRKIPDWNKITILIERIMSQSENCASLSQDKVAFLFALFIRNSDVKTLTLFHQKLFNYALTNISDCFLE...', :p3='protein', :p6='14404', :p1=undef, :p7='14404', :p4='F6ED4E3E9AE0F468', :p2=2493]) at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BaseDriver.pm line 1115, line 51. The file test.swiss only includes the record: ID YBL004W STANDARD; PRT; 2494 AA. It seems that I can get rid of this error message by explicitly telling that the type of sequence field is CLOB by adding the code if ($slots[$i] eq "seq") { $self->bind_param($sth, $j, $slotvals->[$i], { ora_type => ORA_CLOB }); } else { $self->bind_param($sth, $j, $slotvals->[$i]); } 2.) When trying insert InterPro (interpro 10.0 ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get: perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi --namespace interpro --driver Oracle --dbname BfxDB --testonly --fmtargs "ontology_engine,simple" interpro.xml ... 11900 Loading ontology InterPro: ... terms -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("IPR000911","Ribosomal protein L11","Ribosomes are ... 
ORA-01461: can bind a LONG value only for insert into a LONG column (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid) VALUES (:p1, :p2, :p3, :p4, :p5)') --------------------------------------------------- Could not store term IPR000911, name 'Ribosomal protein L11': ------------- EXCEPTION ------------- MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or to be found by unique key Again, the annotation is >2000 characters long but well under the 4000 character limit. 3.) As others have already reported, the memory usage can be high, the above load_ontology process takes about 2.5GB of memory. I guess the fact that the problems 1 and 2 already arise with strings that are <4000 chars long might be related to the local character coding. The code: my $hash_ref = $dbh->ora_nls_parameters(); my $database_charset = $hash_ref->{NLS_CHARACTERSET}; my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET}; print "database charset: $database_charset\n"; print "national charset: $national_charset\n"; gives database charset: WE8ISO8859P1 national charset: AL16UTF16 and $ locale LC_CTYPE | head upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3 toupper;tolower;totitle 16 6 UTF-8 70 84 1 0 1 Some details of the system: Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit) Oracle 10g, version 10.1.0.3.0 - 64bit Perl, v5.8.0 built for x86_64-linux-thread-multi Bioperl 1.4 bioperl-db 0.1 DBD::Oracle 1.16 Biosql-schema downloaded on May 10 What would be the best way to solve these problems? Best regards, Teemu Kivioja ------------------------------------------------------------------ Teemu Kivioja, Research Scientist VTT Biotechnology P.O. 
Box 1500, FIN-02044 VTT, Finland (Street address: Tietotie 2, Espoo, Otaniemi) Email: Teemu.Kivioja@vtt.fi Phone: +358 20 722 7111 Fax: +358 20 722 7071 _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From hlapp at gnf.org Tue Jun 28 11:42:22 2005 From: hlapp at gnf.org (Hilmar Lapp) Date: Tue Jun 28 11:33:49 2005 Subject: [BioSQL-l] Loading long strings to Oracle In-Reply-To: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi> References: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi> Message-ID: <31e6ba5c4a09865c67ae7a0d65f68252@gnf.org> You're probably not using bioperl-db 0.1, judging from the generated query. Make sure you use a recent download from CVS. There is a test in bioperl-db for inserting and retrieving long sequences. Have you run the tests and seen a problem? This may indeed be due to some problem with the character encoding. The Oracle-specific layer of the adaptors deal a bit differently with sequences longer than 4000 chars. However, if in your case they are encoded in Unicode, then maybe the threshold would be half that size? Can you check what happens when you truncate the sequence and the other troubling string to less than 2000 chars? Also, to nail down the problem, you could also try to have database and OS run under the same locale/encoding. -hilmar On Jun 28, 2005, at 11:19 AM, Teemu Kivioja wrote: > Hi, > > I have couple of possibly related problems with loading to the Oracle > database. > > 1.) When trying to load yeast proteins from SGD, I get: > > perl load_seqdatabase.pl --host sboracle1.ad.vtt.fi --driver Oracle > --testonly --dbname BfxDB --format swiss --printerror test.swiss > Loading test.swiss ... 
> DBD::Oracle::st execute failed: ORA-01461: can bind a LONG value only > for insert into a LONG column (DBD ERROR: OCIStmtExecute) [for > statement ``UPDATE biosequence SET version = NVL(?,version), length = > NVL(?,length), alphabet = NVL(?,alphabet), crc = NVL(?,crc), seq = > NVL(?,seq), ent_oid = NVL(?,ent_oid) WHERE ent_oid = ?'' with params: > : > p5='MAKQRQTTKSSKRYRYSSFKARIDDLKIEPARNLEKRVHDYVESSHFLASFDQWKEINLSAKFTEFA > AEIEHDVQTLPQILYHDKKIFNSLVSFINFHDEFSLQPLLDLLAQFCHDLGPDFLKFYEEAIKTLINLLDA > AIEFESSNVFEWGFNCLAYIFKYLSKFLVKKLVLTCDLLIPLLSHSKEYLSRFSAEALSFLVRKCPVSNLR > EFVRSVFEKLEGDDEQTNLYEGLLILFTESMTSTQETLHSKAKAIMSVLLHEALTKSSPERSVSLLSDIWM > NISKYASIESLLPVYEVMYQDFNDSLDATNIDRILKVLTTIVFSESGRKIPDWNKITILIERIMSQSENCA > SLSQDKVAFLFALFIRNSDVKTLTLFHQKLFNYALTNISDCFLE...', :p3='protein', > :p6='14404', :p1=undef, :p7='14404', :p4='F6ED4E3E9AE0F468', > :p2=2493]) at > /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BaseDriver.pm line 1115, > line 51. > > The file test.swiss only includes the record: > ID YBL004W STANDARD; PRT; 2494 AA. > > It seems that I can get rid of this error message by explicitly > telling that the type of sequence field is CLOB by adding the code > if ($slots[$i] eq "seq") { > $self->bind_param($sth, $j, $slotvals->[$i], > { ora_type => ORA_CLOB }); > } else { > $self->bind_param($sth, $j, $slotvals->[$i]); > } > > > 2.) When trying insert InterPro (interpro 10.0 > ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get: > > perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi > --namespace interpro --driver Oracle --dbname BfxDB --testonly > --fmtargs "ontology_engine,simple" interpro.xml > ... > 11900 > Loading ontology InterPro: > ... terms > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values > were ("IPR000911","Ribosomal protein L11","Ribosomes are ... 
> > ORA-01461: can bind a LONG value only for insert into a LONG column > (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT > INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid) > VALUES (:p1, :p2, :p3, :p4, :p5)') > --------------------------------------------------- > Could not store term IPR000911, name 'Ribosomal protein L11': > > ------------- EXCEPTION ------------- > MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or > to be found by unique key > > Again, the annotation is >2000 characters long but well under the 4000 > character limit. > > 3.) As others have already reported, the memory usage can be high, > the above load_ontology process takes about 2.5GB of memory. > > I guess the fact that the problems 1 and 2 already arise with strings > that are <4000 chars long might be related to the local character > coding. The code: > > my $hash_ref = $dbh->ora_nls_parameters(); > my $database_charset = $hash_ref->{NLS_CHARACTERSET}; > my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET}; > print "database charset: $database_charset\n"; > print "national charset: $national_charset\n"; > > gives > > database charset: WE8ISO8859P1 > national charset: AL16UTF16 > > and > > $ locale LC_CTYPE | head > upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct; > alnum;combining;combining_level3 > toupper;tolower;totitle > 16 > 6 > UTF-8 > 70 > 84 > 1 > 0 > 1 > > Some details of the system: > Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit) > Oracle 10g, version 10.1.0.3.0 - 64bit > Perl, v5.8.0 built for x86_64-linux-thread-multi > Bioperl 1.4 > bioperl-db 0.1 > DBD::Oracle 1.16 > Biosql-schema downloaded on May 10 > > What would be the best way to solve these problems? > > Best regards, > Teemu Kivioja > > > > > ------------------------------------------------------------------ > Teemu Kivioja, Research Scientist > VTT Biotechnology > P.O. 
Box 1500, FIN-02044 VTT, Finland
> (Street address: Tietotie 2, Espoo, Otaniemi)
> Email: Teemu.Kivioja@vtt.fi
> Phone: +358 20 722 7111
> Fax: +358 20 722 7071
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Tue Jun 28 12:15:41 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 28 12:07:34 2005
Subject: [BioSQL-l] Loading long strings to Oracle
In-Reply-To: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
References: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
Message-ID: <47b9e0958024fd2f4ccaeecb159a4b70@gnf.org>

On Jun 28, 2005, at 11:19 AM, Teemu Kivioja wrote:

> Hi,
>
>
> It seems that I can get rid of this error message by explicitly
> telling that the type of sequence field is CLOB by adding the code
> if ($slots[$i] eq "seq") {
>     $self->bind_param($sth, $j, $slotvals->[$i],
>                       { ora_type => ORA_CLOB });
> } else {
>     $self->bind_param($sth, $j, $slotvals->[$i]);
> }

I've used that type of code before, but it wasn't necessary any more
for INSERTs and didn't solve the problem for UPDATEs. This may be
related to the character encoding problem, in particular the fact that
with UTF-16 the byte length is no longer equal to the string length.

> 2.) When trying to insert InterPro (interpro 10.0
> ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get:
>
> perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi
> --namespace interpro --driver Oracle --dbname BfxDB --testonly
> --fmtargs "ontology_engine,simple" interpro.xml

Don't worry about the ontology engine. Use format interprosax, which is
an alias for an event-based parser; that should keep the memory usage
down.

> ...
> 11900
> Loading ontology InterPro:
> ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values
> were ("IPR000911","Ribosomal protein L11","Ribosomes are ...
>
> ORA-01461: can bind a LONG value only for insert into a LONG column
> (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT
> INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid)
> VALUES (:p1, :p2, :p3, :p4, :p5)')
> ---------------------------------------------------
> Could not store term IPR000911, name 'Ribosomal protein L11':
>
> ------------- EXCEPTION -------------
> MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or
> to be found by unique key
>
> Again, the annotation is >2000 characters long but well under the 4000
> character limit.

This may be due to the encoding problem as well, as you suspect
yourself.

	-hilmar

>
> 3.) As others have already reported, the memory usage can be high,
> the above load_ontology process takes about 2.5GB of memory.
>
> I guess the fact that the problems 1 and 2 already arise with strings
> that are <4000 chars long might be related to the local character
> coding.
The code:
>
> my $hash_ref = $dbh->ora_nls_parameters();
> my $database_charset = $hash_ref->{NLS_CHARACTERSET};
> my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET};
> print "database charset: $database_charset\n";
> print "national charset: $national_charset\n";
>
> gives
>
> database charset: WE8ISO8859P1
> national charset: AL16UTF16
>
> and
>
> $ locale LC_CTYPE | head
> upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;
> alnum;combining;combining_level3
> toupper;tolower;totitle
> 16
> 6
> UTF-8
> 70
> 84
> 1
> 0
> 1
>
> Some details of the system:
> Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit)
> Oracle 10g, version 10.1.0.3.0 - 64bit
> Perl, v5.8.0 built for x86_64-linux-thread-multi
> Bioperl 1.4
> bioperl-db 0.1
> DBD::Oracle 1.16
> Biosql-schema downloaded on May 10
>
> What would be the best way to solve these problems?
>
> Best regards,
> Teemu Kivioja
>
>
>
>
> ------------------------------------------------------------------
> Teemu Kivioja, Research Scientist
> VTT Biotechnology
> P.O. Box 1500, FIN-02044 VTT, Finland
> (Street address: Tietotie 2, Espoo, Otaniemi)
> Email: Teemu.Kivioja@vtt.fi
> Phone: +358 20 722 7111
> Fax: +358 20 722 7071
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
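
The byte-length point made in the reply above can be checked without a database. The following is only a minimal sketch under two assumptions that the thread suggests but does not confirm: that the bound value is converted to a two-byte encoding (such as the AL16UTF16 national character set reported above) somewhere between client and server, and that the 4000 limit is counted in bytes rather than characters. The helper byte_length and the test string are hypothetical; only the core Encode module is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes a character string occupies
# once it has been encoded in the given encoding.
sub byte_length {
    my ($string, $encoding) = @_;
    return length(encode($encoding, $string));
}

# Made-up stand-in for a term definition of about 2500 characters.
my $definition = 'A' x 2500;

# Assumption: the limit that triggers the error is 4000 bytes.
my $limit = 4000;

for my $encoding ('iso-8859-1', 'UTF-8', 'UTF-16BE') {
    my $bytes = byte_length($definition, $encoding);
    printf "%-10s %5d chars -> %5d bytes%s\n",
        $encoding, length($definition), $bytes,
        ($bytes > $limit ? "  (over $limit)" : "");
}
```

For 2500 plain ASCII characters this reports 2500 bytes in ISO-8859-1 and UTF-8 but 5000 bytes in UTF-16BE, so a string that is comfortably under 4000 characters can still exceed a 4000-byte limit once it is widened. Whether that conversion actually happens in the setup described above would have to be verified against the DBD::Oracle and Oracle client settings.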