From hlapp at gnf.org  Wed Jun  1 01:48:10 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jun  1 01:42:30 2005
Subject: [BioSQL-l] RE: [Biojava-l] Change Proposal regarding References
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601B94799@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601B94799@BIONIC.biopolis.one-north.com>
Message-ID: <ad424cc855733ae0a910b9d63d0abd3f@gnf.org>


On May 31, 2005, at 8:42 PM, Richard HOLLAND wrote:

> I should also point out that we should be using the
> 'bioentry_reference' and 'reference' tables, and not
> 'bioentry_dbxref' as I mistakenly mentioned in the original post.
>

Right - so you've corrected this already.

Note that reference has a foreign key to dbxref to store the PUBMED or 
MEDLINE id. The foreign key is identifying; i.e., there's also a unique 
key constraint on that foreign key, meaning only one reference can 
point to a particular PUBMED id.
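
To illustrate the constraint, here is a minimal sketch in Python using an
in-memory SQLite database. The two tables are heavily simplified stand-ins
for the BioSQL dbxref and reference tables. The column list, the sample
accession and the add_reference helper are assumptions made for this
example, not the real schema or any bioperl-db or BioJava API. It shows
that a second reference pointing at the same dbxref row is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed table definitions (not the full BioSQL schema).
conn.executescript("""
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT NOT NULL,
    accession TEXT NOT NULL,
    UNIQUE (dbname, accession)
);
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    -- Foreign key plus unique constraint: at most one reference per dbxref.
    dbxref_id    INTEGER UNIQUE REFERENCES dbxref (dbxref_id),
    title        TEXT
);
""")

# One PUBMED cross-reference (made-up accession) and one reference using it.
conn.execute(
    "INSERT INTO dbxref (dbxref_id, dbname, accession) VALUES (1, 'PUBMED', '12345')"
)
conn.execute("INSERT INTO reference (dbxref_id, title) VALUES (1, 'First paper')")


def add_reference(connection, dbxref_id, title):
    """Hypothetical helper: try to insert a reference, report success."""
    try:
        connection.execute(
            "INSERT INTO reference (dbxref_id, title) VALUES (?, ?)",
            (dbxref_id, title),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when the unique constraint on dbxref_id is violated.
        return False


second_insert_ok = add_reference(conn, 1, "Second paper, same PUBMED id")
reference_count = conn.execute("SELECT COUNT(*) FROM reference").fetchone()[0]
```

In practice this means a loader has to look up an existing reference by
its dbxref before inserting, rather than inserting blindly.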

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From boehme at mpiib-berlin.mpg.de  Thu Jun  2 08:17:42 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Thu Jun  2 08:16:17 2005
Subject: [BioSQL-l] How to add a feature?
Message-ID: <429EF8E6.6030309@mpiib-berlin.mpg.de>

I'm wondering how to add a feature to a given sequence.
I know I can use createFeature, but that changes nothing in the
database; addSequence is what does that. So is the proper way to
retrieve the seq., get all its features, copy them to a new seq, add
the feature, delete the seq in the database and store the new one?
There must be a simpler way. BioJava In Anger is rather sparse on
things like that; I could do with a lot more examples.

Martina
From Marc.Logghe at devgen.com  Thu Jun  2 08:42:56 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Jun  2 08:35:26 2005
Subject: [BioSQL-l] How to add a feature?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>

Hi Martina,
I don't know how it goes in BioJava but in BioPerl the flow looks like
this:
1) create your feature
2) make it persistent
3) add it to your (persistent) sequence object
4) store the sequence object in the database
5) commit if necessary

HTH,
Marc

> I'm wondering how to add a feature to a given sequence?
> I know, I can use createFeature, but that changes nothing in 
> the database, that does addSequence. So is the proper way to 
> retrieve the seq., get all its features, copy it to new seq 
> and add a feature, delete the seq in the database and store 
> the new one?
> There must be a simpler way? BioJava In Anger is rather 
> sparse on things like that, I could do with a lot more examples ..
> 
> Martina
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
> 

From boehme at mpiib-berlin.mpg.de  Thu Jun  2 09:03:30 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Thu Jun  2 08:55:32 2005
Subject: [BioSQL-l] How to add a feature?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
Message-ID: <429F03A2.1090208@mpiib-berlin.mpg.de>

Thanks Marc,
but I don't know how to make a feature persistent in Biojava. Maybe 
someone from the bioJava list can help me?

Martina

Marc Logghe wrote:

> Hi Martina,
> I don't know how it goes in BioJava but in BioPerl the flow looks like
> this:
> 1) create your feature
> 2) make it persistent
> 3) add it to your (persistent) sequence object
> 4) store the sequence object in the database
> 5) commit if necessary
> 
> HTH,
> Marc
> 
> 
>>I'm wondering how to add a feature to a given sequence?
>>I know, I can use createFeature, but that changes nothing in 
>>the database, that does addSequence. So is the proper way to 
>>retrieve the seq., get all its features, copy it to new seq 
>>and add a feature, delete the seq in the database and store 
>>the new one?
>>There must be a simpler way? BioJava In Anger is rather 
>>sparse on things like that, I could do with a lot more examples ..
>>
>>Martina
From simon.foote at nrc-cnrc.gc.ca  Thu Jun  2 09:34:30 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Thu Jun  2 09:41:18 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
In-Reply-To: <429F03A2.1090208@mpiib-berlin.mpg.de>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
	<429F03A2.1090208@mpiib-berlin.mpg.de>
Message-ID: <429F0AE6.6020806@nrc-cnrc.gc.ca>

Hi Martina,

To add a feature to a sequence stored in a  BioSQL database, all you 
have to do is retrieve the sequence and then add a feature to it.  The 
following simplified code shows you the steps:

// Retrieve the sequence from BioSQLSequenceDB
Sequence seq = bsd.getSequence(id);
// Create new stranded feature
StrandedFeature.Template templ = new StrandedFeature.Template();
templ.location = ...
templ.strand = ...
templ.type = ...
templ.source = ...
templ.annotation = [A created SimpleAnnotation object]
// Add feature to sequence
seq.createFeature(templ);
// Note: adding the feature like this will automatically persist the
// feature, so you don't have to worry about doing that.

Cheers,
Simon Foote

-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca


Martina wrote:

> Thanks Marc,
> but I don't know how to make a feature persistent in Biojava. Maybe 
> someone from the bioJava list can help me?
>
> Martina
>
> Marc Logghe wrote:
>
>> Hi Martina,
>> I don't know how it goes in BioJava but in BioPerl the flow looks like
>> this:
>> 1) create your feature
>> 2) make it persistent
>> 3) add it to your (persistent) sequence object
>> 4) store the sequence object in the database
>> 5) commit if necessary
>>
>> HTH,
>> Marc
>>
>>
>>> I'm wondering how to add a feature to a given sequence?
>>> I know, I can use createFeature, but that changes nothing in the 
>>> database, that does addSequence. So is the proper way to retrieve 
>>> the seq., get all its features, copy it to new seq and add a 
>>> feature, delete the seq in the database and store the new one?
>>> There must be a simpler way? BioJava In Anger is rather sparse on 
>>> things like that, I could do with a lot more examples ..
>>>
>>> Martina
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l


From hlapp at gnf.org  Thu Jun  2 12:39:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jun  2 12:34:25 2005
Subject: [BioSQL-l] How to add a feature?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
Message-ID: <e938ed3ec17782e7305c66a202fb1104@gnf.org>


On Jun 2, 2005, at 5:42 AM, Marc Logghe wrote:

> Hi Martina,
> I don't know how it goes in BioJava but in BioPerl the flow looks like
> this:
> 1) create your feature
> 2) make it persistent

Just as a note, you don't need to make the feature persistent before 
adding it. Just add it to the persistent sequence object and then call 
$pseq->store().

	-hilmar

> 3) add it to your (persistent) sequence object
> 4) store the sequence object in the database
> 5) commit if necessary
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mark.schreiber at novartis.com  Thu Jun  2 21:02:57 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jun  2 20:55:04 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
Message-ID: <OF4650721E.809C3517-ON48257015.00058B5F-48257015.0005C312@EU.novartis.net>

>There must be a simpler way? BioJava In Anger is rather 
>sparse on things like that, I could do with a lot more examples ..
>

All donations of examples are gratefully received. As you say it could do 
with more examples but hey, I'm only one man, with a day job that is 
rapidly turning into a night job too : )

- Mark


From boehme at mpiib-berlin.mpg.de  Mon Jun  6 05:34:50 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun  6 05:26:52 2005
Subject: Bio Java (was: Re: [Biojava-l] Re: [BioSQL-l] How to add a feature?)
In-Reply-To: <OF4650721E.809C3517-ON48257015.00058B5F-48257015.0005C312@EU.novartis.net>
References: <OF4650721E.809C3517-ON48257015.00058B5F-48257015.0005C312@EU.novartis.net>
Message-ID: <42A418BA.8090407@mpiib-berlin.mpg.de>

Sorry - I didn't mean you personally! Because it is quite hard for me
to figure out how things work just from the API and the
sources, I assumed it would be similar for others starting with
BioJava/BioSQL. There must be some working code around somewhere which
could be donated? Please do :-) It would increase the popularity of
BioJava/BioSQL, which it deserves, I would think.

Martina

mark.schreiber@novartis.com wrote:

>>There must be a simpler way? BioJava In Anger is rather 
>>sparse on things like that, I could do with a lot more examples ..
>>
> 
> 
> All donations of examples are gratefully received. As you say it could do 
> with more examples but hey, I'm only one man, with a day job that is 
> rapidly turning into a night job too : )
> 
> - Mark
> 
> 
From boehme at mpiib-berlin.mpg.de  Mon Jun  6 10:18:54 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun  6 10:12:43 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
In-Reply-To: <429F0AE6.6020806@nrc-cnrc.gc.ca>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
	<429F03A2.1090208@mpiib-berlin.mpg.de>
	<429F0AE6.6020806@nrc-cnrc.gc.ca>
Message-ID: <42A45B4E.5070906@mpiib-berlin.mpg.de>

Thanks - I knew it would be quite simple, as always with BioJava (once
I've figured out how to, that is)!
Martina

Simon Foote wrote:

> Hi Martina,
> 
> To add a feature to a sequence stored in a  BioSQL database, all you 
> have to do is retrieve the sequence and then add a feature to it.  The 
> following simplified code shows you the steps:
> 
> // Retrieve the sequence from BioSQLSequenceDB
> Sequence seq = bsd.getSequence(id);
> // Create new stranded feature
> StrandedFeature.Template templ = new StrandedFeature.Template();
> templ.location = ...
> templ.strand = ...
> templ.type = ...
> templ.source = ...
> templ.annotation = [A created SimpleAnnotation object]
> // Add feature to sequence
> seq.createFeature(templ);
> // Note: adding the feature like this will automatically persist the
> // feature, so you don't have to worry about doing that.
> 
> Cheers,
> Simon Foote
> 
From hlapp at gmx.net  Wed Jun  8 22:20:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jun  8 22:13:52 2005
Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with
	load_seqdatabase.pl
In-Reply-To: <20050608114341.29861.qmail@web40728.mail.yahoo.com>
References: <20050608114341.29861.qmail@web40728.mail.yahoo.com>
Message-ID: <9994082cb32d76711db846757e47ad22@gmx.net>

What OS are you running this on? How much memory have you got on the  
machine on which you run the script, and on the machine on which you  
run the database? Are these the same or not? Which version of DBI and  
DBD::Pg?

This hasn't been reported by anyone else really so I suspect it's  
either due to too limited memory, or a problem in the DBD driver or in  
the DBI compiled code. Can you watch the process (using, e.g., top) and  
see how fast it increases in memory consumption? Since you can continue  
when you restart it's not something specific to one sequence that would  
trigger the problem; rather it appears whenever you have run through a  
certain number of entries the process dies.
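
If the script can be made to run again, one rough alternative to watching
top by hand is to sample the resident set size from /proc at intervals.
The sketch below is Python and Linux-specific. The function names and the
sample numbers are made up for illustration, and it assumes the kernel
exposes a VmRSS line in /proc/<pid>/status.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


def growth_per_sample(samples_kb):
    """Differences between consecutive samples; steadily positive values
    suggest the process keeps accumulating memory."""
    return [later - earlier for earlier, later in zip(samples_kb, samples_kb[1:])]


# Invented example values, as if sampled once per minute.
example_status = "Name:\tperl\nVmRSS:\t    1288 kB\n"
example_rss = parse_vm_rss_kb(example_status)
example_growth = growth_per_sample([1200, 1450, 1900])
```

Calling read_rss_kb with the PID of the loader in a loop (for example
once a minute) and printing the result would give the growth rate asked
about above.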

	-hilmar

On Jun 8, 2005, at 7:43 PM, Duangdaow Kanhasiri wrote:

> Hi,
>
> I've used the BioPerl script load_seqdatabase.pl (which came
> with the BioSQL scripts) to load bacterial sequences in
> GenBank format (*.gbk) into a PostgreSQL 8.0 database on a
> Linux machine, as follows:
>
> $ perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &
>
> Under /export/Bacteria/ there is one directory per bacterium
> (e.g. Acinetobacter_sp_ADP1), and the files have names
> like NC_006824.gbk.
>
> At first it loaded some sequences into the BioSQL tables
> (row count from table bioentry):
> bioseq=# select count(*) from bioentry;
>  count
> -------
>    33
> (1 row)
>
>
> However, after a while it stopped with the error:
>
> [1]+  Segmentation fault      perl load_seqdatabase.pl
> /export/Bacteria/*/*.gbk &
>
> I then checked which *.gbk files had already been loaded
> into the table and removed them, leaving only the unloaded
> ones, and ran the script again.  It continued to work for
> some time and then stopped again.  I repeated the process
> several times until 173 sequences were loaded into the table:
>
> bioseq=# select count(*) from bioentry;
>  count
> -------
>    173
> (1 row)
>
> The program then stopped again, and this time it won't run
> at all, even when I try with only one file.  The error is
> still the same:
>
> $ perl load_seqdatabase.pl
> /export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk
> Segmentation fault
> $
>
> Now I can't load the rest of my sequences into the
> database.  I would be very grateful if anyone knows how
> to solve the "Segmentation fault" problem.
>
> Regards,
>
> Davina
>
>
> 		
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From dbastar at yahoo.com  Wed Jun  8 23:39:24 2005
From: dbastar at yahoo.com (Duangdaow Kanhasiri)
Date: Wed Jun  8 23:34:30 2005
Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with
	load_seqdatabase.pl
In-Reply-To: <9994082cb32d76711db846757e47ad22@gmx.net>
Message-ID: <20050609033924.11682.qmail@web40708.mail.yahoo.com>

The 

OS: Rocks Cluster v 3.3 
Total Memory: 2 GB
DBD::Pg version: 1.42
DBI version: 1.48




From jana.bauckmann at informatik.hu-berlin.de  Tue Jun 14 05:52:29 2005
From: jana.bauckmann at informatik.hu-berlin.de (Jana Bauckmann)
Date: Tue Jun 14 05:44:16 2005
Subject: [BioSQL-l] memory error while loading SwissProt into Oracle using
	bioperl-db
Message-ID: <Pine.GSO.4.33.0506141050110.4029-100000@amsel>

Hi,

I would like to load SwissProt data into my Oracle 9.2 database with
BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got two
problems:

1) I get many (about 1300) warnings stating integrity constraint errors:

ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
key not found (DBD ERROR: OCIStmtExecute)

ORA-01400: cannot insert NULL into ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
(DBD ERROR: OCIStmtExecute)

2) The script stops after 2 hours (34500 tuples in table BioEntry) with
message: Out of memory!

I guess problem 1 causes problem 2. Is this reasonable, or do I have two
separate problems?

I run Oracle and the load script on the same machine with:
Suse Linux 9.0 (kernel 2.4.21-291-smp) with  12 GB RAM
perl 5.8.1, built for i586-linux-thread-multi
bioperl 1.4
bioperl-db 0.1
DBI 1.48
DBD::Oracle 1.16
Oracle 9.2
BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on 6th
June 2005)

Thanks for any suggestions,
Jana

From hollandr at gis.a-star.edu.sg  Tue Jun 14 06:01:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Jun 14 05:54:34 2005
Subject: [BioSQL-l] memory error while loading SwissProt into Oracle
	using bioperl-db
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA91F@BIONIC.biopolis.one-north.com>

These are two separate problems. 

(1) is caused by bad data in your SwissProt file - some of the records
in the file refer to journal articles but have not stated any authors.
The associated reference objects then do not get created, and neither do
their dbxrefs, causing integrity constraint errors elsewhere.

(2) means what it says: the process has run out of memory. The script
appears to be creating objects and persisting them to the database, but
then keeping them in memory afterwards, either in the BioPerl-db cache
or by holding its own references somewhere. (I'm not sure of the exact
workings of BioPerl-db here; Hilmar, could you enlighten us?) How much
memory are your Oracle instance and other software using on that
server? How much is left for BioPerl?
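
To find the records behind (1) before loading, a rough pre-check could
scan the flat file for reference blocks that have no RA (author) line.
The sketch below is Python and independent of bioperl-db. The function
name and the two sample entries are invented for illustration, and it
assumes the usual Swiss-Prot layout (two-letter line code, RN starting
each reference, "//" ending each entry). Whether the loader treats an
RG (reference group) line as authors is not known here, so RG-only
references are reported too.

```python
def references_without_authors(flatfile_text):
    """Return (entry_id, reference_number) pairs for references lacking RA lines."""
    missing = []
    entry_id = None
    ref_number = None      # RN value of the reference block being read, if any
    has_authors = False

    # A trailing "//" sentinel closes the last block even if the file is cut short.
    for line in flatfile_text.splitlines() + ["//"]:
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            parts = value.split()
            entry_id = parts[0] if parts else None
        # A new RN line or the end of the entry closes the open reference block.
        if code in ("RN", "//") and ref_number is not None:
            if not has_authors:
                missing.append((entry_id, ref_number))
            ref_number = None
        if code == "RN":
            ref_number, has_authors = value, False
        elif code == "RA":
            has_authors = True
    return missing


# Invented sample data, not real Swiss-Prot records.
SAMPLE = """\
ID   TEST1_HUMAN   STANDARD; PRT; 10 AA.
RN   [1]
RP   SEQUENCE FROM N.A.
RA   Smith J., Jones K.;
RL   J. Example 1:1-10(2000).
RN   [2]
RP   SEQUENCE FROM N.A.
RL   Submitted (JAN-2000) to the EMBL database.
//
ID   TEST2_MOUSE   STANDARD; PRT; 12 AA.
RN   [1]
RG   The Example Consortium;
RL   J. Example 2:11-20(2001).
//
"""

suspects = references_without_authors(SAMPLE)
```

Comparing the number of suspects with the roughly 1300 warnings would
give a quick indication of whether missing authors account for all of
them.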

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199



From dbastar at yahoo.com  Wed Jun  8 07:43:41 2005
From: dbastar at yahoo.com (Duangdaow Kanhasiri)
Date: Tue Jun 14 22:17:55 2005
Subject: [BioSQL-l] Error loading sequence with load_seqdatabase.pl
Message-ID: <20050608114341.29861.qmail@web40728.mail.yahoo.com>

Hi,

I've used the BioPerl script load_seqdatabase.pl (which came
with the BioSQL scripts) to load bacterial sequences in
GenBank format (*.gbk) into a PostgreSQL 8.0 database on a
Linux machine, as follows:

$ perl load_seqdatabase.pl /export/Bacteria/*/*.gbk &

Under /export/Bacteria/ there is one directory per bacterium
(e.g. Acinetobacter_sp_ADP1), and the files have names
like NC_006824.gbk.

At first it loaded some sequences into the BioSQL tables
(row count from table bioentry):

bioseq=# select count(*) from bioentry;
 count
-------
   33
(1 row)


However, after a while it stopped with the error:

[1]+  Segmentation fault      perl load_seqdatabase.pl
/export/Bacteria/*/*.gbk &

I then checked which *.gbk files had already been loaded
into the table and removed them, leaving only the unloaded
ones, and ran the script again.  It continued to work for
some time and then stopped again.  I repeated the process
several times until 173 sequences were loaded into the table:

bioseq=# select count(*) from bioentry;
 count
-------
   173
(1 row)

The program then stopped again, and this time it won't run
at all, even when I try with only one file.  The error is
still the same:

$ perl load_seqdatabase.pl
/export/Bacteria/Lactobacillus_johnsonii_NCC_533/NC_005362.gbk
Segmentation fault
$

Now I can't load the rest of my sequences into the
database.  I would be very grateful if anyone knows how
to solve the "Segmentation fault" problem.

Regards,

Davina


-------------- next part --------------
A non-text attachment was scrubbed...
Name: load_seqdatabase.pl
Type: application/octet-stream
Size: 22486 bytes
Desc: 3434098052-load_seqdatabase.pl
Url : http://open-bio.org/pipermail/biosql-l/attachments/20050608/1c6b46ab/load_seqdatabase-0001.obj
From dbastar at yahoo.com  Wed Jun  8 23:55:57 2005
From: dbastar at yahoo.com (Duangdaow Kanhasiri)
Date: Tue Jun 14 22:17:55 2005
Subject: [BioSQL-l] Re: [Bioperl-l] Error loading sequence with
	load_seqdatabase.pl
In-Reply-To: <9994082cb32d76711db846757e47ad22@gmx.net>
Message-ID: <20050609035557.45275.qmail@web40727.mail.yahoo.com>

The system I use has the following configuration:

CPU:               2 @ AthlonXP2000
OS:                Rocks Cluster v 3.3 
Total Memory:      2 GB
DBD::Pg version:   1.42
DBI version:       1.48

I've attached the output of the top command (top.txt)
to this mail.  Unfortunately, the script
load_seqdatabase.pl won't run anymore, no matter
how many times I try, so I couldn't measure how much
of the machine's resources (CPU, memory) it consumes.

Regards,

Davina




-------------- next part --------------
[root@biogenome root]# top
10:31:14  up 27 days, 21:20,  5 users,  load average: 0.00, 0.02, 0.03
193 processes: 192 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    1.8%    0.0%    0.0%   0.0%     0.0%    0.0%  198.0%
           cpu00    1.9%    0.0%    0.0%   0.0%     0.0%    0.0%   98.0%
           cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
Mem:  2057220k av, 1556640k used,  500580k free,       0k shrd,  167096k buff
                   1101048k actv,  266692k in_d,   39936k in_c
Swap: 4192956k av,   91620k used, 4101336k free                 1196752k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16683 root      23   0  1288 1288   844 R     1.9  0.0   0:00   0 top
    1 root      15   0   520  516   456 S     0.0  0.0   0:29   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   0:37   0 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:24   0 kscand
   10 root      15   0     0    0     0 SW    0.0  0.0   0:19   0 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      25   0     0    0     0 SW    0.0  0.0   0:00   1 scsi_eh_0
   18 root      25   0     0    0     0 SW    0.0  0.0   0:00   1 aacraid
   20 root      25   0     0    0     0 SW    0.0  0.0   0:00   1 scsi_eh_0
   23 root      15   0     0    0     0 SW    0.0  0.0   1:29   1 kjournald
   70 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
 1165 root      15   0     0    0     0 SW    0.0  0.0   0:49   0 kjournald
 1418 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 eth0
 1543 root      15   0   620  608   524 S     0.0  0.0   0:59   0 syslogd
 1547 root      15   0   484  424   420 S     0.0  0.0   0:00   0 klogd
 1557 root      15   0   456  448   392 S     0.0  0.0   2:04   0 irqbalance
 1565 rpc       15   0   572  548   500 S     0.0  0.0   0:00   0 portmap
 1584 rpcuser   25   0   716  632   628 S     0.0  0.0   0:00   1 rpc.statd
 1595 root      15   0   404  388   344 S     0.0  0.0   0:06   0 mdadm
 1619 root      RT   0   556  456   424 S     0.0  0.0   0:16   1 auditd
 1629 nobody    15   0  1180 1016   724 S     0.0  0.0  24:57   1 gmetad
 1658 root      15   0   472  424   400 S     0.0  0.0   0:01   0 pvfsd


[root@biogenome DBD]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             5.8G  3.6G  1.9G  66% /
/dev/sda3             125G   24G   95G  21% /export
none                 1005M     0 1005M   0% /dev/shm
tmpfs                 503M  3.5M  499M   1% /var/lib/ganglia/rrds
From mark.schreiber at novartis.com  Mon Jun 20 01:34:11 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 01:25:59 2005
Subject: [BioSQL-l] circular
Message-ID: <OF34B47D7A.81112576-ON48257026.001E797C-48257026.001E9921@EU.novartis.net>

Hello -

When circular sequences (plasmids, bacterial genomes etc) are stored in 
BioSQL how is their circularity indicated? Or, what should the convention 
be?

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

From mark.schreiber at novartis.com  Mon Jun 20 02:45:42 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 02:37:26 2005
Subject: [BioSQL-l] bioentry-version vs sequence-version
Message-ID: <OFE89FABD3.3C1566BC-ON48257026.0024EF59-48257026.0025253C@EU.novartis.net>

Hello -

Why do bioentry and sequence both have a version column? A sequence record 
only exists in a one-to-one relationship with its parent bioentry, so 
surely it would inherit its version number from that bioentry?

- Mark


From shenyang_11 at 163.com  Mon Jun 20 05:11:28 2005
From: shenyang_11 at 163.com (shenyang)
Date: Mon Jun 20 05:11:44 2005
Subject: [BioSQL-l] " Lost connection to MySQL server" when I via biosql by
	using "find_by_unique_key" method
Message-ID: <200506200911.j5K9BHgJ009578@portal.open-bio.org>

Hello-
	I upgraded MySQL from "mysql-standard-4.0.20-sgi-irix6.5-mips" to
 "mysql-max-4.1.12-sgi-irix6.5-mips".

Since then I have been unable to retrieve a RichSeq object from my sequence database, which uses the BioSQL schema.
My Perl script is:

$db = $db||Bio::DB::BioDB->new(-database   => "biosql",
                            -printerror => 0,	
                            -host       => "localhost",
                            -dbname     => $dbname,
                            -driver     => "mysql",
                            -user       => $dbuser,
                            -pass       => $dbpass,
                            );
$seq->namespace($namespace);
$seq->version($version);
my $adp = $db->get_object_adaptor($seq);
my $seqfactor=Bio::Seq::SeqFactory->new(-type=>"Bio::Seq::RichSeq");
$lseq = $adp->find_by_unique_key(
   			$seq,
   			-obj_factory =>$seqfactor,
);

The error message is:

------------- EXCEPTION  -------------

MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: Lost connection to MySQL server during query
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:952
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856
STACK toplevel test_get_seq_embl_acc.pl:9

--------------------------------------

The MySQL log file indicates an InnoDB problem. Here are the MySQL logs:

"
050620 16:32:47  mysqld restarted
050620 16:32:49  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
050620 16:32:50  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 2237087117.
InnoDB: Doing recovery: scanned up to log sequence number 0 2237087117
InnoDB: Last MySQL binlog file position 0 79, file name ./biomed-bin.000022
050620 16:32:51  InnoDB: Flushing modified pages from the buffer pool...
050620 16:32:51  InnoDB: Started; log sequence number 0 2237087117
050620 16:32:51 [Warning] mysql.user table is not updated to new password format; Disabling new password usage until mysql_fix_privilege_tables is run
050620 16:32:51 [Warning] Can't open and lock time zone table: Table 'mysql.time_zone_leap_second' doesn't exist trying to live without them
/database/mysql/bin/mysqld: ready for connections.
Version: '4.1.12-max-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Edition - Experimental (GPL)"


Thanks for any suggestions,
Yang Shen


From Marc.Logghe at devgen.com  Mon Jun 20 05:33:33 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jun 20 05:26:30 2005
Subject: [BioSQL-l] circular
Message-ID: <0C528E3670D8CE4B8E013F6749231AA606E86A@ANTARESIA.be.devgen.com>

Hi Mark,
As far as I am aware, there is currently no field available in the
bioentry table to store that kind of flag.
It is parsed out of GenBank files by BioPerl, though.
It is taken from the GenBank LOCUS line, e.g.
"LOCUS       BBPLAS                  2687 bp    DNA     circular BCT 12-MAR-1999"
You can check the resulting Bio::Seq::RichSeq object by calling the
is_circular() method from Bio::PrimarySeq.
A solution would be to write a Bio::Factory::SequenceProcessorI compliant
processor and pass it as an option to your load_seqdatabase.pl script.
In the processor itself, you can for instance do the following:
1) check for circularity using the is_circular() method
2) if circular, add a term to your sequence object (e.g. annotation term,
gene ontology term 'is_circular') indicating it is circular
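
For code that does not go through BioPerl, the same flag could be read
straight from the LOCUS line. What follows is only a rough sketch in Java,
assuming the topology shows up as a stand-alone, whitespace-separated token
("circular" or "linear") after the locus name. The class and method names
(LocusTopology, isCircular) are made up for illustration and are not part of
BioJava or BioPerl.

```java
/** Hypothetical helper: reads the topology token from a GenBank LOCUS line. */
final class LocusTopology {

    private LocusTopology() {}

    /**
     * Returns true if the line starts with LOCUS and carries a "circular" token.
     * Assumption: the topology is a whitespace-separated token somewhere after
     * the locus name, as in the BBPLAS example line.
     */
    static boolean isCircular(String locusLine) {
        if (locusLine == null || !locusLine.startsWith("LOCUS")) {
            return false;
        }
        String[] tokens = locusLine.trim().split("\\s+");
        // tokens[0] is "LOCUS" and tokens[1] is the locus name, so start at 2
        // to avoid misreading an entry that happens to be named "circular".
        for (int i = 2; i < tokens.length; i++) {
            if ("circular".equals(tokens[i])) {
                return true;
            }
        }
        return false;
    }
}
```

The result could then be stored under whatever term the bio* projects agree
on, so that readers and writers stay consistent.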

My 0.02$

Cheers,
Marc

From boehme at mpiib-berlin.mpg.de  Mon Jun 20 05:43:35 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 05:39:41 2005
Subject: [BioSQL-l] _removeSequence
Message-ID: <42B68FC7.3060102@mpiib-berlin.mpg.de>

Hi,

I'm trying to delete a sequence and, recursively, all its features.

So:

for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
	Sequence s = si.nextSequence();
	String name = s.getName();
	s = null;
	db.removeSequence(name);
}

But when I look in the database (MySQL 4.1.12) I can still see plenty 
of entries, and I have problems entering the same features again 
because of a duplicate key error. I would like to know whether 
_removeSequence(String) in BioSQLSequenceDB is supposed to remove 
features recursively, or just the features of the removed sequence. 
If the latter, what is the best way to delete the features of the features 
(and so on)? And how can I empty the db completely?

Martina

From mark.schreiber at novartis.com  Mon Jun 20 05:56:40 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 05:48:21 2005
Subject: [BioSQL-l] _removeSequence
Message-ID: <OF99C2BEB9.F7056E19-ON48257026.0036755E-48257026.0036A100@EU.novartis.net>

BioJava doesn't attempt to recursively remove features by itself. It relies 
on cascading deletes in the database. I know Oracle can be set to do this 
(and it works very well). If MySQL has equivalent functionality you may 
need to turn it on. I'm pretty sure it does, but you need to set it up.

- Mark




From mark.schreiber at novartis.com  Mon Jun 20 06:01:41 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 05:53:15 2005
Subject: [BioSQL-l] circular
Message-ID: <OF0E024A83.023853AA-ON48257026.0036B03E-48257026.00371693@EU.novartis.net>

So 'is_circular' should be the blessed term. It really needs to be a 
convention so that reading and writing is consistent between bio* 
projects.

Would it be a good idea for the sequence table of BioSQL 1.1 to have a 
circular column?

- Mark


From mark.schreiber at novartis.com  Mon Jun 20 06:06:32 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 20 05:58:20 2005
Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence
Message-ID: <OF292A23AB.EA498551-ON48257026.00373EE8-48257026.00378820@EU.novartis.net>

To remove the database completely (while still keeping the tables etc) you 
would again need to turn on cascading deletes and delete the appropriate 
biodatabase row from the biodatabase table (or all of them if you have 
more than one).

You cannot currently do this using the biojava interface. You would need 
to code a JDBC statement to do it for you, or connect to the DB and issue 
the SQL statement yourself.
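
A minimal JDBC sketch of such a statement, assuming the stock BioSQL
biodatabase table with its name column and foreign keys declared
ON DELETE CASCADE. The class name BiodatabaseCleaner is a placeholder and
not part of BioJava; the connection has to come from your own JDBC setup.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper, not part of BioJava. */
final class BiodatabaseCleaner {

    // Deletes one namespace row; dependent rows are removed only if the
    // schema's foreign keys were created with ON DELETE CASCADE.
    static final String DELETE_ONE = "DELETE FROM biodatabase WHERE name = ?";

    private BiodatabaseCleaner() {}

    /** Removes the named biodatabase row and returns the number of rows deleted. */
    static int deleteBiodatabase(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_ONE)) {
            ps.setString(1, name);
            return ps.executeUpdate();
        }
    }
}
```

A caller would open a Connection with its own JDBC URL and credentials and
call deleteBiodatabase(conn, "mydb"), where "mydb" stands for the namespace
to clear. Without working cascades the dependent rows would most likely be
left behind.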

- Mark




From hollandr at gis.a-star.edu.sg  Mon Jun 20 06:10:29 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Jun 20 06:03:36 2005
Subject: [BioSQL-l] _removeSequence
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>

Cascading deletes in MySQL require the tables to have been set up
using the InnoDB table type (as opposed to the default MyISAM tables).
In InnoDB, foreign keys are actually enforced and deletes will cascade,
whereas MyISAM has no concept of foreign keys and so is unable to
enforce data integrity. The people on the BioSQL-L mailing list will be
able to help you there.

The next version of BioJava's database interfaces after the 1.4 release
will assume that the underlying database does have cascading deletes
turned on. The existing version half-attempts to make up for the lack of
cascading deletes in databases that don't support it, but it doesn't do
it well at all, hence the problems you are seeing. After consulting with
Hilmar last week we decided it was a fair assumption to make that all
BioSQL instances are installed with cascading deletes enabled.
BioPerl-db already makes this assumption.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the
intended recipient, please delete it and notify us immediately. Please
do not copy or use it for any purpose, or disclose its content to any
other person. Thank you.
---------------------------------------------

From hollandr at gis.a-star.edu.sg  Mon Jun 20 06:11:57 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Jun 20 06:04:53 2005
Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB7A@BIONIC.biopolis.one-north.com>

There is also the BS-zap-all script in the BioSQL distribution which
will wipe the whole lot for you in one go. :)

Richard Holland
Bioinformatics Specialist
GIS extension 8199

From boehme at mpiib-berlin.mpg.de  Mon Jun 20 06:20:37 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 06:25:07 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
Message-ID: <42B69875.3050306@mpiib-berlin.mpg.de>

My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 
2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE.
Do I need to do anything else?

Thanks,
Martina

From hollandr at gis.a-star.edu.sg  Mon Jun 20 06:33:02 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Jun 20 06:26:20 2005
Subject: [BioSQL-l] _removeSequence
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>

Well, technically that should work because BioJava simply issues a
delete against the seqfeature table, and therefore all features related
through foreign keys should automatically delete themselves as a result
without any further intervention by BioJava... beats me why it doesn't!
Unfortunately I don't currently use the MySQL implementation myself so I
can't help much. I hope someone on BioSQL-L knows a little more?

Richard Holland
Bioinformatics Specialist
GIS extension 8199

From boehme at mpiib-berlin.mpg.de  Mon Jun 20 09:11:25 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 09:05:29 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
Message-ID: <42B6C07D.7000106@mpiib-berlin.mpg.de>

I dropped the db and ran the BioSQL schema script again - looks like it's working now!
The first run must have stopped before the alter table statements, so the 
foreign keys were missing - but I didn't know that they had to be there.
Thanks!
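
For anyone who hits the same thing, one way to check whether the foreign
keys made it into the database is to ask the JDBC driver for them. This is
only a sketch: the class name CascadeCheck is made up, and it assumes the
driver reports foreign keys accurately through the standard
DatabaseMetaData.getImportedKeys call.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: inspects the delete rule of a table's foreign keys. */
final class CascadeCheck {

    private CascadeCheck() {}

    /** True if the JDBC DELETE_RULE value means ON DELETE CASCADE. */
    static boolean isCascade(short deleteRule) {
        return deleteRule == DatabaseMetaData.importedKeyCascade;
    }

    /**
     * Counts the foreign key columns declared on the given table whose
     * delete rule is CASCADE. Returns 0 if the table has no foreign keys.
     */
    static int countCascadingKeys(Connection conn, String table) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        int count = 0;
        // getImportedKeys yields one row per foreign key column of the table.
        try (ResultSet rs = meta.getImportedKeys(conn.getCatalog(), null, table)) {
            while (rs.next()) {
                if (isCascade(rs.getShort("DELETE_RULE"))) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

On a completely loaded schema, countCascadingKeys(conn, "seqfeature") should
come back non-zero; a result of 0 would suggest that the alter table part of
the schema script never ran.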

Richard HOLLAND wrote:

> Well, technically that should work because BioJava simply issues a
> delete against the seqfeature table, and therefore all features related
> through foreign keys should automatically delete themselves as a result
> without any further intervention by BioJava... beats me why it doesn't!
> Unfortunately I don't currently use the MySQL implementation myself so I
> can't help much. I hope someone on BioSQL-L knows a little more?
> 
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
> ---------------------------------------------
> This email is confidential and may be privileged. If you are not the
> intended recipient, please delete it and notify us immediately. Please
> do not copy or use it for any purpose, or disclose its content to any
> other person. Thank you.
> ---------------------------------------------
> 
> 
> 
>>-----Original Message-----
>>From: Martina [mailto:boehme@mpiib-berlin.mpg.de] 
>>Sent: Monday, June 20, 2005 6:21 PM
>>To: Richard HOLLAND
>>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; 
>>biosql-l@open-bio.org
>>Subject: Re: [BioSQL-l] _removeSequence
>>
>>
>>My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 
>>2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE.
>>Do I need to do anything else?
>>
>>Thanks,
>>Martina
>>
>>Richard HOLLAND wrote:
>>
>>
>>>To do cascading deletes in MySQL requires the tables to have been set up
>>>using the InnoDB table style (as opposed to the default MyISAM tables).
>>>In InnoDB, foreign keys are actually enforced and deletes will cascade,
>>>whereas in MyISAM it has no concept of foreign keys and so is unable to
>>>enforce data integrity. The people on the BioSQL-L mailing list will be
>>>able to help you there.
>>>
>>>The next version of BioJava's database interfaces after the 1.4 release
>>>will assume that the underlying database does have cascading deletes
>>>turned on. The existing version half-attempts to make up for the lack of
>>>cascading deletes in databases that don't support it, but it doesn't do
>>>it well at all, hence the problems you are seeing. After consulting with
>>>Hilmar last week we decided it was a fair assumption to make that all
>>>BioSQL instances are installed with cascading deletes enabled.
>>>BioPerl-db already makes this assumption.
>>>
>>>cheers,
>>>Richard
>>>
>>>Richard Holland
>>>Bioinformatics Specialist
>>>GIS extension 8199
>>>
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: biosql-l-bounces@portal.open-bio.org 
>>>>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of 
>>>>mark.schreiber@novartis.com
>>>>Sent: Monday, June 20, 2005 5:57 PM
>>>>To: Martina
>>>>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; 
>>>>biosql-l@open-bio.org
>>>>Subject: Re: [BioSQL-l] _removeSequence
>>>>
>>>>
>>>>Biojava doesn't attempt to recursively remove features by itself. It 
>>>>relies on cascading deletes in the database. I know Oracle can be 
>>>>set to do this (and it works very well). If MySQL has equivalent 
>>>>functionality you may need to turn it on. I'm pretty sure it does 
>>>>but you need to set it up.
>>>>
>>>>- Mark
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Martina <boehme@mpiib-berlin.mpg.de>
>>>>Sent by: biosql-l-bounces@portal.open-bio.org
>>>>06/20/2005 05:43 PM
>>>>
>>>>
>>>>       To:     biosql-l@open-bio.org, BioJava <biojava-l@biojava.org>
>>>>       cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>>       Subject:        [BioSQL-l] _removeSequence
>>>>
>>>>
>>>>Hi,
>>>>
>>>>I'm trying to delete a sequence and, recursively, all its features.
>>>>
>>>>So:
>>>>
>>>>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
>>>>                Sequence s = si.nextSequence();
>>>>                String name = s.getName();
>>>>                s = null;
>>>>                db.removeSequence(name);
>>>>}
>>>>
>>>>But if I look in the database (MySQL 4.1.12) I can still see plenty 
>>>>of entries and I have problems entering the same features again, 
>>>>because of a duplicate key error. I would like to know if 
>>>>_removeSequence(String) in BioSQLSequenceDB is supposed to remove 
>>>>features recursively or just the features of the removed sequence?
>>>>If so - what is the best way to delete the features of the features 
>>>>(and so on)? And how to empty the db completely?
>>>>
>>>>Martina
>>>>
>>>>_______________________________________________
>>>>BioSQL-l mailing list
>>>>BioSQL-l@open-bio.org
>>>>http://open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
> 
From boehme at mpiib-berlin.mpg.de  Mon Jun 20 11:20:35 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 20 11:38:47 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
Message-ID: <42B6DEC3.9090807@mpiib-berlin.mpg.de>

Hi,

so I have this new database (still biosqldb-mysql.sql v 1.40 2004/11/04 
01:49:41) and after removing all sequences, I still have entries in 
term, term_relationship, term_relationship_term and ontology. And of 
course, in biodatabase. If I delete the entry in biodatabase too, 
nothing changes. Is that to be expected?
Because I still have trouble with the duplicate entry key, but that must 
be my code then.

Thanks
Martina
From hlapp at gnf.org  Mon Jun 20 13:48:04 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 13:38:09 2005
Subject: [BioSQL-l] " Lost connection to MySQL server" when I via biosql
	by using "find_by_unique_key" method
In-Reply-To: <200506200911.j5K9BHgJ009578@portal.open-bio.org>
References: <200506200911.j5K9BHgJ009578@portal.open-bio.org>
Message-ID: <65541f3e2669ba1ffd9eccaa9dc21988@gnf.org>

Maybe there's a migration script that comes with MySQL and that you 
need to run? Have you checked the MySQL FAQs and possibly message 
boards/README/HOWTO for what you need to do when upgrading from 4.0.x 
to 4.1.x?

	-hilmar

On Jun 20, 2004, at 1:51 AM, shenyang wrote:

> Hello-
> 	I updated my mysql from "mysql-standard-4.0.20-sgi-irix6.5-mips" to
>  "mysql-max-4.1.12-sgi-irix6.5-mips".
>
> Then I failed to get a RichSeq object from my sequence database, which 
> uses the BioSQL schema.
> My Perl script is "
>
> $db = $db||Bio::DB::BioDB->new(-database   => "biosql",
>                             -printerror => 0,	
>                             -host       => "localhost",
>                             -dbname     => $dbname,
>                             -driver     => "mysql",
>                             -user       => $dbuser,
>                             -pass       => $dbpass,
>                             );
> $seq->namespace($namespace);
> $seq->version($version);
> my $adp = $db->get_object_adaptor($seq);
> my $seqfactor=Bio::Seq::SeqFactory->new(-type=>"Bio::Seq::RichSeq");
> $lseq = $adp->find_by_unique_key(
>    			$seq,
>    			-obj_factory =>$seqfactor,
> );
>
> The error message is "
>
> ------------- EXCEPTION  -------------
>
> MSG: error while executing statement in 
> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: Lost connection to 
> MySQL server during query
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key 
> /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:952
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key 
> /usr/lib/bioperl-db//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856
> STACK toplevel test_get_seq_embl_acc.pl:9
>
> --------------------------------------
>
> and the MySQL log file indicates it's an InnoDB problem; here are the 
> MySQL logs:
>
> "
> 050620 16:32:47  mysqld restarted
> 050620 16:32:49  InnoDB: Database was not shut down normally!
> InnoDB: Starting crash recovery.
> InnoDB: Reading tablespace information from the .ibd files...
> InnoDB: Restoring possible half-written data pages from the doublewrite
> InnoDB: buffer...
> 050620 16:32:50  InnoDB: Starting log scan based on checkpoint at
> InnoDB: log sequence number 0 2237087117.
> InnoDB: Doing recovery: scanned up to log sequence number 0 2237087117
> InnoDB: Last MySQL binlog file position 0 79, file name 
> ./biomed-bin.000022
> 050620 16:32:51  InnoDB: Flushing modified pages from the buffer 
> pool...
> 050620 16:32:51  InnoDB: Started; log sequence number 0 2237087117
> 050620 16:32:51 [Warning] mysql.user table is not updated to new 
> password format; Disabling new password usage until 
> mysql_fix_privilege_tables is run
> 050620 16:32:51 [Warning] Can't open and lock time zone table: Table 
> 'mysql.time_zone_leap_second' doesn't exist trying to live without 
> them
> /database/mysql/bin/mysqld: ready for connections.
> Version: '4.1.12-max-log'  socket: '/tmp/mysql.sock'  port: 3306  
> MySQL Community Edition - Experimental (GPL)"
>
>
> Thanks for any suggestions,
> Yang Shen
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Mon Jun 20 15:19:04 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 15:10:24 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
Message-ID: <78e39420822012ffbf691b5edc233b4a@gnf.org>

There's one thing that I'm unsure about in Martina's original email, 
namely whether she was referring to features related to a sequence 
(bioentry), or to features hierarchically related to each other through 
the seqfeature_relationship table.

If the former, then the cascading delete should have taken care of 
removing the features when you remove the sequence (bioentry) to which 
they point through their foreign key (and recursively the locations 
etc).

However, if the question was about hierarchical features, then deleting 
one feature in the hierarchy will never (and shouldn't ever) delete any 
other feature in the hierarchy (except if all of them reference the 
same bioentry and you deleted the bioentry). If you delete a seqfeature 
in a hierarchy of seqfeatures then by cascading delete this will also 
delete all rows in seqfeature_relationship that reference that 
seqfeature as either a subject or an object in a nesting relationship 
between features. I.e., looking at the hierarchy as a graph, removing a 
node will cascade to deleting all incoming and outgoing arcs for that 
node, but not other nodes.

If your application wants to take down all nodes in the hierarchy when 
one node is deleted, you need to write code to do this. (Except if, as 
mentioned before, all features reference the same bioentry, in which 
case deleting the bioentry will delete the entire feature hierarchy.)
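
A minimal sketch of what such application code could look like, in Java 
since that is what the BioJava side of this thread is using. It assumes 
the arcs have already been read from seqfeature_relationship into a map 
from parent feature id to child feature ids. The names FeatureHierarchy 
and collectSubtree are made up for illustration and are not part of 
BioJava or BioSQL. The sketch only collects the ids that would have to 
be deleted; issuing the actual deletes is left out.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BioJava or BioSQL.
class FeatureHierarchy {

    /**
     * Returns rootId plus the id of every feature reachable from it by
     * following parent -> child arcs.
     *
     * childrenByParent is assumed to mirror seqfeature_relationship:
     * key = parent feature id, value = ids of its direct children.
     */
    static Set<Integer> collectSubtree(int rootId,
                                       Map<Integer, List<Integer>> childrenByParent) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            int id = pending.pop();
            if (!seen.add(id)) {
                continue; // already visited: guards against shared children or cycles
            }
            List<Integer> children =
                childrenByParent.getOrDefault(id, Collections.<Integer>emptyList());
            for (int child : children) {
                pending.push(child);
            }
        }
        return seen;
    }
}
```

Deleting the collected ids from seqfeature would then let the cascade 
take care of the arcs and locations of each of those features.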

	-hilmar

On Jun 20, 2005, at 3:33 AM, Richard HOLLAND wrote:

> Well, technically that should work because BioJava simply issues a
> delete against the seqfeature table, and therefore all features related
> through foreign keys should automatically delete themselves as a result
> without any further intervention by BioJava... beats me why it doesn't!
> Unfortunately I don't currently use the MySQL implementation myself so I
> can't help much. I hope someone on BioSQL-L knows a little more?
>
> Richard Holland
> Bioinformatics Specialist
> GIS extension 8199
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Mon Jun 20 15:33:11 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 15:23:39 2005
Subject: [BioSQL-l] circular
In-Reply-To: <OF0E024A83.023853AA-ON48257026.0036B03E-48257026.00371693@EU.novartis.net>
References: <OF0E024A83.023853AA-ON48257026.0036B03E-48257026.00371693@EU.novartis.net>
Message-ID: <06eb73cb04fc0adb0c8565ddae4e946b@gnf.org>

Interesting question.

I'd argue that the root question is whether the boolean property of 
circularity is best considered as an annotation of a bioentry with 
sequence, or as a core property of a biosequence.

Annotation generally is something that's applicable to some but not to 
other entries. A core property is something that can be well defined 
for (almost) all rows, and/or is necessary to define uniqueness or 
operations on the object.

Is_circular can certainly be defined for all biosequence rows. Also, in 
order to define operations like taking a subsequence of length 100 
starting 50bp before the end, knowing whether the sequence is circular 
makes a critical difference.
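
To illustrate why the flag matters for such an operation, here is a 
minimal Java sketch. The class and method names are hypothetical, and 
the 0-based start offset is an assumption made to keep the code short 
(it is not the coordinate convention of any Bio* toolkit).

```java
// Hypothetical helper, not part of BioJava or BioSQL.
class CircularSubsequence {

    /**
     * Returns length residues of seq beginning at the 0-based offset start.
     * For a circular sequence the read wraps around past the end;
     * for a linear sequence running past the end is an error.
     */
    static String subsequence(String seq, int start, int length, boolean circular) {
        if (start < 0 || start >= seq.length() || length < 0) {
            throw new IllegalArgumentException("start or length out of range");
        }
        if (!circular) {
            if (start + length > seq.length()) {
                throw new IndexOutOfBoundsException(
                    "subsequence runs past the end of a linear sequence");
            }
            return seq.substring(start, start + length);
        }
        StringBuilder result = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // modulo arithmetic implements the wrap-around
            result.append(seq.charAt((start + i) % seq.length()));
        }
        return result.toString();
    }
}
```

The same request (start near the end, length longer than what is left) 
succeeds on a circular sequence and has to fail on a linear one, which 
is why the property cannot be treated as optional annotation.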

So, short-term you can store it as annotation (tag/value) on the 
bioentry, but long-term I think this needs to be added to the 
biosequence table as a column.

	-hilmar

On Jun 20, 2005, at 3:01 AM, mark.schreiber@novartis.com wrote:

> So 'is_circular' should be the blessed term. It really needs to be a
> convention so that reading and writing is consistent between bio*
> projects.
>
> Would it be a good idea for the sequence table of BioSQL 1.1 to have a
> circular column?
>
> - Mark
>
>
>
>
>
> "Marc Logghe" <Marc.Logghe@devgen.com>
> 06/20/2005 05:33 PM
>
>
>         To:     Mark Schreiber/GP/Novartis@PH, <biosql-l@open-bio.org>
>         cc:
>         Subject:        RE: [BioSQL-l] circular
>
>
> Hi Mark,
> As far as I am aware, there is currently no field available in the
> bioentry table to store that kind of flag.
> It is parsed out from genbank files by BioPerl, though.
> It is taken from the genbank Locus line, eg.
> "LOCUS       BBPLAS                  2687 bp    DNA     circular BCT 12-MAR-1999"
> You can check the resulting Bio::Seq::RichSeq object by running the
> is_circular() method from Bio::PrimarySeq.
> A solution would be to make a Bio::Factory::SequenceProcessorI compliant
> processor and pass that as an option to your load_seqdatabase.pl script.
> In the processor itself, you can for instance do the following:
> 1) check for circularity using the is_circular() method
> 2) if circular, add a term to your sequence object (eg. annotation term,
> gene ontology term 'is_circular') indicating it is circular
>
> My 0.02$
>
> Cheers,
> Marc
>
>
>> -----Original Message-----
>> From: biosql-l-bounces@portal.open-bio.org
>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
>> mark.schreiber@novartis.com
>> Sent: Monday, June 20, 2005 7:34 AM
>> To: biosql-l@open-bio.org
>> Subject: [BioSQL-l] circular
>>
>> Hello -
>>
>> When circular sequences (plasmids, bacterial genomes etc) are
>> stored in BioSQL how is their circularity indicated? Or, what
>> should the convention be?
>>
>> - Mark
>>
>> Mark Schreiber
>> Principal Scientist (Bioinformatics)
>>
>> Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road
>> #05-01 Chromos
>> Singapore 138670
>> www.nitd.novartis.com
>>
>> phone +65 6722 2973
>> fax  +65 6722 2910
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Mon Jun 20 15:39:24 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 15:29:49 2005
Subject: [BioSQL-l] bioentry-version vs sequence-version
In-Reply-To: <OFE89FABD3.3C1566BC-ON48257026.0024EF59-48257026.0025253C@EU.novartis.net>
References: <OFE89FABD3.3C1566BC-ON48257026.0024EF59-48257026.0025253C@EU.novartis.net>
Message-ID: <f33116c5b4aa01a3b2fc89fd97a93f92@gnf.org>

  From the schema-overview.txt:

Sequences may have their own version number, independent of its
bioentry version information.

This pretty much states it. Usually they will have the same version, 
but some data providers may choose to increment the version of the 
sequence whenever the sequence changes, and the version of the entry 
whenever the sequence or the annotation changes.

	-hilmar

On Jun 19, 2005, at 11:45 PM, mark.schreiber@novartis.com wrote:

> Hello -
>
> Why do bioentry and sequence both have a version column? Sequence records
> only exist in one to one relationships with their parent bioentry so
> surely they would inherit their version number from their parent bioentry?
>
> - Mark
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Mon Jun 20 15:57:56 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 15:48:18 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B6DEC3.9090807@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
Message-ID: <f5bb76b54331dc88107ebde4bee3dc46@gnf.org>


On Jun 20, 2005, at 8:20 AM, Martina wrote:

> Hi,
>
> so I have this new database (still biosqldb-mysql.sqlv 1.40 2004/11/04 
> 01:49:41) and after removing all sequences, I do still have entries in 
> term, term_relationship,term_relationship_term and ontology. And of 
> course, in biodatabase. If I delete the entry in biodatabase too, 
> nothing changes. Is that what is to be expected?

Yes. Deletes cascade through foreign key constraints and nothing else. 
Term has an n:n relationship with bioentry and therefore does not have a 
foreign key to bioentry.

More generally, and provided cascading deletes are enabled, if you 
delete a row in a master table, all corresponding rows in detail tables 
that stand in a 1:n relationship to the master table (and therefore have 
a foreign key defined pointing to the master table) are deleted. 
Rows in n:n related tables will not be deleted, but dissociated by 
deleting the corresponding rows from the association table. As 
examples, Comment and Biosequence are 1:n related to bioentry, whereas 
dbxref, reference, and term are n:n related, with bioentry_dbxref, 
bioentry_reference, and bioentry_qualifier_value being the association 
tables.
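
To make the difference concrete, here is a toy in-memory model in Java. 
It is not database code; the class and field names are invented for 
illustration, and it only mimics what the cascade does to one detail 
table (seqfeature), one n:n related table (term) and its association 
table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the cascade, not real BioSQL access code.
class CascadeModel {
    // Detail table (1:n): seqfeature id -> id of the bioentry it points to.
    final Map<Integer, Integer> seqfeatureToBioentry = new HashMap<>();
    // n:n related table: term ids. Terms hold no foreign key to bioentry.
    final Set<Integer> terms = new HashSet<>();
    // Association table: each row is {bioentryId, termId}.
    final List<int[]> bioentryTermRows = new ArrayList<>();

    /** Mimics deleting one bioentry row with cascading deletes enabled. */
    void deleteBioentry(int bioentryId) {
        // Detail rows carry a foreign key to the bioentry, so they go away.
        seqfeatureToBioentry.values().removeIf(owner -> owner == bioentryId);
        // Association rows carry a foreign key too, so they go away as well.
        bioentryTermRows.removeIf(row -> row[0] == bioentryId);
        // The terms themselves are untouched: they are only dissociated.
    }
}
```

After deleteBioentry the term set is unchanged, which corresponds to the 
rows that remain in term after all sequences have been removed.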

If this is confusing to you, you should read a general textbook on 
relational databases and normalization which usually will explain this 
a lot better than I do.

> Cause I still have trouble with the dublicate entry key, but that must 
> be my code then.

Yes. When you insert a sequence you must be prepared that when 
inserting its ontology term or tag/value annotation the term may 
already be present because another bioentry uses it too. Similarly for 
Reference and Dbxref (although I believe Biojava doesn't use these - 
yet).
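
A minimal sketch of the find-or-create pattern this implies, in Java. An 
in-memory map stands in for the term table; against a real database the 
lookup would be a SELECT on the term's unique key, followed by an INSERT 
only if nothing was found. TermRegistry and findOrCreate are 
hypothetical names, and keying on (ontology, name) is an assumption 
about which columns make a term unique.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the term table, not part of BioJava.
class TermRegistry {
    // Unique key (ontology, name) -> primary key of the stored term.
    private final Map<List<String>, Integer> idsByKey = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of the existing term, inserting it first if absent. */
    int findOrCreate(String ontology, String name) {
        List<String> key = Arrays.asList(ontology, name);
        Integer existing = idsByKey.get(key);
        if (existing != null) {
            return existing; // another bioentry already uses this term
        }
        int id = nextId++;
        idsByKey.put(key, id);
        return id;
    }

    /** Number of distinct terms stored so far. */
    int size() {
        return idsByKey.size();
    }
}
```

Looking up before inserting is what avoids the duplicate key error when 
a second bioentry brings along a term that is already stored.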

	-hilmar

>
> Thanks
> Martina
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Mon Jun 20 16:05:26 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun 20 15:55:30 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B69875.3050306@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B69875.3050306@mpiib-berlin.mpg.de>
Message-ID: <3b0bdefb15e41a8a9020e2ffdf3e1312@gnf.org>

You should actually check whether InnoDB is enabled in your instance of 
MySQL. MySQL has the "nice" behaviour of silently creating the tables as 
MyISAM if InnoDB has not been enabled in the instance. It will not 
throw an error.

Up until at least 4.0.x InnoDB was disabled by default. You can check 
whether it is enabled by issuing

	mysql> show variables;

and then looking for the have_innodb variable. It needs to have the 
value of YES. The variables with the innodb_ prefix will tell you where 
it creates its tablespaces etc.

If it is not enabled, you need to edit MySQL's config file accordingly 
and restart the MySQL daemon.

	-hilmar

On Jun 20, 2005, at 3:20 AM, Martina wrote:

> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 
> 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE.
> Do I need to do anything else?
>
> Thanks,
> Martina
>
> Richard HOLLAND wrote:
>
>> To do cascading deletes in MySQL requires the tables to have been set up
>> using the InnoDB table style (as opposed to the default MyISAM tables).
>> In InnoDB, foreign keys are actually enforced and deletes will cascade,
>> whereas in MyISAM it has no concept of foreign keys and so is unable to
>> enforce data integrity. The people on the BioSQL-L mailing list will be
>> able to help you there.
>>
>> The next version of BioJava's database interfaces after the 1.4 release
>> will assume that the underlying database does have cascading deletes
>> turned on. The existing version half-attempts to make up for the lack of
>> cascading deletes in databases that don't support it, but it doesn't do
>> it well at all, hence the problems you are seeing. After consulting with
>> Hilmar last week we decided it was a fair assumption to make that all
>> BioSQL instances are installed with cascading deletes enabled.
>> BioPerl-db already makes this assumption.
>>
>> cheers,
>> Richard
>>
>> Richard Holland
>> Bioinformatics Specialist
>> GIS extension 8199
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From boehme at mpiib-berlin.mpg.de  Tue Jun 21 05:46:22 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 05:38:04 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <78e39420822012ffbf691b5edc233b4a@gnf.org>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
	<78e39420822012ffbf691b5edc233b4a@gnf.org>
Message-ID: <42B7E1EE.5090505@mpiib-berlin.mpg.de>

Hi Hilmar,

I wasn't aware of two different types of features.
I'm making features as described in 
http://www.biojava.org/docs/bj_in_anger/feature.htm, and as far as I 
can tell from the results, it's the first type you describe.
The second type of feature is confusing me: as I understood the 
feature relationships, the graph is a tree, with only one parent for a 
given feature, so if that feature is deleted, shouldn't all its children 
get deleted too?

Martina


Hilmar Lapp wrote:

> There's one thing that I'm unsure about in Martina's original email, 
> namely whether she was referring to features related to a sequence 
> (bioentry), or to features hierarchically related to each other through 
> the seqfeature_relationship table.
> 
> If the former, then the cascading delete should have taken care of 
> removing the features when you remove the sequence (bioentry) to which 
> they point through their foreign key (and recursively the locations etc).
> 
> However, if the question was about hierarchical features, then deleting 
> one feature in the hierarchy will never (and shouldn't ever) delete any 
> other feature in the hierarchy (except if all of them reference the same 
> bioentry and you deleted the bioentry). If you delete a seqfeature in a 
> hierarchy of seqfeatures then by cascading delete this will also delete 
> all rows in seqfeature_relationship that reference that seqfeature as 
> either a subject or an object in a nesting relationship between 
> features. I.e., looking at the hierarchy as a graph, removing a node 
> will cascade to deleting all incoming and outgoing arcs for that node, 
> but not other nodes.
> 
> If your application wants to take down all nodes in the hierarchy when 
> one node is deleted, you need to write code to do this. (Except if, as 
> mentioned before, all features reference the same bioentry, in which 
> case deleting the bioentry will delete the entire feature hierarchy.)
> 
>     -hilmar
> 
> On Jun 20, 2005, at 3:33 AM, Richard HOLLAND wrote:
> 
>> Well, technically that should work because BioJava simply issues a
>> delete against the seqfeature table, and therefore all features related
>> through foreign keys should automatically delete themselves as a result
>> without any further intervention by BioJava... beats me why it doesn't!
>> Unfortunately I don't currently use the MySQL implementation myself so I
>> can't help much. I hope someone on BioSQL-L knows a little more?
>>
>> Richard Holland
>> Bioinformatics Specialist
>> GIS extension 8199
>> ---------------------------------------------
>> This email is confidential and may be privileged. If you are not the
>> intended recipient, please delete it and notify us immediately. Please
>> do not copy or use it for any purpose, or disclose its content to any
>> other person. Thank you.
>> ---------------------------------------------
>>
>>
>>> -----Original Message-----
>>> From: Martina [mailto:boehme@mpiib-berlin.mpg.de]
>>> Sent: Monday, June 20, 2005 6:21 PM
>>> To: Richard HOLLAND
>>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava;
>>> biosql-l@open-bio.org
>>> Subject: Re: [BioSQL-l] _removeSequence
>>>
>>>
>>> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40
>>> 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE.
>>> Do I need to do anything else?
>>>
>>> Thanks,
>>> Martina
>>>
>>> Richard HOLLAND wrote:
>>>
>>>> To do cascading deletes in MySQL requires the tables to have been set up
>>>> using the InnoDB table style (as opposed to the default MyISAM tables).
>>>> In InnoDB, foreign keys are actually enforced and deletes will cascade,
>>>> whereas in MyISAM it has no concept of foreign keys and so is unable to
>>>> enforce data integrity. The people on the BioSQL-L mailing list will be
>>>> able to help you there.
>>>>
>>>> The next version of BioJava's database interfaces after the 1.4 release
>>>> will assume that the underlying database does have cascading deletes
>>>> turned on. The existing version half-attempts to make up for the lack of
>>>> cascading deletes in databases that don't support it, but it doesn't do
>>>> it well at all, hence the problems you are seeing. After consulting with
>>>> Hilmar last week we decided it was a fair assumption to make that all
>>>> BioSQL instances are installed with cascading deletes enabled.
>>>> BioPerl-db already makes this assumption.
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> Richard Holland
>>>> Bioinformatics Specialist
>>>> GIS extension 8199
>>>> ---------------------------------------------
>>>> This email is confidential and may be privileged. If you are not the
>>>> intended recipient, please delete it and notify us immediately. Please
>>>> do not copy or use it for any purpose, or disclose its content to any
>>>> other person. Thank you.
>>>> ---------------------------------------------
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: biosql-l-bounces@portal.open-bio.org
>>>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of
>>>>> mark.schreiber@novartis.com
>>>>> Sent: Monday, June 20, 2005 5:57 PM
>>>>> To: Martina
>>>>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava;
>>>>> biosql-l@open-bio.org
>>>>> Subject: Re: [BioSQL-l] _removeSequence
>>>>>
>>>>>
>>>>> Biojava doesn't attempt to recursively remove features by itself. It
>>>>> relies on cascading deletes in the database. I know Oracle can be set
>>>>> to do this (and it works very well). If MySQL has equivalent
>>>>> functionality you may need to turn it on. I'm pretty sure it does but
>>>>> you need to set it up.
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Martina <boehme@mpiib-berlin.mpg.de>
>>>>> Sent by: biosql-l-bounces@portal.open-bio.org
>>>>> 06/20/2005 05:43 PM
>>>>>
>>>>>
>>>>>        To:     biosql-l@open-bio.org, BioJava <biojava-l@biojava.org>
>>>>>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>>>>>        Subject:        [BioSQL-l] _removeSequence
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to delete a sequence and, recursively, all its features.
>>>>>
>>>>> So:
>>>>>
>>>>> for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
>>>>>                 Sequence s = si.nextSequence();
>>>>>                 String name = s.getName();
>>>>>                 s = null;
>>>>>                 db.removeSequence(name);
>>>>> }
>>>>>
>>>>> But if I look in the database (MySQL 4.1.12) I can still see plenty
>>>>> of entries, and I have problems entering the same features again
>>>>> because of a duplicate key error. I would like to know if
>>>>> _removeSequence(String) in BioSQLSequenceDB is supposed to remove
>>>>> features recursively or just the features of the removed sequence?
>>>>> If so - what is the best way to delete the features of the features
>>>>> (and so on)? And how to empty the db completely?
>>>>>
>>>>> Martina
>>>>>
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l@open-bio.org
>>>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>
From boehme at mpiib-berlin.mpg.de  Tue Jun 21 06:10:16 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 06:02:43 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <f5bb76b54331dc88107ebde4bee3dc46@gnf.org>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>
Message-ID: <42B7E788.3040205@mpiib-berlin.mpg.de>


> Yes. When you insert a sequence you must be prepared that when inserting 
> its ontology term or tag/value annotation the term may already be 
> present because another bioentry uses it too.

Ok, so the proper way is to catch the SQLException in BIOSQLFeature, test 
whether it is a duplicate key error, get the identifier of the term (would 
that be the BioSQLfeatureId?) and insert it in the term_relationship 
table? And there is no nice BioJava method for this, so I have to do it 
"manually", with conn.prepareStatement(..) and the like?  BioJava spoiled 
me so!
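
For the duplicate-key test alone, a minimal sketch might look like the
following. It assumes the JDBC driver reports standard SQLSTATE values
(class 23 means "integrity constraint violation") and, for MySQL, the
vendor error code 1062 (duplicate entry). The class and method names are
made up for illustration and are not part of BioJava. Whether catching
the exception is the recommended route at all is a question for the
BioJava developers.

```java
import java.sql.SQLException;

/** Hypothetical helper, not part of BioJava: classifies a SQLException. */
final class DuplicateKeyCheck {
    /** MySQL vendor code for "Duplicate entry ... for key ...". */
    static final int MYSQL_DUPLICATE_ENTRY = 1062;

    /** True if any exception in the chain has an SQLSTATE of class 23. */
    static boolean isIntegrityViolation(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            String state = cur.getSQLState();
            // Class 23 also covers foreign key and NOT NULL violations,
            // so this alone does not prove a duplicate key.
            if (state != null && state.startsWith("23")) {
                return true;
            }
        }
        return false;
    }

    /** True if the exception carries MySQL's duplicate-entry vendor code. */
    static boolean isMySqlDuplicateEntry(SQLException e) {
        return e.getErrorCode() == MYSQL_DUPLICATE_ENTRY;
    }

    public static void main(String[] args) {
        SQLException dup = new SQLException("Duplicate entry", "23000", 1062);
        SQLException other = new SQLException("Connection lost", "08S01", 0);
        System.out.println(isMySqlDuplicateEntry(dup));   // true
        System.out.println(isIntegrityViolation(other));  // false
    }
}
```

In a catch block one would call these on the caught exception and only
then look up the existing row; anything else should be rethrown.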

Martina
From hlapp at gnf.org  Tue Jun 21 06:17:42 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 06:10:43 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B7E1EE.5090505@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com>
	<78e39420822012ffbf691b5edc233b4a@gnf.org>
	<42B7E1EE.5090505@mpiib-berlin.mpg.de>
Message-ID: <b91b9e11939b2ba4aca079c339ad9666@gnf.org>


On Jun 21, 2005, at 2:46 AM, Martina wrote:

> Hi Hilmar,
>
> I wasn't aware of 2 different types of features.
> I'm making features as described in 
> http://www.biojava.org/docs/bj_in_anger/feature.htm, and as far as I 
> can tell from the results, its the first type you describe.

No, these are not different types of features; the only difference is 
whether the features are nested or not.

> The second type of feature is confusing me: as I understood the 
> feature relationships, the graph is a tree, with only one parent for a 
> given feature

I'm not sure whether Biojava imposes this as a limitation, but Biosql 
certainly doesn't, since it assumes an n:n relationship. In reality, 
nested features compliant with SO/SOFA will be trees though, I believe.

> , and if that feature is deleted, all its children should get deleted 
> too?

No, as I said below. To be more precise, not by the mechanism of 
cascading deletes (remember: cascading deletes only follow foreign key 
constraints - and a feature doesn't have a foreign key to another one). 
Your software or Biojava may implement it the way you suggested, but no 
RDBMS is going to do this for you.
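
To illustrate the distinction, here is a minimal Java sketch that models
seqfeature_relationship rows in memory rather than talking to a database.
The class FeatureHierarchy, its Arc type and both method names are
hypothetical (they are not BioJava or BioSQL API), and treating the
object as the containing feature and the subject as the nested one is an
assumption. arcsAfterCascade mimics what the cascade does by itself;
withDescendants collects what an application would have to delete
explicitly if it wants the whole subtree gone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for seqfeature_relationship rows. */
final class FeatureHierarchy {
    /** One arc; assumed direction: object contains subject. */
    static final class Arc {
        final int objectId;
        final int subjectId;

        Arc(int objectId, int subjectId) {
            this.objectId = objectId;
            this.subjectId = subjectId;
        }
    }

    /** The root plus every feature reachable via object -> subject arcs. */
    static Set<Integer> withDescendants(int rootId, List<Arc> arcs) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.push(rootId);
        while (!todo.isEmpty()) {
            int current = todo.pop();
            if (!seen.add(current)) {
                continue; // already visited; n:n nesting allows several paths
            }
            for (Arc arc : arcs) {
                if (arc.objectId == current) {
                    todo.push(arc.subjectId);
                }
            }
        }
        return seen;
    }

    /** What the cascade alone does: drop arcs touching the deleted node. */
    static List<Arc> arcsAfterCascade(int deletedId, List<Arc> arcs) {
        List<Arc> kept = new ArrayList<>();
        for (Arc arc : arcs) {
            if (arc.objectId != deletedId && arc.subjectId != deletedId) {
                kept.add(arc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Arc> arcs = Arrays.asList(
                new Arc(1, 2), new Arc(1, 3), new Arc(2, 4), new Arc(3, 4));
        System.out.println(withDescendants(2, arcs));          // [2, 4]
        System.out.println(arcsAfterCascade(2, arcs).size());  // 2
    }
}
```

The visited set matters because the schema permits n:n nesting, so a
feature may be reachable along more than one path.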

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Tue Jun 21 06:21:33 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 06:13:53 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B7E788.3040205@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>
	<42B7E788.3040205@mpiib-berlin.mpg.de>
Message-ID: <0be3992b92f6a14b6d06d5a06549555b@gnf.org>

The Biojava people will respond to this. Note though that 
Term_Relationship is for storing subject-predicate-object triples of 
terms, so I'm not sure why you want to use it for storing/associating 
annotation. Maybe you meant bioentry_qualifier_value?

	-hilmar

On Jun 21, 2005, at 3:10 AM, Martina wrote:

>
>> Yes. When you insert a sequence you must be prepared that when 
>> inserting its ontology term or tag/value annotation the term may 
>> already be present because another bioentry uses it too.
>
> Ok, the proper way is to catch the SQLException in BIOSQLFeature, test 
> if it is a duplicate key entry, get the identifier of the term (would 
> that be the BioSQLfeatureId ?) and insert it in the term_relationship 
> table? And there is no nice BioJava method for this, I have to do it 
> "manually", like conn.prepareStatement(..) and stuff?  BioJava spoiled 
> me so!
>
> Martina
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jana.bauckmann at informatik.hu-berlin.de  Tue Jun 21 08:15:01 2005
From: jana.bauckmann at informatik.hu-berlin.de (Jana Bauckmann)
Date: Tue Jun 21 08:08:49 2005
Subject: [BioSQL-l] Re: memory error while loading SwissProt into Oracle
	using bioperl-db
In-Reply-To: <3ba087a1f2d128f023b94d871b0366fa@gnf.org>
Message-ID: <Pine.GSO.4.33.0506211412510.25411-100000@rabe>

Hi,

I solved my problems importing SwissProt. It turned out to be a mixture of
reasons, so I thought it could be interesting for you:

1) An upgrade to BioPerl 1.5 solved my problems with integrity constraint
errors.

2) I got a memory leak with DBD::Oracle, Oracle 9.2 and multi-thread
enabled perl 5.8.1, as you assumed. (Memory use grew to 2GB
while inserting 30000 records.) I installed perl as a multi-thread disabled
version and everything worked fine.

Thank you very much,
Jana


On Tue, 14 Jun 2005, Hilmar Lapp wrote:

>
> On Jun 14, 2005, at 2:52 AM, Jana Bauckmann wrote:
>
> > Hi,
> >
> > I would like to load SwissProt data into my Oracle 9.2 database with
> > BioSQL as schema using load_seqdatabase.pl from bioperl-db. I've got
> > two
> > problems:
> >
> > 1) I get many (about 1300) warnings stating integrity constraint
> > errors:
> >
> > ORA-02291: integrity constraint (BIOSQL_SP.FKDBX_REF) violated - parent
> > key not found (DBD ERROR: OCIStmtExecute)
> >
> > ORA-01400: cannot insert NULL into
> > ("BIOSQL_SP"."SG_REFERENCE"."AUTHORS")
> > (DBD ERROR: OCIStmtExecute)
>
> If there is indeed no authors for the respective reference in the
> respective SwissProt entries then this is expected because
> Reference.Authors may not be NULL.
>
> You should, however, see more than just the error message above;
> supposedly there is a warning message following or preceding it that
> informs about not all foreign keys succeeded to insert, and the message
> should give the primary key. This should be the primary key for the
> bioentry that should have gotten the reference attached. Using SQL you
> should then be able to identify which record it is and then you can
> look it up on the Swissprot site or in your Swissprot source file.
>
> If the bioentry itself fails to load because of this problem then you
> should see an error message to this effect, with full stack trace.
> Otherwise the bioentry did load, just the reference didn't, and if you
> don't really need this particular reference, you don't need to worry
> about it.
>
> You may also want to consider trying to upgrade to a CVS snapshot from
> either the 1.4 branch or the main trunk. There have been a few fixes to
> modules that I believe include the swissprot parser.
>
> >
> > 2) The script stops after 2 hours (34500 tuples in table BioEntry) with
> > message: Out of memory!
> >
> > I guess problem 1 causes problem 2. Is this reasonable or do I have two
> > separated problems?
>
> The one before may not even be a real problem, see above. It is
> extremely unlikely that it causes the memory problem.
>
> Swissprot is a large, very diverse, and richly annotated data
> source, and because bioperl-db caches a lot of stuff like ontology
> terms, references, and dbxrefs the loader process will eventually use
> up anywhere between 500MB and 1.3GB of RAM.
>
> Given the amount of memory you have this shouldn't be a limitation
> though at all, unless maybe if you gave all the memory to Oracle
> running on the same machine.
>
> I've had a memory leak issue with DBD::Oracle, the Oracle 9iR2 client
> library, and multi-thread enabled perl 5.8.1 on MacOSX. You may be
> seeing a similar problem. Try watching the loader process in top and
> see how fast the memory consumption grows. It will grow due to the
> object cache filling up, but if you see it eating up more than 1GB
> before 100,000 records loaded you're likely to have hit a memory leak.
>
> If that's the case you'll have to rebuild your own perl from source
> with multi-threading disabled.
>
> 	-hilmar
>
> >
> > I run Oracle and the load script on the same machine with:
> > Suse Linux 9.0 (kernel 2.4.21-291-smp) with  12 GB RAM
> > perl 5.8.1, built for i586-linux-thread-multi
> > bioperl 1.4
> > bioperl-db 0.1
>
> BTW I'm assuming this is not correct; otherwise the latest BioSQL
> schema wouldn't be supported, let alone the Oracle version of it. You
> probably obtained a snapshot from CVS?
>
> > DBI 1.48
> > DBD::Oracle 1.16
> > Oracle 9.2
> > BioSQL schema for Oracle (downloaded from http://cvs.open-bio.org/ on
> > 6th
> > June 2005)
> >
> > Thanks for any suggestions,
> > Jana
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l@open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>

From boehme at mpiib-berlin.mpg.de  Tue Jun 21 09:55:15 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 09:51:10 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <0be3992b92f6a14b6d06d5a06549555b@gnf.org>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>
	<42B7E788.3040205@mpiib-berlin.mpg.de>
	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
Message-ID: <42B81C43.9010404@mpiib-berlin.mpg.de>

Does that mean that I can't have 2 features referring to the same bioentry 
with the same type (= type_term_id) and source (= source_term_id) but 
different parent features, because of the composite key bioentry_id in 
the seqfeature table? Or what does "rank" in that table mean (it's part 
of that key), and how can I get different ranks?

Martina

Hilmar Lapp wrote:

> The Biojava people will respond to this. Note though that 
> Term_Relationship is for storing subject-predicate-object triples of 
> terms, so I'm not sure why you want to use it for storing/associating 
> annotation. Maybe you meant bioentry_qualifier_value?
> 
>     -hilmar
> 
> On Jun 21, 2005, at 3:10 AM, Martina wrote:
> 
>>
>>> Yes. When you insert a sequence you must be prepared that when 
>>> inserting its ontology term or tag/value annotation the term may 
>>> already be present because another bioentry uses it too.
>>
>>
>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, test 
>> if it is a duplicate key entry, get the identifier of the term (would 
>> that be the BioSQLfeatureId ?) and insert it in the term_relationship 
>> table? And there is no nice BioJava method for this, I have to do it 
>> "manually", like conn.prepareStatement(..) and stuff?  BioJava spoiled 
>> me so!
>>
>> Martina
>>
From hlapp at gnf.org  Tue Jun 21 14:32:47 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 14:22:40 2005
Subject: [BioSQL-l] Re: memory error while loading SwissProt into Oracle
	using bioperl-db
In-Reply-To: <Pine.GSO.4.33.0506211412510.25411-100000@rabe>
References: <Pine.GSO.4.33.0506211412510.25411-100000@rabe>
Message-ID: <ab3d5643584104736aa585e16b81701b@gnf.org>

Good to know that the memory leak is not confined to MacOSX. BTW, 
aside from using a perl with multi-threading disabled, I could also get rid 
of the memory leak by using the Instant Client from Oracle (which is 
10g, but will connect fine to a 9i database). Again, that's on MacOSX, 
but chances are it will have the same effect for you.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org  Tue Jun 21 15:47:49 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 21 15:37:45 2005
Subject: [BioSQL-l] _removeSequence
In-Reply-To: <42B7EC3C.60100@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>
	<42B7E788.3040205@mpiib-berlin.mpg.de>
	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B7EC3C.60100@mpiib-berlin.mpg.de>
Message-ID: <69b3e884d800350b04e714de631e4d26@gnf.org>

As for documentation of the schema, there is an ERD (in PDF format) and 
a schema-overview.txt in the biosql-schema/doc directory.

I'll also be adding a version of the Biojava-in-anger document revised 
by Richard Holland, but that only deals with installation so won't help 
you much once you're beyond that point.

As to which part of the entries in which data source goes to which 
tables in the biosql schema, that's a more involved question, for three 
reasons. Data sources already differ from each other. Biojava and 
Bioperl currently do things mostly differently and incompatibly. And 
the exact mapping is not written down explicitly anywhere; it is more 
or less implicit in the way the bioperl SeqIO parsers work and in how 
the bioperl RichSeq object (which is the object returned by most 
bioperl parsers) stores attributes as annotation. Richard, Mark, and I 
discussed this, and how the situation can be improved, in Singapore a 
week ago.

	-hilmar

On Jun 21, 2005, at 3:30 AM, Martina wrote:

> Well -  I'm not familiar with the BioSQL structure because BioJava did 
> it all for me. But if I have to, I'll look into it. The best 
> documentation are the comments in the *.sql file? Or how do I find out 
> where things go into?
>
> Martina
>
> Hilmar Lapp wrote:
>
>> Note though that Term_Relationship is for storing 
>> subject-predicate-object triples of terms, so I'm not sure why you 
>> want to use it for storing/associating annotation. Maybe you meant 
>> bioentry_qualifier_value?
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From boehme at mpiib-berlin.mpg.de  Wed Jun 22 05:24:08 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Jun 22 05:16:23 2005
Subject: [BioSQL-l] update seqfeature 
In-Reply-To: <42B83D31.2000403@nrc-cnrc.gc.ca>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>	<42B6DEC3.9090807@mpiib-berlin.mpg.de>	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>	<42B7E788.3040205@mpiib-berlin.mpg.de>	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B81C43.9010404@mpiib-berlin.mpg.de>
	<42B83D31.2000403@nrc-cnrc.gc.ca>
Message-ID: <42B92E38.2020008@mpiib-berlin.mpg.de>

Hi Simon,

I'm changing the FeatureSource, and in setFeatureSource an update on 
the source_term_id happens. If the combination is already 
there, I get an Exception. Would the proper way to deal with that be 
to get the seqfeature_id of the entry already there and use that, or 
to try updating the rank until it's a unique combination? Or should I 
rather not mess with BioJava, and delete that entry and insert it 
as new to let BioJava handle the rank increase?

Thanks for any advice

Martina

Simon Foote wrote:

> Hi Martina,
> 
> In fact you can, as rank is the field that allows this to happen.  In 
> Biojava, currently it's just a linearly incremented number such that 
> you can have the same type and source IDs for a given bioentry.
> 
> For example, adding a Genbank entry with 10 CDS features for 1 bioentry 
> will give you identical keys for bioentry_id, type_term_id and 
> source_term_id, but will have a rank of 1 - 10 for each.
> 
> Simon
> 

From boehme at mpiib-berlin.mpg.de  Wed Jun 22 09:05:44 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Jun 22 08:57:26 2005
Subject: [BioSQL-l] Re: update seqfeature
In-Reply-To: <42B95EBF.7050403@nrc-cnrc.gc.ca>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>	<42B6DEC3.9090807@mpiib-berlin.mpg.de>	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>	<42B7E788.3040205@mpiib-berlin.mpg.de>	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B81C43.9010404@mpiib-berlin.mpg.de>
	<42B83D31.2000403@nrc-cnrc.gc.ca>
	<42B92E38.2020008@mpiib-berlin.mpg.de>
	<42B95EBF.7050403@nrc-cnrc.gc.ca>
Message-ID: <42B96228.4020100@mpiib-berlin.mpg.de>

Hi Simon,

sorry, I might not have made that clear enough:
The problem only exists when changing a feature source (or type, but I 
didn't try that) because of the composite unique index in the biosql 
seqfeature table. It doesn't check whether the location is the same or 
not, but the combination of type, source, bioentry id and rank has to be 
unique. So if I insert a new feature, the rank gets increased by 
BioJava somehow and all is well, but if I update an existing feature's 
source and hit by accident the same combination as another feature's 
type, source, .. I get the exception and the source doesn't change.
At least that is what I suppose is happening.

My question was how to handle this situation?

Martina


Simon Foote wrote:

> Hi Martina,
> 
> Biojava should handle that correctly.  I haven't done it by changing a 
> feature source, but I have with changing a feature's location and 
> strand.  For changing a location:
> 
> // Get the Feature you wish to edit
> StrandedFeature sf = ex. use a feature filter to grab the feature by 
> it's ID
> Location loc = new Location(100, 1100);
> sf.setLocation(loc);
> 
> Since you have already retrieved the feature to edit, biojava will 
> automatically do this as an update and not an insert.  Or it should in 
> all cases where you are modifying a pre-existing feature.
> 

From reneehalbrook74 at yahoo.com  Wed Jun 22 11:24:13 2005
From: reneehalbrook74 at yahoo.com (Renee Halbrook)
Date: Wed Jun 22 11:15:38 2005
Subject: [BioSQL-l] very new to biosql--sequence loading question
Message-ID: <20050622152413.27457.qmail@web40506.mail.yahoo.com>

Hi,

I am very new to biosql. I have designed a mysql
schema to represent cyanobacteria, pulled from genbank
files. It is not identical to the biosql schema, but
it is similar.

My specific issue is in loading large sequences into a
sequence table, (essentially identical to the
biosequence table) using perl dbi. I keep running into
a 'max_allowed_packet' issue, even though I have
bumped it up to 1 GB in the my.cnf file.

I would like to see how other people have implemented
this.

Could someone please point me in the direction of the
documentation for loading sequences using perl, from
flat genbank files, into a mysql database ?

Thanks in advance for any help.

Regards,
Renee


From simon.foote at nrc-cnrc.gc.ca  Tue Jun 21 08:47:08 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:35 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To: <42B6DEC3.9090807@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
	<42B6DEC3.9090807@mpiib-berlin.mpg.de>
Message-ID: <42B80C4C.7060204@nrc-cnrc.gc.ca>

Hi Martina,

That would be correct as the on delete cascade doesn't touch the term 
tables as they are always referenced by any sequence.  There aren't any 
foreign key constraints put on those 4 tables, hence they don't get deleted.
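
The behaviour described above can be reproduced in a small sketch. It 
uses Python's sqlite3 module and a heavily simplified, hypothetical 
stand-in for the schema (three cut-down tables, not the real BioSQL DDL, 
and SQLite rather than MySQL). Deleting a bioentry cascades to its 
seqfeature rows, but nothing cascades from bioentry to term, so the 
term rows are left behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Simplified, hypothetical tables; the real BioSQL schema has more columns.
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id INTEGER NOT NULL
        REFERENCES bioentry(bioentry_id) ON DELETE CASCADE,
    type_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
INSERT INTO term VALUES (1, 'CDS');
INSERT INTO bioentry VALUES (1, 'seq1');
INSERT INTO seqfeature VALUES (1, 1, 1);
""")

# Removing the sequence removes its features, but not the shared term.
conn.execute("DELETE FROM bioentry WHERE bioentry_id = 1")
features_left = conn.execute("SELECT COUNT(*) FROM seqfeature").fetchone()[0]
terms_left = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
print(features_left, terms_left)
```

Because the term survives, a later insert of the same term has to expect 
that it may already exist.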

Simon

Martina wrote:

> Hi,
>
> so I have this new database (still biosqldb-mysql.sql,v 1.40 2004/11/04 
> 01:49:41) and after removing all sequences, I do still have entries in 
> term, term_relationship, term_relationship_term and ontology. And of 
> course, in biodatabase. If I delete the entry in biodatabase too, 
> nothing changes. Is that what is to be expected?
> Because I still have trouble with the duplicate entry key, but that must 
> be my code then.
>
> Thanks
> Martina
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l


-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From simon.foote at nrc-cnrc.gc.ca  Tue Jun 21 12:15:45 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:42 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To: <42B81C43.9010404@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>	<42B6DEC3.9090807@mpiib-berlin.mpg.de>	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>	<42B7E788.3040205@mpiib-berlin.mpg.de>	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B81C43.9010404@mpiib-berlin.mpg.de>
Message-ID: <42B83D31.2000403@nrc-cnrc.gc.ca>

Hi Martina,

In fact you can, as rank is the field that allows this to happen.  In 
Biojava, currently it's just a linearly incremented number such that 
you can have the same type and source IDs for a given bioentry.

For example, adding a Genbank entry with 10 CDS features for 1 bioentry 
will give you identical keys for bioentry_id, type_term_id and 
source_term_id, but will have a rank of 1 - 10 for each.
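
A minimal sketch of how rank keeps such rows distinct, assuming a 
cut-down seqfeature table whose unique key is (bioentry_id, 
type_term_id, source_term_id, rank) as discussed in this thread. The 
table definition and the ID values are hypothetical, and SQLite stands 
in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id INTEGER NOT NULL,
    type_term_id INTEGER NOT NULL,
    source_term_id INTEGER NOT NULL,
    "rank" INTEGER NOT NULL,
    UNIQUE (bioentry_id, type_term_id, source_term_id, "rank")
)""")

insert = ('INSERT INTO seqfeature '
          '(bioentry_id, type_term_id, source_term_id, "rank") '
          'VALUES (?, ?, ?, ?)')

# Ten CDS features on one bioentry: same type and source, ranks 1 to 10.
conn.executemany(insert, [(1, 10, 20, rank) for rank in range(1, 11)])

# Re-using an existing rank for the same triple violates the unique key.
try:
    conn.execute(insert, (1, 10, 20, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM seqfeature").fetchone()[0]
print(count, duplicate_rejected)
```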

Simon

Martina wrote:

> That means that I can't have 2 features referring to the same bioentry 
> with the same type (= type_term_id) and source (= source_term_id) but 
> different parent features because of the composite key bioentry_id in 
> the seqfeature table? Or what does "rank" in that table mean (it's part 
> of that key), and how can I get different ranks?
>
> Martina
>
> Hilmar Lapp wrote:
>
>> The Biojava people will respond to this. Note though that 
>> Term_Relationship is for storing subject-predicate-object triples of 
>> terms, so I'm not sure why you want to use it for storing/associating 
>> annotation. Maybe you meant bioentry_qualifier_value?
>>
>>     -hilmar
>>
>> On Jun 21, 2005, at 3:10 AM, Martina wrote:
>>
>>>
>>>> Yes. When you insert a sequence you must be prepared that when 
>>>> inserting its ontology term or tag/value annotation the term may 
>>>> already be present because another bioentry uses it too.
>>>
>>>
>>>
>>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, 
>>> test if it is a duplicate key entry, get the identifier of the term 
>>> (would that be the BioSQLfeatureId ?) and insert it in the 
>>> term_relationship table? And there is no nice BioJava method for 
>>> this, I have to do it "manually", like conn.prepareStatement(..) and 
>>> stuff?  BioJava spoiled me so!
>>>
>>> Martina
>>>


-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From simon.foote at nrc-cnrc.gc.ca  Wed Jun 22 08:51:11 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:43 2005
Subject: [BioSQL-l] Re: update seqfeature
In-Reply-To: <42B92E38.2020008@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>	<42B6DEC3.9090807@mpiib-berlin.mpg.de>	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>	<42B7E788.3040205@mpiib-berlin.mpg.de>	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B81C43.9010404@mpiib-berlin.mpg.de>
	<42B83D31.2000403@nrc-cnrc.gc.ca>
	<42B92E38.2020008@mpiib-berlin.mpg.de>
Message-ID: <42B95EBF.7050403@nrc-cnrc.gc.ca>

Hi Martina,

Biojava should handle that correctly.  I haven't done it by changing a 
feature source, but I have with changing a feature's location and 
strand.  For changing a location:

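// Get the feature you wish to edit, e.g. use a feature filter to grab
// the feature by its ID and cast the result to StrandedFeature
StrandedFeature sf = ...;
// Location is an interface; RangeLocation is a concrete implementation
Location loc = new RangeLocation(100, 1100);
sf.setLocation(loc);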

Since you have already retrieved the feature to edit, biojava will 
automatically do this as an update and not an insert.  Or it should in 
all cases where you are modifying a pre-existing feature.

Simon

Martina wrote:

> Hi Simon,
>
> I'm changing the FeatureSource and in setFeatureSource an update on 
> the source_term_id happens. In the case the combination is already 
> there, I get an Exception. The proper way to deal with that would be 
> to get the seqfeature_id of the entry already there and use that, or 
> try to update the rank unless its a unique combination? Or should I 
> rather not mess with the BioJava and delete that entry and insert it 
> as new to let BioJava handle the rank increase?
>
> Thanks for any advice
>
> Martina
>
> Simon Foote wrote:
>
>> Hi Martina,
>>
>> In fact you can, as rank is the field that allows this to happen.  In 
>> Biojava, currently it's just a linearly incremented number such that 
>> you can have the same type and source IDs for a given bioentry.
>>
>> For example, adding a Genbank entry with 10 CDS features for 1 
>> bioentry will give you identical keys for bioentry_id, type_term_id 
>> and source_term_id, but will have a rank of 1 - 10 for each.
>>
>> Simon
>>

-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From simon.foote at nrc-cnrc.gc.ca  Wed Jun 22 09:15:54 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 13:06:45 2005
Subject: [BioSQL-l] Re: update seqfeature
In-Reply-To: <42B96228.4020100@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>	<42B6DEC3.9090807@mpiib-berlin.mpg.de>	<f5bb76b54331dc88107ebde4bee3dc46@gnf.org>	<42B7E788.3040205@mpiib-berlin.mpg.de>	<0be3992b92f6a14b6d06d5a06549555b@gnf.org>
	<42B81C43.9010404@mpiib-berlin.mpg.de>
	<42B83D31.2000403@nrc-cnrc.gc.ca>
	<42B92E38.2020008@mpiib-berlin.mpg.de>
	<42B95EBF.7050403@nrc-cnrc.gc.ca>
	<42B96228.4020100@mpiib-berlin.mpg.de>
Message-ID: <42B9648A.5040001@nrc-cnrc.gc.ca>

I get the problem now; that would then be a bug in biojava.  It should 
do an internal check to see if a source/type term change will violate 
the unique key and, if so, also update the rank to the next 
available one.  One solution would be to catch the exception, then do a 
select for the max(rank) for the given bioentry_id, source_term_id, 
type_term_id and then increment it by one.

In fact, it would probably be wise to always update the rank when 
changing either the source or type term, so that the ranks stay 
incrementally consistent, if that really matters.
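
The catch-and-retry idea above might look roughly like this. It is a 
sketch only: the change_source helper is hypothetical (it is not a 
BioJava method), the table is a cut-down stand-in for seqfeature, and 
SQLite via Python's sqlite3 replaces the real database.

```python
import sqlite3

def change_source(conn, seqfeature_id, new_source_term_id):
    """Set a feature's source term; on a key clash, use the next free rank."""
    try:
        conn.execute(
            "UPDATE seqfeature SET source_term_id = ? WHERE seqfeature_id = ?",
            (new_source_term_id, seqfeature_id))
    except sqlite3.IntegrityError:
        # The target combination exists already: look up its highest rank.
        bioentry_id, type_term_id = conn.execute(
            "SELECT bioentry_id, type_term_id FROM seqfeature "
            "WHERE seqfeature_id = ?", (seqfeature_id,)).fetchone()
        (max_rank,) = conn.execute(
            'SELECT MAX("rank") FROM seqfeature WHERE bioentry_id = ? '
            "AND type_term_id = ? AND source_term_id = ?",
            (bioentry_id, type_term_id, new_source_term_id)).fetchone()
        conn.execute(
            'UPDATE seqfeature SET source_term_id = ?, "rank" = ? '
            "WHERE seqfeature_id = ?",
            (new_source_term_id, max_rank + 1, seqfeature_id))

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id INTEGER NOT NULL,
    type_term_id INTEGER NOT NULL,
    source_term_id INTEGER NOT NULL,
    "rank" INTEGER NOT NULL,
    UNIQUE (bioentry_id, type_term_id, source_term_id, "rank")
)""")
# Two features that differ only in source; both currently have rank 1.
conn.executemany("INSERT INTO seqfeature VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 10, 20, 1), (2, 1, 10, 30, 1)])

change_source(conn, 2, 20)  # would clash with feature 1 at rank 1
result = conn.execute(
    'SELECT source_term_id, "rank" FROM seqfeature WHERE seqfeature_id = 2'
).fetchone()
print(result)
```

This renumbers only on a clash. Always assigning max(rank) + 1 when the 
source or type changes, as suggested above, would be a small variation.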

Simon

Martina wrote:

> Hi Simon,
>
> sorry, I might haven't made that clear enough:
> The problem only exists with changing a feature source (or type, but I 
> didn't try that) because of the composite unique index in biosql 
> seqfeature table, it doesn't check if the location is the same or not, 
> but the combination of type, source, bioentry id and rank has to be 
> unique. So if I insert a new feature, the rank gets increased by 
> BioJava somehow and all is well, but if I update an existing features 
> source and hit by accident the same combination as anothers fetures 
> type, source, .. I get the exception and the source doesn't change.
> At least that is what I suppose is happening.
>
> My question was how to handle this situation?
>
> Martina
>
>
> Simon Foote wrote:
>
>> Hi Martina,
>>
>> Biojava should handle that correctly.  I haven't done it by changing 
>> a feature source, but I have with changing a feature's location and 
>> strand.  For changing a location:
>>
>> // Get the Feature you wish to edit
>> StrandedFeature sf = ex. use a feature filter to grab the feature by 
>> it's ID
>> Location loc = new Location(100, 1100);
>> sf.setLocation(loc);
>>
>> Since you have already retrieved the feature to edit, biojava will 
>> automatically do this as an update and not an insert.  Or it should 
>> in all cases where you are modifying a pre-existing feature.
>>

-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From reneehalbrook74 at yahoo.com  Thu Jun 23 14:00:02 2005
From: reneehalbrook74 at yahoo.com (Renee Halbrook)
Date: Thu Jun 23 13:51:28 2005
Subject: [BioSQL-l] load_taxonomy.pl question
Message-ID: <20050623180004.70399.qmail@web40511.mail.yahoo.com>

Hi,
Is it possible to alter the load_taxonomy.pl script to
load data for only a certain subtree? For example, to
grab the taxonomy structure starting with
Cyanobacteria (id = 1117) as the root? 

Thanks for any feedback,
Renee


From hlapp at gnf.org  Thu Jun 23 22:16:06 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jun 23 22:08:17 2005
Subject: [BioSQL-l] load_taxonomy.pl question
In-Reply-To: <20050623180004.70399.qmail@web40511.mail.yahoo.com>
References: <20050623180004.70399.qmail@web40511.mail.yahoo.com>
Message-ID: <d6ab87225d7004097a1af207fb630069@gnf.org>

I guess there could be a way, but it's got to be very complicated, 
because now you're trying to do something in perl for which perl's not 
made.

Why not just load up everything? It doesn't take much disk space. Also, 
if you're really eager to keep only the subtree, you could delete the 
rest using SQL.
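
One way the "delete the rest" step could be written, as a sketch under 
stated assumptions: a hypothetical two-column taxon table (taxon_id, 
parent_taxon_id) rather than the real BioSQL taxon tables, made-up IDs 
apart from 1117, and a database that supports recursive common table 
expressions (SQLite here, via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified table: each node points at its parent.
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER);
INSERT INTO taxon VALUES
    (1, NULL), (2, 1), (1117, 2), (5001, 1117), (5002, 5001), (9001, 1);
""")

root_id = 1117  # subtree root to keep (the ID from the question)

# Collect the root and all of its descendants, then delete everything else.
conn.execute("""
WITH RECURSIVE subtree(taxon_id) AS (
    SELECT ?
    UNION ALL
    SELECT t.taxon_id FROM taxon AS t
    JOIN subtree AS s ON t.parent_taxon_id = s.taxon_id
)
DELETE FROM taxon WHERE taxon_id NOT IN (SELECT taxon_id FROM subtree)
""", (root_id,))

remaining = [row[0] for row in
             conn.execute("SELECT taxon_id FROM taxon ORDER BY taxon_id")]
print(remaining)
```

Note that the kept root still names a parent that no longer exists, so 
its parent reference would need to be cleared or tolerated.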

	-hilmar

On Jun 23, 2005, at 2:00 PM, Renee Halbrook wrote:

> Hi,
> Is it possible alter the load_taxonomy.pl script to
> load data for only a certain subtree? For example ,to
> grab the taxonomy structure starting with
> CyanoBacteria (id =1117)  as the root ?
>
> Thanks for any feedback,
> Renee
>
>
> 		
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From amackey at pcbi.upenn.edu  Fri Jun 24 09:09:17 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Fri Jun 24 09:00:59 2005
Subject: [BioSQL-l] load_taxonomy.pl question
In-Reply-To: <d6ab87225d7004097a1af207fb630069@gnf.org>
References: <20050623180004.70399.qmail@web40511.mail.yahoo.com>
	<d6ab87225d7004097a1af207fb630069@gnf.org>
Message-ID: <6E07CCDD-5C11-488A-A757-9E5D1210B70C@pcbi.upenn.edu>


I agree that it would be mildly complicated, but it's not Perl's  
fault at all.  It would be mildly complicated in any language.  The  
complication stems from the fact that the data we load from is a  
tab-delimited flat file of "taxon  parent-taxon" tuples, so as we load we  
cannot know (without some additional upfront work) whether any given  
row is desirable.  If we could know that, the solution would be  
trivial.  One way to know that is to basically read the whole file  
into an in-memory representation of the tree (only keeping node IDs for  
memory conservation), and then only keep the desired subtree (purge  
the rest); then, as we process the input files, only act on those  
lines that apply to members of the subtree (probably flattened to a  
hash to make lookup quicker).  No big deal really, and something Perl  
can do just as well as any other programming language.

I leave the implementation as an exercise for the reader, however, as  
I agree that deleting everything but the desired subtree via SQL  
would also work nicely, though not save any processing time ;)
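
The in-memory approach outlined above could be sketched as follows. The 
input format (one "taxon<TAB>parent" pair per line) and every ID except 
1117 are assumptions for illustration; this is not code from 
load_taxonomy.pl.

```python
from collections import defaultdict

def subtree_ids(pairs, root):
    """Return the set of node IDs in the subtree rooted at `root`."""
    children = defaultdict(list)
    for taxon, parent in pairs:
        children[parent].append(taxon)
    keep, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in keep:  # guards against a root listed as its own parent
            keep.add(node)
            stack.extend(children[node])
    return keep

# Pass 1: read only the IDs and flatten the wanted subtree into a set.
lines = ["1\t1", "2\t1", "1117\t2", "5001\t1117", "5002\t5001", "9001\t1"]
pairs = [tuple(int(field) for field in line.split("\t")) for line in lines]
keep = subtree_ids(pairs, 1117)

# Pass 2: act only on the lines whose taxon belongs to the subtree.
kept_lines = [line for line, (taxon, _) in zip(lines, pairs) if taxon in keep]
print(kept_lines)
```

Keeping only integer IDs in the first pass keeps memory use modest, and 
the set gives the quick membership lookup mentioned above.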

-Aaron

On Jun 23, 2005, at 10:16 PM, Hilmar Lapp wrote:

> I guess there could be a way, but it's got to be very complicated,  
> because now you're trying to do something in perl for which perl's  
> not made.
>
> Why not just load up everything? It's not that much of diskspace.  
> Also, if you're really eager to keep only the subtree, you could  
> delete the rest using SQL.
>
>     -hilmar
>
> On Jun 23, 2005, at 2:00 PM, Renee Halbrook wrote:
>
>
>> Hi,
>> Is it possible alter the load_taxonomy.pl script to
>> load data for only a certain subtree? For example ,to
>> grab the taxonomy structure starting with
>> CyanoBacteria (id =1117)  as the root ?
>>
>> Thanks for any feedback,
>> Renee
>>
>>
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email:  amackey@pcbi.upenn.edu
office: 215-898-1205
fax:    215-746-6697
postal: Penn Genomics Institute
         Goddard Labs 212
         415 S. University Avenue
         Philadelphia, PA  19104-6017

From hollandr at gis.a-star.edu.sg  Sun Jun 26 11:06:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 10:59:26 2005
Subject: [BioSQL-l] update seqfeature 
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B5@BIONIC.biopolis.one-north.com>

Actually, BioJava is not that clever. Yet. Martina's original observation is right, in that the correct way to do this would be to check the database to see if the altered seqfeature already existed, and if it did, to refer to that one instead. But this is not the way BioJava does things at present. A fix for this will probably end up being built in to the replacement BioJava/BioSQL classes currently in progress, but for now, to delete/create the feature is probably the best workaround.

cheers,
Richard


-----Original Message-----
From:	biosql-l-bounces@portal.open-bio.org on behalf of Martina
Sent:	Wed 6/22/2005 5:24 PM
To:	simon.foote@nrc-cnrc.gc.ca
Cc:	biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org
Subject:	[BioSQL-l] update seqfeature 

Hi Simon,

I'm changing the FeatureSource and in setFeatureSource an update on 
the source_term_id happens. In the case the combination is already 
there, I get an Exception. The proper way to deal with that would be 
to get the seqfeature_id of the entry already there and use that, or 
try to update the rank unless its a unique combination? Or should I 
rather not mess with the BioJava and delete that entry and insert it 
as new to let BioJava handle the rank increase?

Thanks for any advice

Martina

Simon Foote wrote:

> Hi Martina,
> 
> In fact you can, as rank is the field that allows this to happen.  In 
> Biojava, currently it's just a linearly incremented number such that 
> you can have the same type and source IDs for a given bioentry.
> 
> For example, adding a Genbank entry with 10 CDS features for 1 bioentry 
> will give you identical keys for bioentry_id, type_term_id and 
> source_term_id, but will have a rank of 1 - 10 for each.
> 
> Simon
> 



From hollandr at gis.a-star.edu.sg  Sun Jun 26 11:11:30 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 11:04:12 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B6@BIONIC.biopolis.one-north.com>

The revamped BioJava/BioSQL classes will expose the rank to the user for all tables which have ranks.

cheers,
Richard


-----Original Message-----
From:	biosql-l-bounces@portal.open-bio.org on behalf of Simon Foote
Sent:	Wed 6/22/2005 12:15 AM
To:	Martina
Cc:	Hilmar Lapp; biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org
Subject:	Re: [Biojava-l] Re: [BioSQL-l] _removeSequence

Hi Martina,

In fact you can, as rank is the field that allows this to happen.  In 
Biojava, currently it's just a linearly incremented number such that 
you can have the same type and source IDs for a given bioentry.

For example, adding a Genbank entry with 10 CDS features for 1 bioentry 
will give you identical keys for bioentry_id, type_term_id and 
source_term_id, but will have a rank of 1 - 10 for each.

Simon

Martina wrote:

> That means that I can't have 2 features referring to the same bioentry 
> with the same type (= type_term_id) and source (= source_term_id) but 
> different parent features because of the composite key bioentry_id in 
> the seqfeature table? Or what does "rank" in that table mean (it's part 
> of that key), and how can I get different ranks?
>
> Martina
>
> Hilmar Lapp wrote:
>
>> The Biojava people will respond to this. Note though that 
>> Term_Relationship is for storing subject-predicate-object triples of 
>> terms, so I'm not sure why you want to use it for storing/associating 
>> annotation. Maybe you meant bioentry_qualifier_value?
>>
>>     -hilmar
>>
>> On Jun 21, 2005, at 3:10 AM, Martina wrote:
>>
>>>
>>>> Yes. When you insert a sequence you must be prepared that when 
>>>> inserting its ontology term or tag/value annotation the term may 
>>>> already be present because another bioentry uses it too.
>>>
>>>
>>>
>>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, 
>>> test if it is a duplicate key entry, get the identifier of the term 
>>> (would that be the BioSQLfeatureId ?) and insert it in the 
>>> term_relationship table? And there is no nice BioJava method for 
>>> this, I have to do it "manually", like conn.prepareStatement(..) and 
>>> stuff?  BioJava spoiled me so!
>>>
>>> Martina
>>>


-- 
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca



From hollandr at gis.a-star.edu.sg  Sun Jun 26 11:16:46 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 11:09:11 2005
Subject: [BioSQL-l] very new to biosql--sequence loading question
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B8@BIONIC.biopolis.one-north.com>

Hi,

I can't answer the MySQL error question as I don't know anything about it, but I'm curious as to the differences between your db and BioSQL. What was it that BioSQL could not do that you had to reimplement in a different way? Maybe some suggestions could be made for improvements to BioSQL? Or maybe BioSQL can actually help with the problem but in a way that wasn't immediately obvious?

cheers,
Richard


-----Original Message-----
From:	biosql-l-bounces@portal.open-bio.org on behalf of Renee Halbrook
Sent:	Wed 6/22/2005 11:24 PM
To:	biosql-l@open-bio.org
Cc:	
Subject:	[BioSQL-l] very new to biosql--sequence loading question

Hi,

I am very new to biosql. I have designed a mysql
schema to represent cyanobacteria, pulled from genbank
files. It is not identical to the biosql schema, but
it is similar.

My specific issue is in loading large sequences into a
sequence table, (essentially identical to the
biosequence table) using perl dbi. I keep running into
a 'max_allowed_packet' issue, even though I have
bumped it up to a 1 gig in the my.cnf file.

I would like to see how other people have implemented
this.

Could someone please point me in the direction of the
documentation for loading sequences using perl, from
flat genbank files, into a mysql database ?

Thanks in advance for any help.

Regards,
Renee




From astew at wam.umd.edu  Mon Jun 27 16:03:38 2005
From: astew at wam.umd.edu (Andrew Stewart)
Date: Mon Jun 27 16:10:23 2005
Subject: [BioSQL-l] Strain support?
Message-ID: <42C05B9A.1000807@wam.umd.edu>

I don't see any strain support in the taxonomy for BioSQL.

Am I mistaken, or has anyone developed support for this, or is there a 
plan to in the future?


-Andrew Stewart
BDRD
From hlapp at gnf.org  Tue Jun 28 10:53:32 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 28 10:44:51 2005
Subject: [BioSQL-l] Strain support?
In-Reply-To: <42C05B9A.1000807@wam.umd.edu>
References: <42C05B9A.1000807@wam.umd.edu>
Message-ID: <39b80934b7196c6dc16971036c5b1fd9@gnf.org>

BioSQL supports as much strain information as the NCBI taxonomy 
database download does, since the two tables mirror the NCBI taxonomy 
tables. The recommendation is to populate them in advance from the 
NCBI taxonomy download so that species get properly resolved 
by NCBI taxon ID (which will distinguish strains).

What in particular did you find unsupported?

	-hilmar

On Jun 27, 2005, at 4:03 PM, Andrew Stewart wrote:

> I don't see any strain support in the taxonomy for BioSQL.
>
> Am I mistaken, or has anyone developed support for this, or is there a 
> plan to in the future?
>
>
> -Andrew Stewart
> BDRD
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Teemu.Kivioja at vtt.fi  Tue Jun 28 11:19:44 2005
From: Teemu.Kivioja at vtt.fi (Teemu Kivioja)
Date: Tue Jun 28 11:10:57 2005
Subject: [BioSQL-l] Loading long strings to Oracle 
Message-ID: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>

Hi,

I have a couple of possibly related problems with loading to the Oracle database.

1.) When trying to load yeast proteins from SGD, I get:

perl load_seqdatabase.pl --host sboracle1.ad.vtt.fi  --driver Oracle 
--testonly --dbname BfxDB --format swiss --printerror test.swiss
Loading test.swiss ...
DBD::Oracle::st execute failed: ORA-01461: can bind a LONG value only for 
insert into a LONG column (DBD ERROR: OCIStmtExecute) [for statement 
``UPDATE biosequence SET version = NVL(?,version), length = NVL(?,length), 
alphabet = NVL(?,alphabet), crc = NVL(?,crc), seq = NVL(?,seq), ent_oid = 
NVL(?,ent_oid) WHERE ent_oid = ?'' with params: 
:p5='MAKQRQTTKSSKRYRYSSFKARIDDLKIEPARNLEKRVHDYVESSHFLASFDQWKEINLSAKFTEFAAEIEHDVQTLPQILYHDKKIFNSLVSFINFHDEFSLQPLLDLLAQFCHDLGPDFLKFYEEAIKTLINLLDAAIEFESSNVFEWGFNCLAYIFKYLSKFLVKKLVLTCDLLIPLLSHSKEYLSRFSAEALSFLVRKCPVSNLREFVRSVFEKLEGDDEQTNLYEGLLILFTESMTSTQETLHSKAKAIMSVLLHEALTKSSPERSVSLLSDIWMNISKYASIESLLPVYEVMYQDFNDSLDATNIDRILKVLTTIVFSESGRKIPDWNKITILIERIMSQSENCASLSQDKVAFLFALFIRNSDVKTLTLFHQKLFNYALTNISDCFLE...', 
:p3='protein', :p6='14404', :p1=undef, :p7='14404', :p4='F6ED4E3E9AE0F468', 
:p2=2493]) at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BaseDriver.pm 
line 1115, <GEN0> line 51.

The file test.swiss only includes the record:
ID   YBL004W        STANDARD;      PRT;   2494 AA.

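It seems that I can get rid of this error message by explicitly declaring 
that the type of the sequence field is CLOB, by adding the code
	    # ORA_CLOB has to be imported: use DBD::Oracle qw(:ora_types);
	    if ($slots[$i] eq "seq") {
	      # bind the sequence explicitly as a CLOB
	      $self->bind_param($sth, $j, $slotvals->[$i],
				{ ora_type => ORA_CLOB });
	    } else {
	      # all other slots keep the default binding
	      $self->bind_param($sth, $j, $slotvals->[$i]);
	    }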


2.) When trying to insert InterPro (interpro 10.0
ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get:

perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi 
--namespace interpro  --driver Oracle --dbname BfxDB --testonly --fmtargs 
"ontology_engine,simple" interpro.xml
...
11900
Loading ontology InterPro:
         ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were 
("IPR000911","Ribosomal protein L11","Ribosomes are ...

ORA-01461: can bind a LONG value only for insert into a LONG column (DBD 
ERROR: error possibly near <*> indicator at char 14 in 'INSERT INTO te<*>rm 
(identifier, name, definition, is_obsolete, ont_oid) VALUES (:p1, :p2, :p3, 
:p4, :p5)')
---------------------------------------------------
Could not store term IPR000911, name 'Ribosomal protein L11':

------------- EXCEPTION  -------------
MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or to be 
found by unique key

Again, the annotation is >2000 characters long but well under the 4000 
character limit.

3.) As others have already reported, the memory usage can be high: the 
above load_ontology process takes about 2.5GB of memory.

I guess the fact that the problems 1 and 2 already arise with strings that 
are <4000 chars long might be related to the local character coding. The code:

my $hash_ref = $dbh->ora_nls_parameters();
my $database_charset = $hash_ref->{NLS_CHARACTERSET};	
my $national_charset = $hash_ref->{NLS_NCHAR_CHARACTERSET};
print "database charset: $database_charset\n";
print "national charset: $national_charset\n";

gives

database charset: WE8ISO8859P1
national charset: AL16UTF16

and

$  locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
70
84
1
0
1

Some details of the system:
Enterprise Linux, 2.4.21-32.0.1.ELsmp (64-bit)
Oracle 10g, version 10.1.0.3.0 - 64bit
Perl, v5.8.0 built for x86_64-linux-thread-multi
Bioperl 1.4
bioperl-db 0.1
DBD::Oracle 1.16
Biosql-schema downloaded on May 10

What would be the best way to solve these problems?

Best regards,
Teemu Kivioja


------------------------------------------------------------------
Teemu Kivioja, Research Scientist
VTT Biotechnology
P.O. Box 1500, FIN-02044 VTT, Finland
(Street address: Tietotie 2, Espoo, Otaniemi)
Email: Teemu.Kivioja@vtt.fi
Phone: +358 20 722 7111
Fax: +358 20 722 7071

From hollandr at gis.a-star.edu.sg  Tue Jun 28 11:33:27 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Jun 28 11:25:48 2005
Subject: [BioSQL-l] Loading long strings to Oracle 
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601E87172@BIONIC.biopolis.one-north.com>

We had similar trouble in BioJava when Oracle 9 and 10 suddenly stopped supporting the use of setString and getString on CLOB columns. Special code was required to make BioJava detect the database and use the Oracle CLOB-specific accessor methods, just like your 'quick fix' of setting ora_type does.

Hilmar is the best guy to talk to here as he uses BioPerl with BioSQL and Oracle in his production db at work.

Your annotation fails because you are using UTF16 as the character set in the database. This means that each character is stored as 16 bits, or 2 bytes. As the limit in Oracle is 4000 bytes (note: bytes, not characters), you can only store strings up to 2000 chars long with this encoding.
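
To illustrate the arithmetic, here is a minimal Perl sketch using the core Encode module. The helper names byte_length and fits_byte_limit are made up for this example, the 4000-byte limit is the one mentioned above, and UTF-16BE is only assumed as a stand-in for Oracle's AL16UTF16 (which the diagnostics report as the national character set). Whether the bound values are really sent in that encoding in this setup is an assumption, not something the error message confirms.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes needed to store $text in the given encoding.
sub byte_length {
    my ($text, $encoding) = @_;
    return length(encode($encoding, $text));
}

# 1 if $text needs at most $limit bytes in the given encoding, else 0.
sub fits_byte_limit {
    my ($text, $encoding, $limit) = @_;
    return byte_length($text, $encoding) <= $limit ? 1 : 0;
}

# A dummy protein sequence as long as the failing record (2494 residues).
my $seq = 'M' x 2494;

for my $enc ('ISO-8859-1', 'UTF-16BE') {
    printf "%-10s %d chars -> %d bytes, fits in 4000 bytes: %s\n",
        $enc, length($seq), byte_length($seq, $enc),
        fits_byte_limit($seq, $enc, 4000) ? 'yes' : 'no';
}
```

With a single-byte encoding the 2494-character string needs 2494 bytes; with two bytes per character it needs 4988 and no longer fits, which would match failures that start somewhere above 2000 characters.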

cheers,
Richard




From hlapp at gnf.org  Tue Jun 28 11:42:22 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 28 11:33:49 2005
Subject: [BioSQL-l] Loading long strings to Oracle 
In-Reply-To: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
References: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
Message-ID: <31e6ba5c4a09865c67ae7a0d65f68252@gnf.org>

You're probably not using bioperl-db 0.1, judging from the generated  
query. Make sure you use a recent download from CVS.

There is a test in bioperl-db for inserting and retrieving long  
sequences. Have you run the tests and seen a problem?

This may indeed be due to some problem with the character encoding. The
Oracle-specific layer of the adaptors deals a bit differently with
sequences longer than 4000 chars. However, if in your case they are
encoded in Unicode, then maybe the threshold would be half that size?
Can you check what happens when you truncate the sequence and the other
troubling string to less than 2000 chars?
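
One way to run that experiment systematically is to bisect on the prefix
length. The following is only a sketch: longest_ok_prefix and the callback
are hypothetical names, and the stand-in callback merely simulates a
4000-byte limit at 2 bytes per character. In a real test the callback
would attempt the actual INSERT or UPDATE and return true on success.

```perl
use strict;
use warnings;

# Return the length (in characters) of the longest prefix of $text for
# which $try->($prefix) returns true. Assumes success is monotonic: if a
# prefix of some length works, every shorter prefix works as well.
sub longest_ok_prefix {
    my ($text, $try) = @_;
    my ($lo, $hi) = (0, length $text);   # the prefix of length $lo is OK
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($try->(substr($text, 0, $mid))) {
            $lo = $mid;                  # this length works, try longer
        } else {
            $hi = $mid - 1;              # this length fails, try shorter
        }
    }
    return $lo;
}

# Stand-in for the real database call (an assumption, not bioperl-db code):
# succeeds while the string needs at most 4000 bytes at 2 bytes per char.
my $fake_insert = sub {
    my ($s) = @_;
    return length($s) * 2 <= 4000;
};

my $seq = 'A' x 2494;
print "longest prefix that works: ",
      longest_ok_prefix($seq, $fake_insert), " chars\n";
```

If the real threshold turns out to be 2000 chars, that would point to the
two-bytes-per-character explanation; a threshold elsewhere would suggest a
different cause.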

Also, to nail down the problem, you could try to have the database and
OS run under the same locale/encoding.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gnf.org  Tue Jun 28 12:15:41 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 28 12:07:34 2005
Subject: [BioSQL-l] Loading long strings to Oracle 
In-Reply-To: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
References: <4.3.2.7.2.20050628153936.00c60d08@vttmail.vtt.fi>
Message-ID: <47b9e0958024fd2f4ccaeecb159a4b70@gnf.org>


On Jun 28, 2005, at 11:19 AM, Teemu Kivioja wrote:

> Hi,
>
>
> It seems that I can get rid of this error message by explicitly  
> telling that the type of sequence field is CLOB by adding the code
> 	    if ($slots[$i] eq "seq") {
> 	      $self->bind_param($sth, $j, $slotvals->[$i],
> 				{ ora_type => ORA_CLOB });
> 	    } else {
> 	      $self->bind_param($sth, $j, $slotvals->[$i]);
> 	    }

I've used that type of code before, but it wasn't necessary any more  
for INSERTs and didn't solve the problem for UPDATEs.

This may be related to the character encoding problem, in particular  
the fact that with UTF16 the byte length is no longer equal to the  
string length.


> 2.) When trying insert InterPro (interpro 10.0
> ftp://ftp.ebi.ac.uk/pub/databases/interpro/interpro.xml) I get:
>
> perl load_ontology.pl --format 'interpro' --host sboracle1.ad.vtt.fi  
> --namespace interpro  --driver Oracle --dbname BfxDB --testonly  
> --fmtargs "ontology_engine,simple" interpro.xml

Don't worry about the ontology engine. Use format interprosax, which is
an alias for an event-based parser; that should keep the memory usage
down.


> ...
> 11900
> Loading ontology InterPro:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values  
> were ("IPR000911","Ribosomal protein L11","Ribosomes are ...
>
> ORA-01461: can bind a LONG value only for insert into a LONG column  
> (DBD ERROR: error possibly near <*> indicator at char 14 in 'INSERT  
> INTO te<*>rm (identifier, name, definition, is_obsolete, ont_oid)  
> VALUES (:p1, :p2, :p3, :p4, :p5)')
> ---------------------------------------------------
> Could not store term IPR000911, name 'Ribosomal protein L11':
>
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Ontology::InterProTerm) failed to insert or  
> to be found by unique key
>
> Again, the annotation is >2000 characters long but well under the 4000  
> character limit.

This may be due to the encoding problem as well, as you suspect yourself.

	-hilmar


-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------