From daniel.lang at biologie.uni-freiburg.de  Wed Jun  2 04:44:51 2004
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Jun  2 04:48:24 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
Message-ID: <40BD9383.3090603@biologie.uni-freiburg.de>

Hi,

I'm retrieving sequences out of a biosql db using Bio::DB::Query's...
When calling the next_object function on the QueryResult, you get the
persistence object for seq.
I want to copy the object into a fresh seq object, to add new data and
store it afterwards as a new entry with a different namespace.
The solution I'm using now is quite awkward... I copy it using SeqIO :(
Is there a method to retrieve seq objects directly?

Additionally, I'm quite confused by the mapping of bioperl objects to
biosql tables (e.g. for generating a Bio:Query with datacollections):
connections like bioentry<->seqI are obvious, but the rest?
Is there something like an overview list of the object mapping?
An example script for Query and Constraints would be great :)

Thanks in advance
Daniel

-- 
Daniel Lang
University of Freiburg, Plant Biotechnology
Sonnenstr. 5, D-79104 Freiburg
phone: +49 761 203 6988
homepage: http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de

#################################################
>REALITY.SYS corrupted: Reboot universe? (Y/N/A)
#################################################

Join MOSS 2004 in Freiburg, Germany from September 12th - 15th:
registration and information @ http://www.plant-biotech.net/moss2004

From Marc.Logghe at devgen.com  Wed Jun  2 05:31:23 2004
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jun  2 05:35:12 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
Message-ID:

Hi Daniel,

> Hi,
> I'm retrieving sequences out of a biosql db using Bio::DB::Query's...
> When calling the next_object function on the QueryResult, you get the
> persistence object for seq.
> I want to copy the object into a fresh seq object, to add new data and
> store it afterwards as a new entry with a different namespace.
> The solution I'm using now is quite awkward... I copy it using SeqIO :(
> Is there a method to retrieve seq objects directly?

I don't know what will happen if you change the namespace of the
persistent object and store it. Probably a lot of constraints ;-) (Not
tested though!)
A route you could follow is to
1. fetch the plain seq object
2. change the namespace and add some features
3. make it persistent and
4. store it.

Suppose you have your persistent seq in $pseq:

my $seq = $pseq->obj;
$seq->namespace('my_new_namespace');

# do some other stuff

my $new_pseq = $db->create_persistent($seq);
$new_pseq->create;

>
> Additionally, I'm quite confused by the mapping of bioperl objects to
> biosql tables (e.g. for generating a Bio:Query with datacollections):
> connections like bioentry<->seqI are obvious, but the rest?
> Is there something like an overview list of the object mapping?

Have a look at perldoc -m Bio::DB::BioSQL::BaseDriver, more precisely at
the %object_entity_map variable.
You might find examples of queries in
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-db/t/query.t?rev=1.9&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
and also in the presentation given at BOSC2003:
http://open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

HTH,
Marc

From hlapp at gnf.org  Wed Jun  2 14:02:09 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jun  2 14:05:55 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
In-Reply-To:
References:
Message-ID:

On Jun 2, 2004, at 2:31 AM, Marc Logghe wrote:

> Hi Daniel,
>
>> I want to copy the object into a fresh seq object, to add new data and
>> store it afterwards as a new entry with a different namespace.
>
> I don't know what will happen if you change the namespace of the
> persistent object and store it.
> Probably a lot of constraints ;-) (Not tested though!)
> A route you could follow is to
> 1. fetch the plain seq object

You don't really need to do this even.

> 2. change the namespace and add some features
> 3. make it persistent and
> 4. store it.
>

Right, that would be the way.

> suppose you have your persistent seq in $pseq;
> my $seq = $pseq->obj;
> $seq->namespace('my_new_namespace');
>

Again, no real reason to get the wrapped object unless you explicitly
need a non-persistent object.

Persistent objects in bioperl-db speak are not tightly coupled to the
database; in fact you might say they are uncoupled. What I mean is that
you may change any attribute or property of the persistent object
without having any effect on what is stored in the database. Only once
you ask the object to store itself will it sync the changes to the
database.

So, you may simply do the following:

    while (my $pseq = $query->next_object) {
        # e.g. change namespace
        $pseq->namespace("my namespace");
        # change other things, e.g., tack on another feature
        # (which may or may not be a persistent object)
        $pseq->add_SeqFeature($myfeature);
        # ...
        # when done making changes, sync to database
        $pseq->store();
    }

Note that this will update bioentries to change their namespace, not
duplicate them in another namespace. If you wanted to duplicate a
sequence in another namespace, possibly with some changes on the
annotation, replace $pseq->store() with the following:

    ...
    # trigger insert by making the object forget
    # its primary key
    $pseq->primary_key(undef);
    # we need to duplicate dependent objects
    # (children) too, like features
    foreach my $pfea ($pseq->get_SeqFeatures) {
        $pfea->primary_key(undef)
            if $pfea->isa("Bio::DB::PersistentObjectI");
        # features have locations
        $pfea->location->primary_key(undef)
            if $pfea->location->isa("Bio::DB::PersistentObjectI");
    }
    # do the insert
    $pseq->create();

You will note that this sample code actually does not cover all
possible cases; e.g., if there are sub-features, or split locations.
But you get the idea. Nevertheless, there is indeed a case for having a
convenience method for de-persisting objects to better support those
who want to duplicate them.

> # do some other stuff
>
> my $new_pseq = $db->create_persistent($seq);
> $new_pseq->create;
>

Note that this has problems associated as outlined above:

- if you wanted to update the sequence, this would not do that
- you will update the features though so that the original sequence
  won't have any features anymore (a feature has a foreign key to
  exactly one bioentry)

>>
>> Additionally, I'm quite confused by the mapping of bioperl objects to
>> biosql tables (e.g. for generating a Bio:Query with datacollections):
>> connections like bioentry<->seqI are obvious, but the rest?
>> Is there something like an overview list of the object mapping?
> Have a look at perldoc -m Bio::DB::BioSQL::BaseDriver, more precisely
> at the %object_entity_map variable.
> Examples of queries you might find in
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-db/t/query.t?rev=1.9&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
> and also the presentation given at BOSC2003:
> http://open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
>

Right. Thanks for helping Marc.

    -hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From daniel.lang at biologie.uni-freiburg.de  Thu Jun  3 04:01:00 2004
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Thu Jun  3 04:04:48 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
In-Reply-To:
References:
Message-ID: <40BEDABC.90304@biologie.uni-freiburg.de>

Thank you both for your extensive and quick answers...

From jochen at penguin-breeder.org  Thu Jun  3 04:49:06 2004
From: jochen at penguin-breeder.org (jochen)
Date: Thu Jun  3 04:52:15 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
Message-ID: <20040603084906.GA27454@coffee.homeunix.org>

Hi,

I have a similar problem, namely I want to modify some sequences and
store them back in the database, without overwriting any of the original
sequences, basically this:

# retrieve an existing sequence
my $seq = Bio::Seq::RichSeq->new( -display_id => 'something' );
$seq = $seqadaptor->find_by_unique_key($seq);

# make sure $seq isn't persistent anymore
my $buffer = new IO::String;
my $out = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
$out->write_seq($seq);
$buffer->setpos(0);
my $in = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
$seq = $in->next_seq;

# modify it a little
$seq->primary_id('NEW001');

# create a new copy (fails, just overwrites the old one)
$seq->create()

A little debugging revealed that there are several unique constraints
on the bioentry (using
postgresql here), which prevent me from creating two objects if they
have

  o the same primary_id and/or
  o the same (accession_number,version,namespace)

Isn't this an unnecessary restriction? Especially, why is primary_id a
unique constraint, and not (primary_id,namespace)?

Even worse, $seq->create in most cases doesn't give an error if there
is already a similar sequence, but just writes over the existing
sequence:

In Bio/DB/BioSQL/BasePersistenceAdaptor.pm, lines 196-213, you try to
insert the new object. If this fails, you conclude this object already
exists and retrieve it from the DB. Now this behaviour is ok for
creating the possibly missing foreign key objects. However, if I
invoke create() on a sequence object, I'd expect this object to be
newly created or to receive an error.

What do you think about this? Did I miss something there?

I'd suggest fixing that by introducing two different create functions
(or a parameter) that controls whether it's ok to retrieve a possibly
existing object (i.e. when creating the foreign key objects) or whether
the whole method should fail if there is an already existing object.

> ...
>     # trigger insert by making the object forget
>     # its primary key
>     $pseq->primary_key(undef);
>     # we need to duplicate dependent objects
>     # (children) too, like features
>     foreach my $pfea ($pseq->get_SeqFeatures) {
>         $pfea->primary_key(undef)
>             if $pfea->isa("Bio::DB::PersistentObjectI");
>         # features have locations
>         $pfea->location->primary_key(undef)
>             if $pfea->location->isa("Bio::DB::PersistentObjectI");
>     }
>     # do the insert
>     $pseq->create();

assuming you just changed the namespace, this code example won't work,
because you didn't change the primary_id, thus violating the unique
constraint

kind regards
-- jochen

From hlapp at gnf.org  Fri Jun  4 14:16:06 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jun  4 14:19:22 2004
Subject: [BioSQL-l] Re: [Bioperl-l] Biosql documentation request
In-Reply-To:
References:
Message-ID: <3EE5990A-B653-11D8-AB9B-000A95AE92B0@gnf.org>

It pretty much is the latest. Amazing, isn't it? The schema is *very*
stable.

There are a few additions in the Oracle version which aren't really
officially blessed yet (meaning, they're not in the MySQL/Pg versions
but will be soon), and none of which breaks backwards compatibility.

I'll try and see whether I can get postgres_autodoc installed over the
weekend.

Or maybe somebody on the biosql list has this setup already?

    -hilmar

On Jun 4, 2004, at 4:38 AM, Brian Osborne wrote:

> Hilmar,
>
> Neither does the ERD show nullability. The ERD is good but some useful
> information is missing, yes.
>
> The ERD is dated 6/4/2003, is this the latest version? Pardon my
> ignorance.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
> Sent: Friday, June 04, 2004 2:20 AM
> To: Brian Osborne
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Biosql documentation request
>
> Brian, if I understand the output correctly it only documents the
> schema elements.
> Do you feel that the ERD (doc/biosql-ERD.pdf) does not
> fulfill this purpose well enough?
>
> The ERD diagram actually doesn't show the unique key constraints, so
> that would be a difference indeed.
>
>     -hilmar
>
> On Thursday, June 3, 2004, at 05:56 AM, Brian Osborne wrote:
>
>> Bioperl-l,
>>
>> Dave Howorth has provided a detailed critique of the bioperl-db/biosql
>> documentation which I'm working through. One thing that he noticed was
>> that the Biosql file doc/biosql.html was out-of-date. This file was
>> created by running a script called postgres_autodoc.pl on a Postgres
>> instance of the biosql schema. Can anyone provide me with a current
>> version of this file? I run biosql on Mysql myself and I haven't found
>> a script or utility equivalent to postgres_autodoc.pl.
>> postgres_autodoc.pl is available at http://www.rbt.ca/autodoc/.
>>
>> Brian O.

From hlapp at gnf.org  Fri Jun  4 23:41:30 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jun  4 23:44:43 2004
Subject: [BioSQL-l] Re: [Bioperl-l] Biosql documentation request
In-Reply-To:
References:
Message-ID: <3B611DD0-B6A2-11D8-AB9B-000A95AE92B0@gnf.org>

The kudos w.r.t. the INSTALL document should go to Ewan who I believe
wrote (and tested) that heroically during the Singapore hackathon.
Great though that it proved useful.

    -hilmar

On Jun 4, 2004, at 6:15 PM, Brian Osborne wrote:

> Hilmar,
>
> I went ahead and installed postgres as well as the biosql schema when I
> found out that Cygwin would install postgres. Kudos to you: the
> installation of postgres, postgres initialization, and biosql database
> creation took about 10 minutes and I've never used postgres before. I
> was basically following the INSTALL instructions. The new biosql.html
> has been committed.
>
> Brian O.

From brian_osborne at cognia.com  Fri Jun  4 21:15:56 2004
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Jun  5 20:57:07 2004
Subject: [BioSQL-l] RE: [Bioperl-l] Biosql documentation request
In-Reply-To: <3EE5990A-B653-11D8-AB9B-000A95AE92B0@gnf.org>
Message-ID:

Hilmar,

I went ahead and installed postgres as well as the biosql schema when I
found out that Cygwin would install postgres. Kudos to you: the
installation of postgres, postgres initialization, and biosql database
creation took about 10 minutes and I've never used postgres before. I
was basically following the INSTALL instructions. The new biosql.html
has been committed.

Brian O.

From hlapp at gnf.org  Mon Jun  7 19:52:26 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jun  7 19:55:32 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
In-Reply-To: <20040603084906.GA27454@coffee.homeunix.org>
References: <20040603084906.GA27454@coffee.homeunix.org>
Message-ID:

On Jun 3, 2004, at 1:49 AM, jochen wrote:

> Hi,
>
> I have a similar problem, namely I want to modify some sequences and
> store them back in the database, without overwriting any of the
> original sequences, basically this:
>
> # retrieve an existing sequence
> my $seq = Bio::Seq::RichSeq->new( -display_id => 'something' );

Note that display_id (bioentry.name) is not constrained by a unique
index and therefore you may easily get duplicate records (which will
cause an exception if searching by unique key).

> $seq = $seqadaptor->find_by_unique_key($seq);
>
> # make sure $seq isn't persistent anymore
> my $buffer = new IO::String;
> my $out = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
> $out->write_seq($seq);
> $buffer->setpos(0);
> my $in = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
> $seq = $in->next_seq;
>
> # modify it a little
> $seq->primary_id('NEW001');
>
> # create a new copy (fails, just overwrites the old one)
> $seq->create()

With the above code this line needs to throw a perl error for calling a
non-existent function on an object. A sequence stream will never give
you a persistent object.

Should I assume that between the lines you created a persistent object
from the object that the SeqIO stream returned to you?
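
The distinction between a plain object and a persistent wrapper can be
sketched in a few lines of core Perl. This is only an illustration under
stated assumptions: PlainSeq and PersistentWrapper are hypothetical
stand-ins invented for this sketch, not bioperl-db classes, and the
"insert" merely hands out a counter value. It shows why calling create()
on an object that came from a sequence stream dies, while the same call
on a wrapper around it succeeds.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain sequence object, such as one read
# from a sequence stream: data accessors only, no create() or store().
package PlainSeq;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub display_id {
    my $self = shift;
    $self->{display_id} = shift if @_;
    return $self->{display_id};
}

# Hypothetical wrapper that adds persistence methods and forwards every
# other method call to the wrapped object.
package PersistentWrapper;

our $AUTOLOAD;
my $next_key = 1;    # simulated primary key sequence

sub new {
    my ($class, $obj) = @_;
    my $self = { obj => $obj, primary_key => undef };
    return bless $self, $class;
}

# Return the wrapped plain object.
sub obj { return $_[0]->{obj} }

sub primary_key {
    my $self = shift;
    $self->{primary_key} = shift if @_;
    return $self->{primary_key};
}

# Pretend to insert a row and remember the key it was given.
sub create {
    my $self = shift;
    $self->{primary_key} = $next_key++;
    return $self;
}

sub DESTROY { }      # keep object destruction out of AUTOLOAD

# Delegate unknown methods to the wrapped object.
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return $self->{obj}->$name(@_);
}

package main;

my $seq = PlainSeq->new(display_id => 'NEW001');

# The plain object has no create(), so the call dies inside eval.
my $plain_ok = eval { $seq->create(); 1 } ? 1 : 0;

# Wrapping it first makes create() available.
my $pseq = PersistentWrapper->new($seq);
$pseq->create();

print "plain object can create: ", ($plain_ok ? "yes" : "no"), "\n";
print "wrapper primary key: ", $pseq->primary_key, "\n";
print "delegated display_id: ", $pseq->display_id, "\n";
```

In bioperl-db itself the wrapping step is the $db->create_persistent($seq)
call shown earlier in this thread; the mock only mirrors its shape.
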
> A little debugging revealed that there are several unique constraints
> on the bioentry (using postgresql here), which prevent me from
> creating two objects if they have
>
>   o the same primary_id and/or
>   o the same (accession_number,version,namespace)
>
> Isn't this an unnecessary restriction? Especially, why is primary_id a
> unique constraint, and not (primary_id,namespace)?
>

This was suggested before, and in fact you can change that constraint
to include the identifier. I thought it's in the schema as a
commented-out option, but apparently it is not (yet). Bioperl-db will
use, but not mandate, the namespace as additional constraint when doing
a lookup by primary_id.

(accession_number,version,namespace) is a well-established uniqueness
constraint on sequences in order to guarantee a minimal amount of
sanity.

> Even worse, $seq->create in most cases doesn't give an error if there
> is already a similar sequence, but just writes over the existing
> sequence:

It doesn't write over an existing sequence. It will update the
attributes of the object you wanted to create to match those of the
existing object in the database, unless you pass in an object factory
(-obj_factory => $myseqfactory).

>
> In Bio/DB/BioSQL/BasePersistenceAdaptor.pm, lines 196-213, you try to
> insert the new object. If this fails, you conclude this object
> already exists and retrieve it from the DB. Now this behaviour is ok
> for creating the possibly missing foreign key objects. However, if I
> invoke create() on a sequence object, I'd expect this object to be
> newly created or to receive an error.
>

If that's what you expect then run a find_by_unique_key() first to make
sure it's not present already. (Note that this is still no guarantee
because between the time you get the negative result and the time you
commit the create() transaction somebody else may have inserted the
same sequence.)

Note that the method is named create(), not insert_or_fail().
The purpose is that after the call returns successfully the object on
which you invoked create() has an equivalent entry in the database. It
is not an error if the respective row that you wanted to be present in
the database is already there. If it were, you'd mandate the user to
run in almost all cases the logic you found at this place if an
exception occurs. I.e., you'd require the user to worry about a lot of
absence/presence/concurrency/transactional possibilities when all that
he/she wanted was to make sure the sequence (as identified by its
unique key) is in the database.

Bioperl-db is not a SQL interface. It's an OR mapper. You use it if you
want to live and navigate in object land, not when you want to be close
to the RDBMS vibe. At least that's the goal ...

> What do you think about this? Did I miss something there?
>
> I'd suggest fixing that by introducing two different create functions
> (or a parameter) that controls whether it's ok to retrieve a possibly
> existing object (i.e. when creating the foreign key objects) or
> whether the whole method should fail if there is an already existing
> object.

It's easily achievable on the client end by running
find_by_unique_key() first.

>
>> ...
>>     # trigger insert by making the object forget
>>     # its primary key
>>     $pseq->primary_key(undef);
>>     # we need to duplicate dependent objects
>>     # (children) too, like features
>>     foreach my $pfea ($pseq->get_SeqFeatures) {
>>         $pfea->primary_key(undef)
>>             if $pfea->isa("Bio::DB::PersistentObjectI");
>>         # features have locations
>>         $pfea->location->primary_key(undef)
>>             if $pfea->location->isa("Bio::DB::PersistentObjectI");
>>     }
>>     # do the insert
>>     $pseq->create();
>
> assuming you just changed the namespace, this code example won't work,
> because you didn't change the primary_id, thus violating the unique
> constraint

Right. It wasn't meant as bullet-proof code. (Note that primary_id is
optional.)
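
The create() semantics discussed above, together with the client-side
check, can be sketched with an in-memory mock. Everything here is an
assumption made for illustration: MockBioentryStore and create_or_die
are hypothetical names, the store is a plain hash rather than a
database, and it looks up before inserting, whereas the adaptor code
referred to in this thread reportedly inserts first and looks up on
failure. Only the (accession, version, namespace) key is modelled; the
separate uniqueness of primary_id is left out.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the bioentry table. Rows are keyed
# by the unique key (accession, version, namespace).
package MockBioentryStore;

sub new {
    my $class = shift;
    my $self = { rows => {}, next_pk => 1 };
    return bless $self, $class;
}

# Build the unique-key string for an entry (a plain hash reference).
sub _ukey {
    my ($self, $e) = @_;
    return join "\t", @{$e}{qw(accession version namespace)};
}

# Return the stored row with the same unique key, or undef.
sub find_by_unique_key {
    my ($self, $e) = @_;
    return $self->{rows}{ $self->_ukey($e) };
}

# Lenient create(): afterwards an equivalent row exists. If one is
# already there, the entry adopts its primary key; no error is raised.
sub create {
    my ($self, $e) = @_;
    my $found = $self->find_by_unique_key($e);
    if ($found) {
        $e->{primary_key} = $found->{primary_key};
        return $e;
    }
    $e->{primary_key} = $self->{next_pk}++;
    $self->{rows}{ $self->_ukey($e) } = { %$e };
    return $e;
}

# Strict variant built on the client side: look up first, then create.
# With a real database this check alone cannot rule out a concurrent
# insert between the lookup and the commit.
sub create_or_die {
    my ($self, $e) = @_;
    die "entry already exists\n" if $self->find_by_unique_key($e);
    return $self->create($e);
}

package main;

my $store = MockBioentryStore->new;
my %orig  = (accession => 'X1', version => 1, namespace => 'orig');

my $first = $store->create({ %orig });
my $again = $store->create({ %orig });                       # same key
my $copy  = $store->create({ %orig, namespace => 'mine' });  # new key

# The strict variant refuses an entry whose key is already present.
my $strict_failed = eval { $store->create_or_die({ %orig }); 1 } ? 0 : 1;

printf "first=%d again=%d copy=%d strict_failed=%d\n",
    $first->{primary_key}, $again->{primary_key},
    $copy->{primary_key}, $strict_failed;
# prints: first=1 again=1 copy=2 strict_failed=1
```

The second create() returns the key of the existing row instead of
failing, which is the behaviour defended above; a caller who wants
insert_or_fail-like behaviour has to ask for it explicitly.
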
I'm inclined to make the tuple of (identifier,namespace) the default
for the future; there seem to be too many subtle issues otherwise if
you're unsuspecting.

	-hilmar

>
> kind regards
> -- jochen
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jochen at penguin-breeder.org Tue Jun 8 04:42:58 2004
From: jochen at penguin-breeder.org (Jochen Eisinger)
Date: Tue Jun 8 04:45:55 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
In-Reply-To:
References: <20040603084906.GA27454@coffee.homeunix.org>
Message-ID: <20040608084258.GA10233@coffee.homeunix.org>

Hi,

thanks for your clarifying answer!

On Mon, Jun 07, 2004 at 04:52:26PM -0700, Hilmar Lapp wrote:
> >$seq = $seqadaptor->find_by_unique_key($seq);
> >
> ># make sure, $seq isn't persistent anymore
> >my $buffer = new IO::String;
> >my $out = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
> >$out->write_seq($seq);
> >$buffer->setpos(0);
> >my $in = new Bio::SeqIO(-fh => $buffer, -format => 'embl');
> >$seq = $in->next_seq;
> >
> ># modify it a little
> >$seq->primary_id('NEW001');
> >
> ># create a new copy (fails, just overwrites the old one)
> >$seq->create()
>
> With the above code this line needs to throw a perl error for calling a
> non-existent function on an object. A sequence stream will never give
> you a persistent object.

Ah, yes, I forgot $seq = $db->create_persistent($seq) before the
create() in the above example.

> (accession_number,version,namespace) is a well-established uniqueness
> constraint on sequences in order to guarantee a minimal amount of
> sanity.

Why isn't this the primary key btw? I'm quite new to biosql and may
still be missing some points...
I'm rather surprised you're using artificial columns as primary keys
and adding unique constraints to the table, instead of using the unique
columns as primary keys and dropping these integer-valued id columns.

>
> >Even worse, $seq->create in most cases doesn't give an error if there
> >is already a similar sequence, but just writes over the existing
> >sequence:
>
> It doesn't write over an existing sequence. It will update the
> attributes of the object you wanted to create to match those of the
> existing object in the database, unless you pass in an object factory
> (-obj_factory => $myseqfactory).

It won't update the record in any case. If you change the length of the
sequence for example, you will get an error "tried to lie about
sequence length"

> >In Bio/DB/BioSQL/BasePersistenceAdaptor.pm, line 196-213, you try to
> >insert the new object. If this fails, you conclude this object
> >already exists and retrieve it from the DB. Now this behaviour is ok
> >for creating the possibly missing foreign key objects. However, if I
> >invoke create() on a sequence object, I'd expect this object to be
> >newly created or to receive an error.
> >
>
> If that's what you expect then run a find_by_unique_key() first to make
> sure it's not present already. (Note that this is still no guarantee
> because between the time you get the negative result and the time you
> commit the create() transaction somebody else may have inserted the
> same sequence.)

That should not be possible; the DB's transaction system should take
care of this.

> Note that the method is named create(), not insert_or_fail(). The
> purpose is that after the call returns successfully the object on which
> you invoked create() has an equivalent entry in the database. It is not
> an error if the respective row that you wanted to be present in the
> database is already there.

I expected store() to do this, and create() to be insert_or_fail-like.

> Bioperl-db is not a SQL interface. It's an OR mapper.
> You use it if you want to live and navigate in object land, not when
> you want to be close to the RDBMS vibe. At least that's the goal ...

Ok

> I'm inclined to make the tuple of (identifier,namespace) the default
> for the future; there seem to be too many subtle issues otherwise if
> you're unsuspecting.

I guess that would be a good thing to do. Otherwise it's quite
impossible to have the same sequence in multiple versions in a single
database.

In my case, I need to have sequences with several different annotations
stored in one db. Changing the primary id of the sequences is not an
option here.

kind regards
-- jochen

From jochen at penguin-breeder.org Tue Jun 8 10:25:35 2004
From: jochen at penguin-breeder.org (Jochen Eisinger)
Date: Tue Jun 8 10:28:30 2004
Subject: [BioSQL-l] SimpleValueAdaptor does not accept values of 0
Message-ID: <20040608142535.GA23458@coffee.homeunix.org>

Hi,

I ran into the problem that 0 values won't be retrieved from the
database. I found the same bug reported in the bugzilla db:

http://bugzilla.bioperl.org/show_bug.cgi?id=1586

The solution suggested there works for me.

kind regards
-- jochen

From hlapp at gnf.org Tue Jun 8 13:08:49 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 8 13:11:57 2004
Subject: [BioSQL-l] Bioperl-db: Added -flat_only option to find_by_query()
Message-ID: <826BD7E0-B96E-11D8-9CC5-000A95AE92B0@gnf.org>

Disregard if you aren't using bioperl-db.

This option was previously only available with find_by_unique_key().
You can now pass it to find_by_query() as well.

-flat_only means retrieved objects will not get their children
retrieved and attached. E.g., when retrieving a Bio::SeqI object, there
will be neither features nor annotation with this flag set to true when
you get the found object(s) returned.

This is useful to save time if you aren't going to query those
attributes anyway in your script.
	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gnf.org Tue Jun 8 13:09:07 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jun 8 13:12:11 2004
Subject: [BioSQL-l] SimpleValueAdaptor does not accept values of 0
In-Reply-To: <20040608142535.GA23458@coffee.homeunix.org>
References: <20040608142535.GA23458@coffee.homeunix.org>
Message-ID: <8D0BA10C-B96E-11D8-9CC5-000A95AE92B0@gnf.org>

Fixed in the repository.

	-hilmar

On Jun 8, 2004, at 7:25 AM, Jochen Eisinger wrote:

> Hi,
>
> I ran into the problem that 0 values won't be retrieved from the
> database. I found the same bug reported in the bugzilla db:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1586
>
> the solution suggested there works for me.
>
> kind regards
> -- jochen
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From hlapp at gmx.net Sat Jun 12 20:11:11 2004
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jun 12 20:25:43 2004
Subject: [BioSQL-l] How to get a Seq object from Bio::DB::Persistent::Seq
In-Reply-To: <20040608084258.GA10233@coffee.homeunix.org>
Message-ID: <2D5D0E01-BCCE-11D8-8A0A-000A959EB4C4@gmx.net>

On Tuesday, June 8, 2004, at 01:42 AM, Jochen Eisinger wrote:

>
>> (accession_number,version,namespace) is a well-established uniqueness
>> constraint on sequences in order to guarantee a minimal amount of
>> sanity.
>
> Why isn't this the primary key btw? I'm quite new to biosql and may
> still be missing some points...
> I'm rather surprised you're using artificial columns as primary keys
> and add unique constraints to the table, instead of using them as
> primary keys and dropping this integer valued id columns.

Who uses the natural primary key as the physical primary key? It's
common and best practice not to do so, because 1) a natural primary key
will change if you change the attribute(s), which means you'll have to
change the foreign keys referencing it too, and 2) multi-column keys in
particular are slow to join (but even a single character column is
slower than an integer one).

There are plenty of relational database design and theory textbooks out
there that explain this a lot better and in depth.

>>
>> It doesn't write over an existing sequence. It will update the
>> attributes of the object you wanted to create to match those of the
>> existing object in the database, unless you pass in an object factory
>> (-obj_factory => $myseqfactory).
>
> It won't update the record in any case. If you change the length of the
> sequence for example, you will get an error "tried to lie about
> sequence length"

It will update the object, I said, not the record in the database. You
cannot set $seq->length to a value other than the actual length of the
sequence if there is one.

>>
>> If that's what you expect then run a find_by_unique_key() first to
>> make sure it's not present already. (Note that this is still no
>> guarantee because between the time you get the negative result and
>> the time you commit the create() transaction somebody else may have
>> inserted the same sequence.)
> That should not be possible, the DBs transaction system should take
> care of this.

Tell me how it should be able to accomplish this. Transactions don't
cure wrong assumptions, they just isolate concurrent access.

Let's assume you have a record to be inserted with unique key 'foo'. At
the time you make a lookup on that key somebody else has inserted a
record with the same key but hasn't committed the transaction yet.
Your lookup will return no record. Now you go ahead and insert the
record. If the other user's transaction isn't rolled back, your insert
will either fail immediately if he committed meanwhile, or it will
block and fail once he commits.

>
> In my case, I need to have sequences with several different annotations
> stored in one db. changing the primary id of the sequences is not an
> option here.
>

If you change the unique key (UK) constraint to include the namespace
you should be fine.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From gwu at molbio.mgh.harvard.edu Tue Jun 15 12:38:08 2004
From: gwu at molbio.mgh.harvard.edu (Gang Wu)
Date: Tue Jun 15 12:41:13 2004
Subject: [BioSQL-l] how to quickly retrieve feature sequences
Message-ID:

Hi,

I just loaded the 5 Arabidopsis thaliana GenBank genome files into my
sequence database (BioSQL 1.38). My question is: How can I efficiently
retrieve all gene sequences from the database? I tried to do that by
joining the seqfeature, seqfeature_qualifier_value, location, term and
biosequence tables, but it turned out to be extremely slow (see the
attached SQL; 2 records take about 20 seconds on my Dell PowerEdge 2650
with dual 2.6G Xeons). Does anyone have a better way to do it?

All I can imagine to do this faster is (in Java or other languages):
pull all gene location info; pull the related sequence from the
biosequence table; iterate over the gene location list and retrieve the
substring of the sequence. But this does not seem attractive to me
since for different applications, I have to write code to pull the
sequences by myself. Is it possible to extend/modify the BioSQL schema
to serve this purpose better?

My understanding is that a lot of subsequent applications would be only
interested in certain pieces of the whole genome sequences and there
must be an efficient way to do that.
If everyone has to invent their own method, BioSQL might be a little
bit too limited. Any ideas on this?

Gang

From gwu at molbio.mgh.harvard.edu Tue Jun 15 13:12:36 2004
From: gwu at molbio.mgh.harvard.edu (Gang Wu)
Date: Tue Jun 15 13:15:23 2004
Subject: [BioSQL-l] how to quickly retrieve feature sequences
In-Reply-To:
Message-ID:

Just forgot to attach the SQL.

=========================================
ATTACHMENT 1
=========================================
CREATE TABLE `term_relationship_term` (
  `term_relationship_id` int(11) NOT NULL default '0',
  `term_id` int(11) NOT NULL default '0',
  PRIMARY KEY (`term_relationship_id`,`term_id`),
  UNIQUE KEY `term_relationship_id` (`term_relationship_id`),
  UNIQUE KEY `term_id` (`term_id`)
) TYPE=InnoDB;
========================================

Gang

From gwu at molbio.mgh.harvard.edu Tue Jun 15 13:31:28 2004
From: gwu at molbio.mgh.harvard.edu (Gang Wu)
Date: Tue Jun 15 13:34:11 2004
Subject: [BioSQL-l] how to quickly retrieve feature sequences
In-Reply-To:
Message-ID:

SQL again:

SELECT t1.seqfeature_id, t1.bioentry_id, t2.start_pos, t2.end_pos,
       t2.strand, t4.value locus_tag,
       substring(t6.seq, t2.start_pos,t2.end_pos) seq
FROM `seqfeature` t1
  inner join location t2 on t1.seqfeature_id=t2.seqfeature_id
  inner join term t3 on t1.type_term_id=t3.term_id
  inner join seqfeature_qualifier_value t4
          on t1.seqfeature_id=t4.seqfeature_id
  inner join term t5 on t4.term_id=t5.term_id
  inner join biosequence t6 on t1.bioentry_id=t6.bioentry_id
where t3.name='gene' and t5.name='locus_tag'
limit 2

Gang

From hlapp at gnf.org Sun Jun 20 09:21:28 2004
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sun Jun 20 09:24:14 2004
Subject: [BioSQL-l] how to quickly retrieve feature sequences
In-Reply-To:
Message-ID:

Gang,

do you want to do this in high-throughput? Otherwise you could use
bioperl and bioperl-db as the language binding and then use the bioperl
object model to retrieve the information.

I'm away from my desk for a week, so I won't be able to elaborate
further before the week after next week.

	-hilmar

On Tuesday, June 15, 2004, at 09:38 AM, Gang Wu wrote:

> Hi,
>
> I just loaded the 5 Arabidopsis thaliana GenBank genome files into my
> sequence database (BioSQL 1.38). My question is: How can I efficiently
> retrieve all gene sequences from the database? I tried to do that by
> joining seqfeature, seqfeature_qualifier_value, location, term and
> biosequence tables, but it turned out to be extremely slow (See the
> attached SQL, 2 records take about 20 seconds on my Dell PowerEdge
> 2650 with dual 2.6G Xeons). Does anyone have a better way to do it?
>
> All I can imagine to do this faster is (by Java or other languages):
> Pull all gene location info; Pull related sequence from biosequence
> table; rotate through the gene location list and retrieve the
> substring of the sequence. But this does not seem attractive for me
> since for different applications, I have to write code to pull the
> sequences by myself. Is it possible to extend/modify the BioSQL schema
> to serve this purpose better?
>
> My understanding is that a lot subsequent applications would be only
> interested in certain pieces of the whole genome sequences and there
> must be an efficient way to do that. If everyone has to invent their
> method, the BioSQL might be a little bit too limited. Any idea on
> this?
>
> Gang
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l@open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From gwu at molbio.mgh.harvard.edu Mon Jun 21 09:42:52 2004
From: gwu at molbio.mgh.harvard.edu (Gang Wu)
Date: Mon Jun 21 09:47:31 2004
Subject: [BioSQL-l] how to quickly retrieve feature sequences
In-Reply-To:
Message-ID:

It turned out to be quick enough to retrieve sequences such as genes,
promoters etc. The SQL I provided in my last message had a 'bug' in the
"substring(t6.seq, t2.start_pos,t2.end_pos) seq" line, which retrieves
the subsequence starting at "t2.start_pos" with a length of
"t2.end_pos". But what I needed is the gene sequence, which should be
"substring(t6.seq, t2.start_pos,t2.end_pos-t2.start_pos+1) seq".

If the average length of the gene sequences is 1-1.5k, retrieving every
1000 gene sequences takes about 2-4 seconds on our server (Dell
PowerEdge 2650 with dual Xeon 2.6G, 512K). Is this fast enough for you
guys?
Gang
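
The substring fix described in the 21 June message can be checked with
a small sketch. It assumes that start_pos and end_pos are 1-based,
inclusive coordinates, and it uses SQLite's substr(X, Y, Z) through
Python's standard sqlite3 module in place of the substring() call in
the original query. The toy sequence and the helper sql_substr are
hypothetical and exist only for this illustration.

```python
import sqlite3


def sql_substr(seq, start, length):
    """Evaluate substr(seq, start, length) in SQLite. As in the
    substring() call of the thread's query, the second argument is a
    1-based start position and the third is a length, not an end."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT substr(?, ?, ?)", (seq, start, length)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


# Toy data: the feature covers positions 4..6 (1-based, inclusive).
seq = "AAACCCGGGTTT"
start_pos, end_pos = 4, 6

# Original query: end_pos is passed where a length is expected,
# so the result runs past the end of the feature.
buggy = sql_substr(seq, start_pos, end_pos)

# Corrected query: the length is end_pos - start_pos + 1.
fixed = sql_substr(seq, start_pos, end_pos - start_pos + 1)

print(buggy)  # CCCGGG
print(fixed)  # CCC
```

Neither form looks at the strand column; a feature on the reverse
strand would presumably still need to be reverse-complemented by the
caller.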