From s.rayner at att.net  Mon May 15 02:13:06 2006
From: s.rayner at att.net (s.rayner at att.net)
Date: Mon, 15 May 2006 06:13:06 +0000
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
Message-ID: <051520060613.9311.44681BF0000A2A7A0000245F21604666489D0A02970E9DD29C@att.net>

Hello,

I have been trying to upload the current release of uniprot (version
49.6) into MySQL using the most current version of load_seqdatabase.pl
from CVS
(# $Id: load_seqdatabase.pl,v 1.24 2006/01/19 21:34:29 lapp Exp $)

I have tested the script on subsets of uniprot and it loads without
problem, but when i attempt to load the full dataset, i end up with the
following error....

biowiv:/usr/lib/perl5/bioperl-db/scripts/biosql # perl load_seqdatabase.pl --dbname uniprot --dbuser XXXX --dbpass XXXX --format swiss /var/downloads/sequence/uniprot_sprot.dat
Loading /var/downloads/sequence/uniprot_sprot.dat ...

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were
("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.","CRC-E7973FEA4B5611DC","","","") FKs (

I found where the script is hiccuping....

The Uniprot release contains lines with identical annotation for the RL
keyword for two different sequences.

___________________

First occurrence...
___________________

ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
AC   Q5RFJ2; Q5RDK2;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein theta.
GN   Name=YWHAQ;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain cortex, and Kidney;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <====== Not Unique

___________________

Second occurrence...
___________________

ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
AC   Q5RC20;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein gamma.
GN   Name=YWHAG;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Heart;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <====== Not Unique

in these two cases the generated CRC key is identical and so MySQL
throws a wobbly.

if i look at the MySQL entry in the REFERENCE table for the first
sequence

------+-------+---------+----------------------+
| 139 | NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC |
+--------------+-----------+----------------------------------------------------

and the error when the script choked was

MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were
("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.","CRC-E7973FEA4B5611DC","","","") FKs (

References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID:

You found the right instance. Unfortunately with the way the bioperl
swissprot parser works the group (RG) isn't promoted to author if there
is no author in addition (in fact you may debate whether that would even
be the best way of doing things), so it doesn't find it on second
occurrence by unique key.

If you can live without this entry, or any other entry that causes a
hiccup, just supply the flag --safe and it will gracefully move on to
the next entry.
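
To make the collision described above concrete, here is a minimal Perl sketch. It is not the bioperl-db code: `reference_key` is a hypothetical helper, Digest::MD5 is swapped in for the real CRC, and it assumes the key is derived only from the authors, title and location of a reference. Under that assumption two references that carry nothing but the same RL line get the same key, and the second insert trips the unique index.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-in for the key stored with each reference.
# Assumption: only authors, title and location (RL) feed into the key.
sub reference_key {
    my (%ref) = @_;
    my $text = join '|', map { $ref{$_} // '' } qw(authors title location);
    return 'KEY-' . uc(substr(md5_hex($text), 0, 16));
}

my $rl = 'Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.';

# Simulates the unique index on the key column of the reference table.
my %unique_index;

# Both entries have RG and RL lines only, so authors and title stay empty.
for my $entry (qw(1433T_PONPY 1433G_PONPY)) {
    my $key = reference_key(location => $rl);
    if ($unique_index{$key}++) {
        print "$entry: duplicate entry '$key'\n";
    }
    else {
        print "$entry: inserted with key '$key'\n";
    }
}
```

The first entry is inserted and the second is reported as a duplicate. A lookup that found the existing row before inserting would avoid the error, which is the behaviour the reply says does not happen when there is no author.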

Fixing the issue would require either fixing the bioperl swissprot
parser (or Bio::Annotation::Reference) to stick the RG group into the
author slot if there is no author, or fixing Bioperl's
Bio::Annotation::Reference to also feature a group and biosql to use it
in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)
should just use that in place of a missing author? The downside is that
upon round-tripping an entry, the RG annotation line will become an RA
annotation line. How bad would that be?

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.
> <====== Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.
> <====== Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first
> sequence
> ------+-------+---------+----------------------+
> | 139 | NULL | Submitted (NOV-2004) to the EMBL/
> GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC |
> +--------------+-----------
> +----------------------------------------------------
>
> and the error when the script choked was
>
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were
> ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
> databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but don't
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From simon.rayner.cn at gmail.com  Mon May 15 20:46:00 2006
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 16 May 2006 00:46:00 +0000
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: 051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net
Message-ID: <1147740360.3338.1.camel@biowiv.wivbio>

thanks for the help.

one way i was thinking of getting around it before i got your email
about the --safe flag was to append an extra character to the offending
string. So, in my case i would have

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

and

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"

which would presumably create different keys.

However, i realise the downside of this is that i have now
modified the data source...

From hlapp at gmx.net  Thu May 18 10:38:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 18 May 2006 10:38:33 -0400
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <1147740360.3338.1.camel@biowiv.wivbio>
References: <1147740360.3338.1.camel@biowiv.wivbio>
Message-ID:

Yes, you have. Eventually I think this needs to be fixed by changing
how the RG field is dealt with in either bioperl or bioperl-db.

	-hilmar

On May 15, 2006, at 8:46 PM, simon rayner wrote:

> thanks for the help.
>
> one way i was thinking of getting around it before i got your email
> about the --safe flag was to append an extra character to the
> offending
> string. So, in my case i would have
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."
>
> and
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"
>
> which would presumably create different keys.
>
> However, i realise the downside of this is that i have now
> modified the data source...
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From darin.london at duke.edu  Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [BioSQL-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006. We look forward to a great conference this year.

Please consult The Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006
Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.

From Gerben.Menschaert at UGent.be  Tue May 23 12:02:12 2006
From: Gerben.Menschaert at UGent.be (Gerben Menschaert)
Date: Tue, 23 May 2006 18:02:12 +0200
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
Message-ID: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>

Hello all,

I'm trying to load genbank accession number Q99ML8 as a genpept file
with the load_seqdatabase.pl script. It fails on the DBSOURCE part:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
("","UniGene:Mm.388865","0","") FKs ()
ORA-01400: cannot insert NULL into ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO dbxref
(dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
---------------------------------------------------

The DBSOURCE block from the genpept file looks like this:

DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
            class: standard.
            created: May 10, 2002.
            sequence updated: Jun 1, 2001.
            annotation updated: May 16, 2006.
            xrefs: AF331517.1, AAK16157.1
            xrefs (non-sequence databases): UniGene:Mm.388865,
            Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:0001664,
            GO:0006171, GO:0007586, GO:0006950

How is this parsed? Could anybody point me in the right direction?

Regards,
Gerben

From hlapp at gmx.net  Tue May 23 13:03:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 23 May 2006 13:03:30 -0400
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
In-Reply-To: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
References: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
Message-ID:

This is in reality a Uniprot entry. The Genbank parser apparently
doesn't succeed in picking apart accession and namespace prefix
(dbname).

If at all possible I'd recommend loading Uniprot in Uniprot (swissprot)
format. Would that work for you?
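
For illustration, "picking apart accession and namespace prefix" could look like the following Perl sketch. `split_xref` is a hypothetical helper, not a Bioperl function, and the rules it applies are assumptions: a token with a `DB:` prefix is split at the first colon, and a trailing `.N` is treated as a version only on tokens that have no prefix (so `Mm.388865` stays intact).

```perl
use strict;
use warnings;

# Hypothetical helper: split one xref token from a DBSOURCE block into
# dbname, accession and version. Not part of Bioperl.
sub split_xref {
    my ($token) = @_;
    $token =~ s/^\s+|\s+$//g;

    # "UniGene:Mm.388865" -> dbname "UniGene", accession "Mm.388865"
    if ($token =~ /^([^:\s]+):\s*(\S+)$/) {
        return { dbname => $1, accession => $2, version => 0 };
    }

    # "AF331517.1" -> accession "AF331517", version 1, no dbname known
    if ($token =~ /^(\S+?)\.(\d+)$/) {
        return { dbname => undef, accession => $1, version => $2 };
    }

    return { dbname => undef, accession => $token, version => 0 };
}

my $xrefs = 'UniGene:Mm.388865, Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576';

for my $token (split /,\s*/, $xrefs) {
    my $x = split_xref($token);
    printf "%-8s %s\n", $x->{dbname} // '(none)', $x->{accession};
}
```

With a split like this the dbname column would no longer be NULL for the prefixed tokens. Unprefixed tokens such as AF331517.1 still have no dbname, so a loader would have to supply one from context.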

	-hilmar

On May 23, 2006, at 12:02 PM, Gerben Menschaert wrote:

> Hello all,
>
> I'm trying to load genbank accession number Q99ML8 as a genpept
> file with
> the load_seqdatabase.pl script. It fails on the DBSOURCE part:
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed,
> values were
> ("","UniGene:Mm.388865","0","") FKs ()
> ORA-01400: cannot insert NULL into
> ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
> ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO
> dbxref
> (dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
> ---------------------------------------------------
>
> The DBSOURCE block from the genpept file looks like this:
>
> DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
>             class: standard.
>             created: May 10, 2002.
>             sequence updated: Jun 1, 2001.
>             annotation updated: May 16, 2006.
>             xrefs: AF331517.1, AAK16157.1
>             xrefs (non-sequence databases): UniGene:Mm.388865,
>             Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:
> 0001664,
>             GO:0006171, GO:0007586, GO:0006950
>
> How is this parsed? Could anybody point me in the right direction?
>
> Regards,
> Gerben
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From gad14 at cornell.edu  Tue May 23 15:38:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 23 May 2006 15:38:31 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql
In-Reply-To:
References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
Message-ID: <447364B7.4030203@cornell.edu>

Hi Hilmar,

I apologize in advance if I'm talking about something that is well
documented somewhere but I'm still having trouble understanding exactly
what I need to do to get a biosql database loaded in such a way so that
it can interact fully with gbrowse - it seems to be half way there.

I use load_seqdatabase.pl to load the genome sequence (single sequence
in fasta format) into a biosql database but I populate the db with
features using what I thought was a GFF-centric approach, not with
load_seqdatabase.pl -- see my code in #4 below.

Here is exactly what I do:

1) Create a mysql database called 'test_biosql' with correct permissions

2) Load the biosql schema:

   mysql --user=xx --password=xx test_biosql < /usr/local/biosql-schema/sql/biosqldb-mysql.sql

3) Use load_seqdatabase.pl to load the single genomic dna sequence:

   load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 -namespace NC_004578 -format fasta 6853.fasta

4) I then use a script I wrote to load the SeqFeatures which are in gff
format in a file i pass in as an arg ($in).
Here is the code:

# read gff file into gff io object
my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);

# create a Bio::DB::DBAdaptorI implementing object
my $db = Bio::DB::BioDB->new(-database => $dbname,
                             -port     => $port,
                             -dbname   => $database,
                             -driver   => $driver,
                             -user     => $user,
                             -pass     => $pass,
                            );

# get appropriate object adaptor
my $adp = $db->get_object_adaptor("Bio::SeqI");

my $acc = "NC_004578";  # the genome seq id already in the db
my $seq = Bio::Seq->new(-accession_number => $acc,
                        -display_id       => $acc,
                        -primary_id       => $acc,
                        -namespace        => $acc);

# Locate entry matching the unique key attributes and populate a
# persistent object with this entry.
my $dbseq = $adp->find_by_unique_key($seq);

# insert features from gff file into database.
while (my $feat = $gffio->next_feature()) {
    $dbseq->add_SeqFeature($feat);
    $dbseq->store;
    $dbseq->commit();
}

Is there additional code I should have here? I realize you're not an
expert/user of gbrowse.. and this problem seems to be related to the
gbrowse_details cgi script, which you probably are not familiar with.
But I'm CC'ing the lists in case anyone else has some clues. I do
appreciate any insight you might have though. It would be good to know
if I'm doing all that I need to do to fully and correctly populate a
biosql db with GFF/SeqFeature.

Thanks,
Genevieve

Hilmar Lapp wrote:
> Hi Genevieve,
>
> there's a couple more regular users of BioSQL than one (about 25-30
> groups), but not many who run GBrowse off of BioSQL (and I don't count
> among those - yet).
>
> Of those who have posted before that they accomplished this, I believe
> none were using load_seqdatabase.pl to load the data. Instead, they
> loaded data through the DBGFF adaptor for BioSQL, i.e., like you would
> load data into a GFF database, just using a different adaptor.
>
> load_seqdatabase.pl will load through the sequence-centric Bioperl
> object model, and has no notion of GFF or GFF3 and associated
> constraints (controlled vocabulary for feature type and source terms,
> location types etc).
>
> It is probably possible to load data through load_seqdatabase.pl and
> then render it through GBrowse but doing so will almost certainly
> require a SeqProcessor (see --pipeline argument to load_seqdatabase.pl)
> to be written that will appropriately unflatten the feature array and
> in fact probably have to use SeqFeature::Annotated (where actually did
> SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will need to
> be fixed to be prepared for SeqFeatureI implementations that use
> ontology terms for primary_tag and source_tag instead of strings.
>
> Is it possible for you to load your data through a GFF3 intermediary?
> Bioperl has modules and in fact scripts that will write GFF3 (if I'm
> not mistaken ...).
>
>     -hilmar
>
> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
>
>> Hi Genevieve,
>>
>> The problem is that none of us really knows anything about BioSQL.
>> Hilmar is
>> the only regular user of this database. He's now gone to NESCent (duke
>> university) and may not be receiving mail sent to GNF.
>>
>> Lincoln
>>
>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>
>>> Hi Scott,
>>>
>>> I'm still having the same problem. It might have to do with how the
>>> BioSQL database is populated. I use the load_seqdatabase.pl script to
>>> load the database along with bioperl-db functions for loading
>>> SeqFeatures directly. I took a closer look at how the tables are
>>> populated in the biosql tables. (If you're not familiar with BioSQL the
>>> following may not be familiar to you -- i just want to put this
>>> observation out there...). I noticed that the 'term_id' field in the
>>> Location table was empty for the first gene record i had loaded. When I
>>> set term_id to be '11', the id that corresponds with the 'gene'
>>> ontology
>>> term, i notice a positive change in what's displayed on the
>>> gbrowse_details page for this record... the name of the gene 'dnaA' now
>>> appears in the title line in large blue font, as it should. The class
>>> name is still missing, as is all the detail about this gene -
>>> coordinates, etc.
>>>
>>> Lincoln suggests that I talk directly to Hilmar Lapp who is the main
>>> BioSQL
>>> developer. It could be that I am bumping up against things that haven't
>>> been developed yet as far as the GBrowse<->BioSQL db connectivity goes.
>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm and
>>> Util.pm in order to try to understand where the disconnect might be...
>>>
>>> To answer your question below.. yes GBrowse works fine for the
>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table database.
>>> I'm
>>> using this installation of GBrowse 1.64 for several MySQL databases
>>> with
>>> the default gbrowse tables... everything is working fine. My only
>>> trouble with gbrowse crops up when interfacing with the biosql mysql
>>> db.
>>>
>>> Thanks for all your help,
>>> Genevieve
>>>
>>> Scott Cain wrote:
>>>
>>>> Hi Genevieve,
>>>>
>>>> I'm sorry this has hung out there unanswered for so long. I
>>>> suppose it
>>>> was because I chose not to answer it because it involved BioSQL
>>>> (which I
>>>> know just about nothing about) and Simon seemed to think that the
>>>> MySQL
>>>> adaptor was involved somehow (though it doesn't look to me like it
>>>> is).
>>>>
>>>> Anyway, I'll try to get started answering your questions (assuming you
>>>> haven't already puzzled your way to one already). See my comments
>>>> below.
>>>>
>>>> Scott
>>>>
>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>
>>>>> Hi,
>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with bioperl
>>>>> 1.5.1. I successfully loaded the database with load_seqdatabase.pl
>>>>> with
>>>>> NC_004578.gbk from NCBI.
>>>>>
>>>>> The features display as they should on the main gbrowse details pane.
>>>>> However, when I click on one of the features I get GBrowse Details
>>>>> data
>>>>> record page with ":Details" at the top in large blue font but no data
>>>>> for that gene displays. In smaller red font, "Requested feature not
>>>>> found
>>>>> in database" which is followed by the normal details page footer info
>>>>> ("For the source code for this browser, see...", etc).
>>>>>
>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>> db_args for my database. I changed 'link' to
>>>>>
>>>>> link = AUTO
>>>>>
>>>>> from what was there
>>>>>
>>>>> link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>
>>>>> The default suggestion for 'link' is a little confusing.. why does it
>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why isn't
>>>>> 'cgi-bin' in the path?
>>>>
>>>>
>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>> config file predates the gbrowse_details script and no one changed
>>>> this
>>>> sample config file. I changed it and it will be changed in the next
>>>> release.
>>>>
>>>> As for the path, 'perl' is a common url convention for scripts that
>>>> are
>>>> running under mod_perl, so I suspect the person who wrote this sample
>>>> config file was running mod_perl.
>>>>
>>>>> When i set link to 'AUTO' I at least get the details page.
>>>>> gbrowse_details is not getting what it needs to display the record
>>>>> info
>>>>> though. The webserver error I get is:
>>>>>
>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>> line 126.
>>>>
>>>>
>>>> I'm not sure this is really the problem. Let me make sure: it is
>>>> after
>>>> you changed link to AUTO that you see this? That is, the page you see
>>>> now is as you described in your second paragraph, right?
>>>> Unfortunately,
>>>> this is the part where I become particularly useless, since I don't
>>>> know
>>>> anything about BioSQL. Is the details page working OK for the
>>>> yeast_chr1 dataset?
>>>>
>>>> Scott
>>>>
>>>>> I took a look at line 126 in BioSQL.pm - not sure what to make of it.
>>>>>
>>>>> Any ideas? Am I overlooking anything?
>>>>>
>>>>> Thanks,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>> -------------------------------------------------------
>>>>> Using Tomcat but need to do more? Need to support web services,
>>>>> security?
>>>>> Get stuff done quickly with pre-integrated technology to make your
>>>>> job
>>>>> easier Download IBM WebSphere Application Server v.1.0.1 based on
>>>>> Apache
>>>>> Geronimo
>>>>> http://sel.as-us.falkag.net/sel?
>>>>> cmd=lnk&kid=120709&bid=263057&dat=121642
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>>
>> --
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>>
>> -------------------------------------------------------
>> Using Tomcat but need to do more? Need to support web services,
>> security?
>> Get stuff done quickly with pre-integrated technology to make your
>> job easier
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache
>> Geronimo
>> http://sel.as-us.falkag.net/sel?
>> cmd=lnk&kid=120709&bid=263057&dat=121642
>> _______________________________________________
>> Gmod-gbrowse mailing list
>> Gmod-gbrowse at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>

From hlapp at gmx.net  Tue May 23 23:50:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 24 May 2006 04:50:21 +0100
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql
In-Reply-To: <447364B7.4030203@cornell.edu>
References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu>
Message-ID:

Hi Genevieve & Scott,

see below for interspersed comments.

On May 23, 2006, at 3:38 PM, Genevieve DeClerck wrote:

> Hi Hilmar,
>
> I apologize in advance if I'm talking about something that is well
> documented somewhere
> but I'm still having trouble understanding exactly what I need to
> do to get a biosql database loaded in such a way so that it can
> interact fully with gbrowse - it seems to be half way there.
>
> I use load_seqdatabase.pl to load the genome sequence (single
> sequence in fasta format) into a biosql database but I populate the
> db with features using what I thought was a GFF-centric approach,
> not with load_seqdatabase.pl -- see my code in #4 below.
>
> Here is exactly what I do:
>
> [...]
> 3) Use load_seqdatabase.pl to load the single genomic dna sequence:
>
> load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 -
> namespace NC_004578 -format fasta 6853.fasta
>
> 4) I then use a script I wrote to load the SeqFeatures which are in
> gff format in a file i pass in as an arg ($in).
> Here is the code:
>
>
> # read gff file into gff io object
> my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);
>
> # create a Bio::DB::DBAdaptorI implementing object
> my $db = Bio::DB::BioDB->new(-database => $dbname,
>                              -port     => $port,
>                              -dbname   => $database,
>                              -driver   => $driver,
>                              -user     => $user,
>                              -pass     => $pass,
>                             );
>
> # get appropriate object adaptor
> my $adp = $db->get_object_adaptor("Bio::SeqI");
>
> my $acc = "NC_004578";  # the genome seq id already in the db
> my $seq = Bio::Seq->new(-accession_number => $acc,
>                         -display_id       => $acc,
>                         -primary_id       => $acc,
>                         -namespace        => $acc);
>
> # Locate entry matching the unique key attributes and populate a
> # persistent object with this entry.
> my $dbseq = $adp->find_by_unique_key($seq);
>
> # insert features from gff file into database.
> while (my $feat = $gffio->next_feature()) {
>     $dbseq->add_SeqFeature($feat);
>     $dbseq->store;
>     $dbseq->commit();
> }
>
>
> Is there additional code I should have here? I realize you're not an
> expert/user of gbrowse.. and this problem seems to be related to
> the gbrowse_details cgi script, which you probably are not familiar
> with.

So your use case is that you have a sequence in simple fasta format
with its annotation in another file in GFF3 format, and you want to
load both into a Biosql database and visualize in GBrowse.

It looks like I was in fact on the wrong path the whole time. The
Gbrowse Biosql adaptor that I can find is a Bio::DasI adaptor through
which you cannot load but only retrieve, so I have to assume that you
were right in using load_seqdatabase.pl. Can somebody help out here who
has been using Biosql as the underlying database and confirm or set me
straight?

If that is the procedure then your code looks alright. Also, it looks
like Bio::Tools::GFF does not return hierarchical feature graphs for v3
input (which bioperl-db wouldn't handle properly because it doesn't
support the feature_relationship table yet).
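
As an aside on what a "hierarchical feature graph" means for GFF3 input: features are tied together through the ID and Parent attributes in column 9. The Perl sketch below only illustrates that format convention with made-up records. `parse_attributes` and `build_hierarchy` are hypothetical helpers; the sketch does not use Bio::Tools::GFF and does not write anything to BioSQL.

```perl
use strict;
use warnings;

# Parse the attribute column (column 9) of a GFF3 line into a hash.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $attr{$tag} = $value;
    }
    return \%attr;
}

# Group feature IDs under their parents using the GFF3 ID and Parent tags.
# Features without a Parent are filed under '(top level)'.
sub build_hierarchy {
    my (@lines) = @_;
    my %children;
    for my $line (@lines) {
        next if $line =~ /^#/ || $line !~ /\S/;
        $line =~ s/\s+$//;
        my @col = split /\t/, $line;
        next unless @col >= 9;
        my $attr = parse_attributes($col[8]);
        next unless defined $attr->{ID};
        my @parents = defined $attr->{Parent}
            ? split(/,/, $attr->{Parent})
            : ('(top level)');
        push @{ $children{$_} }, $attr->{ID} for @parents;
    }
    return \%children;
}

# Made-up records for illustration only.
my @gff = (
    join("\t", 'seq1', 'example', 'gene', 1, 900, '.', '+', '.', 'ID=gene1;Name=geneA'),
    join("\t", 'seq1', 'example', 'mRNA', 1, 900, '.', '+', '.', 'ID=mrna1;Parent=gene1'),
    join("\t", 'seq1', 'example', 'CDS',  1, 900, '.', '+', '0', 'ID=cds1;Parent=mrna1'),
);

my $tree = build_hierarchy(@gff);
for my $parent (sort keys %$tree) {
    print "$parent: @{ $tree->{$parent} }\n";
}
```

A flat loader that stores each line as an independent feature keeps the rows but loses these parent/child links, which is the gap described for v3 input.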

So, I'm in fact at a loss explaining why the details page doesn't work
for you, given that people reported it to work before. I'm inclined to
claim that the respective Gbrowse code has changed, either in the way
it expects the feature to be set up, or in the way it uses the DasI
interface, and broke the Biosql adaptor. Can somebody (Scott? Lincoln?)
comment on whether there were any changes in this regard?

The lines in gbrowse_details that look like they lead to the problem are

   my @features = sort {$b->length<=>$a->length} $CONFIG->_feature_get($db,$name,$class);
   @features = sort {$b->length<=>$a->length} $CONFIG->_feature_get($db,$ref,$class,$start,$end,1) unless @features;

neither of which returns any matches. I have no clue yet how those two
calls get translated into DasI to bioperl-db queries.

	-hilmar

> But I'm CC'ing the lists in case anyone else has some clues. I do
> appreciate any insight you might have though. It would be good to
> know if I'm doing all that I need to do to fully and correctly
> populate a biosql db with GFF/SeqFeature.
>
> Thanks,
> Genevieve
>
>
>
> Hilmar Lapp wrote:
>> Hi Genevieve,
>> there's a couple more regular users of BioSQL than one (about
>> 25-30 groups), but not many who run GBrowse off of BioSQL (and I
>> don't count among those - yet).
>> Of those who have posted before that they accomplished this, I
>> believe none were using load_seqdatabase.pl to load the data.
>> Instead, they loaded data through the DBGFF adaptor for BioSQL,
>> i.e., like you would load data into a GFF database, just using a
>> different adaptor.
>> load_seqdatabase.pl will load through the sequence-centric
>> Bioperl object model, and has no notion of GFF or GFF3 and
>> associated constraints (controlled vocabulary for feature type
>> and source terms, location types etc).
>> It is probably possible to load data through load_seqdatabase.pl
>> and then render it through GBrowse but doing so will almost
>> certainly require a SeqProcessor (see --pipeline argument to
>> load_seqdatabase.pl) to be written that will appropriately
>> unflatten the feature array and in fact probably have to use
>> SeqFeature::Annotated (where actually did
>> SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will
>> need to be fixed to be prepared for SeqFeatureI implementations
>> that use ontology terms for primary_tag and source_tag instead of
>> strings.
>> Is it possible for you to load your data through a GFF3
>> intermediary? Bioperl has modules and in fact scripts that will
>> write GFF3 (if I'm not mistaken ...).
>> -hilmar
>> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
>>> Hi Genevieve,
>>>
>>> The problem is that none of us really knows anything about
>>> BioSQL. Hilmar is
>>> the only regular user of this database. He's now gone to NESCent
>>> (duke
>>> university) and may not be receiving mail sent to GNF.
>>>
>>> Lincoln
>>>
>>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> I'm still having the same problem. It might have to do with how the
>>>> BioSQL database is populated. I use the load_seqdatabase.pl
>>>> script to
>>>> load the database along with bioperl-db functions for loading
>>>> SeqFeatures directly. I took a closer look at how the tables are
>>>> populated in the biosql tables. (If you're not familiar with
>>>> BioSQL the
>>>> following may not be familiar to you -- i just want to put this
>>>> observation out there...). I noticed that the 'term_id' field in
>>>> the
>>>> Location table was empty for the first gene record i had
>>>> loaded. When I
>>>> set term_id to be '11', the id that corresponds with the 'gene'
>>>> ontology
>>>> term, i notice a positive change in what's displayed on the
>>>> gbrowse_details page for this record... the name of the gene
>>>> 'dnaA' now
>>>> appears in the title line in large blue font, as it should. The
>>>> class
>>>> name is still missing, as is all the detail about this gene -
>>>> coordinates, etc.
>>>>
>>>> Lincoln suggests that I talk directly to Hilmar Lapp who is the
>>>> main BioSQL
>>>> developer. It could be that I am bumping up against things that
>>>> haven't
>>>> been developed yet as far as the GBrowse<->BioSQL db
>>>> connectivity goes.
>>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm
>>>> and
>>>> Util.pm in order to try to understand where the disconnect
>>>> might be...
>>>>
>>>> To answer your question below.. yes GBrowse works fine for the
>>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table
>>>> database. I'm
>>>> using this installation of GBrowse 1.64 for several MySQL
>>>> databases with
>>>> the default gbrowse tables... everything is working fine. My only
>>>> trouble with gbrowse crops up when interfacing with the biosql
>>>> mysql db.
>>>>
>>>> Thanks for all your help,
>>>> Genevieve
>>>>
>>>> Scott Cain wrote:
>>>>
>>>>> Hi Genevieve,
>>>>>
>>>>> I'm sorry this has hung out there unanswered for so long. I
>>>>> suppose it
>>>>> was because I chose not to answer it because it involved
>>>>> BioSQL (which I
>>>>> know just about nothing about) and Simon seemed to think that
>>>>> the MySQL
>>>>> adaptor was involved somehow (though it doesn't look to me
>>>>> like it is).
>>>>>
>>>>> Anyway, I'll try to get started answering your questions
>>>>> (assuming you
>>>>> haven't already puzzled your way to one already). See my
>>>>> comments below.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with
>>>>>> bioperl
>>>>>> 1.5.1. I successfully loaded the database with
>>>>>> load_seqdatabase.pl with
>>>>>> NC_004578.gbk from NCBI.
>>>>>>
>>>>>> The features display as they should on the main gbrowse
>>>>>> details pane.
>>>>>> However, when I click on one of the features I get GBrowse
>>>>>> Details data
>>>>>> record page with ":Details" at the top in large blue font but
>>>>>> no data
>>>>>> for that gene displays. In smaller red font, "Requested
>>>>>> feature not found
>>>>>> in database" which is followed by the normal details page
>>>>>> footer info
>>>>>> ("For the source code for this browser, see...", etc).
>>>>>>
>>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>>> db_args for my database. I changed 'link' to
>>>>>>
>>>>>> link = AUTO
>>>>>>
>>>>>> from what was there
>>>>>>
>>>>>> link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>>
>>>>>> The default suggestion for 'link' is a little confusing.. why
>>>>>> does it
>>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why
>>>>>> isn't
>>>>>> 'cgi-bin' in the path?
>>>>>
>>>>>
>>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>>> config file predates the gbrowse_details script and no one
>>>>> changed this
>>>>> sample config file. I changed it and it will be changed in the
>>>>> next
>>>>> release.
>>>>>
>>>>> As for the path, 'perl' is a common url convention for scripts
>>>>> that are
>>>>> running under mod_perl, so I suspect the person who wrote this
>>>>> sample
>>>>> config file was running mod_perl.
>>>>>
>>>>>> When i set link to 'AUTO' I at least get the details page.
>>>>>> gbrowse_details is not getting what it needs to display the
>>>>>> record info
>>>>>> though. The webserver error I get is:
>>>>>>
>>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>>> line 126.
>>>>>
>>>>>
>>>>> I'm not sure this is really the problem. Let me make sure: it
>>>>> is after
>>>>> you changed link to AUTO that you see this?
That is, the page >>>>> you see >>>>> now is as you described in your second paragraph, right? >>>>> Unfortunately, >>>>> this is the part where I become particularly useless, since I >>>>> don't know >>>>> anything about BioSQL. Is the details page working OK for the >>>>> yeast_chr1 dataset? >>>>> >>>>> Scott >>>>> >>>>>> I took a look at line 126 in BioSQL.pm - not sure what to >>>>>> make of it. >>>>>> >>>>>> Any ideas? Am I overlooking anything? >>>>>> >>>>>> Thanks, >>>>>> Genevieve >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------- >>>>>> Using Tomcat but need to do more? Need to support web >>>>>> services, security? >>>>>> Get stuff done quickly with pre-integrated technology to make >>>>>> your job >>>>>> easier Download IBM WebSphere Application Server v.1.0.1 >>>>>> based on Apache >>>>>> Geronimo >>>>>> http://sel.as-us.falkag.net/sel? >>>>>> cmd=lnk&kid=120709&bid=263057&dat=121642 >>>>>> _______________________________________________ >>>>>> Gmod-gbrowse mailing list >>>>>> Gmod-gbrowse at lists.sourceforge.net >>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> >>> >>> -- >>> Lincoln D. Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> >>> ------------------------------------------------------- >>> Using Tomcat but need to do more? Need to support web services, >>> security? >>> Get stuff done quickly with pre-integrated technology to make >>> your job easier >>> Download IBM WebSphere Application Server v.1.0.1 based on >>> Apache Geronimo >>> http://sel.as-us.falkag.net/sel? 
>>> cmd=lnk&kid=120709&bid=263057&dat=121642 >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> > > > > ------------------------------------------------------- > All the advantages of Linux Managed Hosting--Without the Cost and > Risk! > Fully trained technicians. The highest number of Red Hat > certifications in > the hosting industry. Fanatical Support. Click to learn more > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=107521&bid=248729&dat=121642 > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mjcipriano at lbl.gov Tue May 30 18:26:24 2006 From: mjcipriano at lbl.gov (Michael Cipriano) Date: Tue, 30 May 2006 15:26:24 -0700 Subject: [BioSQL-l] Problem with add feature under BioSQL In-Reply-To: References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> Message-ID: <1149027984.3139.105.camel@alien> Hello, I have found a problem with adding features via gbrowse_img with the add=xxx tag. I am using the CVS version of GGB, bioperl-live and BioSQL schema on mysql. When using the add=xxx tag, it will produce a fatal error (with BioSQL/Das interface). 
The error shown in the error log is:

link:
/cgi-bin/gbrowse_img/bacteria/?name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999;

ERROR from apache error_log:
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't locate object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" at /var/www/cgi-bin/gbrowse_img line 502.
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Premature end of script headers: gbrowse_img

This is from this section of code at line ~506:

    unless ($segments{$refname}) {
      my @segments = map {
        eval{$_->absolute(0)}; $_  # so that rel2abs works properly later
      }
      grep { $current_segment->overlaps($_) } get_segments($db, $refname);
      return unless @segments;
      $segments{$refname} = $segments[0];
    }

The overlaps method is not defined in Bio::DB::Das::BioSQL::Segment or in any of the classes it inherits from.

The fix was the inclusion of Bio::RangeI in the @ISA variable (shown below) in the file Bio/DB/Das/BioSQL/Segment.pm

    #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN
    @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI);

I am not sure if this will have any consequences other than fixing the bug I mentioned (and possibly fixing something else).

Can anyone tell me if this will introduce any new bugs, and if not, can someone commit this change?

Thanks,
Michael Cipriano
Developer - LBNL

From hlapp at gmx.net Wed May 31 14:33:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:33:40 -0400
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <1149027984.3139.105.camel@alien>
References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> <1149027984.3139.105.camel@alien>
Message-ID: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>

This should be a Gbrowse problem, not in Biosql or bioperl-db unless I'm missing something?
Just trying to make sure ... -hilmar On May 30, 2006, at 6:26 PM, Michael Cipriano wrote: > Hello, > > I have found a problem with adding features via gbrowse_img with the > add=xxx tag. I am using the CVS version of GGB, bioperl-live and > BioSQL > schema on mysql. > > When using the add=xxx tag, it will produce a fatal error (with > BioSQL/Das interface). There error shown in the error log is: > link: > /cgi-bin/gbrowse_img/bacteria/? > name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999; > > ERROR from apache error_log: > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't > locate > object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" > at /var/www/cgi-bin/gbrowse_img line 502. > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] > Premature end > of script headers: gbrowse_img > > This is from this section of code at line ~506: > > unless ($segments{$refname}) { > my @segments = map { > eval{$_->absolute(0)}; $_ # so that rel2abs works properly > later > } > grep { $current_segment->overlaps($_) } get_segments($db, > $refname); > return unless @segments; > $segments{$refname} = $segments[0]; > } > > > The overlaps function is not defined in the > Bio::DB::Das::BioSQL::Segment or any of the objects it inherits. > > The fix was the inclusion of Bio::RangeI in the @ISA variable (shown > below) in the file Bio/DB/Das/BioSQL/Segment.pm > > > #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN > @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI); > > > I am not sure if this will have any other consequences other then > fixing > the bug I mentioned (and possibly fixing something else). > > Can anyone tell me if this will introduce any new bugs, and if not, > can > someone commit this change. 
> > Thanks, > Michael Cipriano > Developer - LBNL > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mjcipriano at lbl.gov Wed May 31 14:50:53 2006 From: mjcipriano at lbl.gov (Michael Cipriano) Date: Wed, 31 May 2006 11:50:53 -0700 Subject: [BioSQL-l] Problem with add feature under BioSQL In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> <1149027984.3139.105.camel@alien> <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> Message-ID: <1149101453.3139.112.camel@alien> Hi, Yes, I only see the problem with gbrowse, though it could come up anytime someone wants connect with Das using Bio::DB::Das::BioSQL and needs the overlap function (or other range functions) in the Segment object. -Michael On Wed, 2006-05-31 at 14:33 -0400, Hilmar Lapp wrote: > This should be a Gbrowse problem, not in Biosql or bioperl-db unless > I'm missing something? Just trying to make sure ... > > -hilmar > > On May 30, 2006, at 6:26 PM, Michael Cipriano wrote: > > > Hello, > > > > I have found a problem with adding features via gbrowse_img with the > > add=xxx tag. I am using the CVS version of GGB, bioperl-live and > > BioSQL > > schema on mysql. > > > > When using the add=xxx tag, it will produce a fatal error (with > > BioSQL/Das interface). There error shown in the error log is: > > link: > > /cgi-bin/gbrowse_img/bacteria/? 
> > name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999; > > > > ERROR from apache error_log: > > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't > > locate > > object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" > > at /var/www/cgi-bin/gbrowse_img line 502. > > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] > > Premature end > > of script headers: gbrowse_img > > > > This is from this section of code at line ~506: > > > > unless ($segments{$refname}) { > > my @segments = map { > > eval{$_->absolute(0)}; $_ # so that rel2abs works properly > > later > > } > > grep { $current_segment->overlaps($_) } get_segments($db, > > $refname); > > return unless @segments; > > $segments{$refname} = $segments[0]; > > } > > > > > > The overlaps function is not defined in the > > Bio::DB::Das::BioSQL::Segment or any of the objects it inherits. > > > > The fix was the inclusion of Bio::RangeI in the @ISA variable (shown > > below) in the file Bio/DB/Das/BioSQL/Segment.pm > > > > > > #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN > > @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI); > > > > > > I am not sure if this will have any other consequences other then > > fixing > > the bug I mentioned (and possibly fixing something else). > > > > Can anyone tell me if this will introduce any new bugs, and if not, > > can > > someone commit this change. 
> > > > Thanks, > > Michael Cipriano > > Developer - LBNL > > > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l > > > From lstein at cshl.edu Wed May 31 14:47:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 14:47:47 -0400 Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under BioSQL In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien> <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> Message-ID: <200605311447.49454.lstein@cshl.edu> I think this is a problem in the Bio::DB::Das::BioSQL::Segment module. I will add an overlaps() method. Lincoln On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote: > This should be a Gbrowse problem, not in Biosql or bioperl-db unless > I'm missing something? Just trying to make sure ... > > -hilmar > > On May 30, 2006, at 6:26 PM, Michael Cipriano wrote: > > Hello, > > > > I have found a problem with adding features via gbrowse_img with the > > add=xxx tag. I am using the CVS version of GGB, bioperl-live and > > BioSQL > > schema on mysql. > > > > When using the add=xxx tag, it will produce a fatal error (with > > BioSQL/Das interface). There error shown in the error log is: > > link: > > /cgi-bin/gbrowse_img/bacteria/? > > name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999; > > > > ERROR from apache error_log: > > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't > > locate > > object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" > > at /var/www/cgi-bin/gbrowse_img line 502. 
> > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] > > Premature end > > of script headers: gbrowse_img > > > > This is from this section of code at line ~506: > > > > unless ($segments{$refname}) { > > my @segments = map { > > eval{$_->absolute(0)}; $_ # so that rel2abs works properly > > later > > } > > grep { $current_segment->overlaps($_) } get_segments($db, > > $refname); > > return unless @segments; > > $segments{$refname} = $segments[0]; > > } > > > > > > The overlaps function is not defined in the > > Bio::DB::Das::BioSQL::Segment or any of the objects it inherits. > > > > The fix was the inclusion of Bio::RangeI in the @ISA variable (shown > > below) in the file Bio/DB/Das/BioSQL/Segment.pm > > > > > > #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN > > @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI); > > > > > > I am not sure if this will have any other consequences other then > > fixing > > the bug I mentioned (and possibly fixing something else). > > > > Can anyone tell me if this will introduce any new bugs, and if not, > > can > > someone commit this change. > > > > Thanks, > > Michael Cipriano > > Developer - LBNL > > > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Wed May 31 14:56:30 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 31 May 2006 14:56:30 -0400 Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under BioSQL In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien> <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net> Message-ID: <200605311456.32088.lstein@cshl.edu> I've just committed the fix into CVS. This will be available in the upcoming gbrowse release as well. Lincoln On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote: > This should be a Gbrowse problem, not in Biosql or bioperl-db unless > I'm missing something? Just trying to make sure ... > > -hilmar > > On May 30, 2006, at 6:26 PM, Michael Cipriano wrote: > > Hello, > > > > I have found a problem with adding features via gbrowse_img with the > > add=xxx tag. I am using the CVS version of GGB, bioperl-live and > > BioSQL > > schema on mysql. > > > > When using the add=xxx tag, it will produce a fatal error (with > > BioSQL/Das interface). There error shown in the error log is: > > link: > > /cgi-bin/gbrowse_img/bacteria/? > > name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999; > > > > ERROR from apache error_log: > > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't > > locate > > object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" > > at /var/www/cgi-bin/gbrowse_img line 502. 
> > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] > > Premature end > > of script headers: gbrowse_img > > > > This is from this section of code at line ~506: > > > > unless ($segments{$refname}) { > > my @segments = map { > > eval{$_->absolute(0)}; $_ # so that rel2abs works properly > > later > > } > > grep { $current_segment->overlaps($_) } get_segments($db, > > $refname); > > return unless @segments; > > $segments{$refname} = $segments[0]; > > } > > > > > > The overlaps function is not defined in the > > Bio::DB::Das::BioSQL::Segment or any of the objects it inherits. > > > > The fix was the inclusion of Bio::RangeI in the @ISA variable (shown > > below) in the file Bio/DB/Das/BioSQL/Segment.pm > > > > > > #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN > > @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI); > > > > > > I am not sure if this will have any other consequences other then > > fixing > > the bug I mentioned (and possibly fixing something else). > > > > Can anyone tell me if this will introduce any new bugs, and if not, > > can > > someone commit this change. > > > > Thanks, > > Michael Cipriano > > Developer - LBNL > > > > > > _______________________________________________ > > BioSQL-l mailing list > > BioSQL-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biosql-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From d49228002 at ym.edu.tw Wed May 31 10:39:16 2006 From: d49228002 at ym.edu.tw (Yi-Feng Chang) Date: Wed, 31 May 2006 22:39:16 +0800 Subject: [BioSQL-l] Error loading ontology terms Message-ID: <000001c68c97$f461ac70$6801a8c0@iannb> Dear All, I've checked biosql archives, and found a similar thread (http://lists.open-bio.org/pipermail/biojava-l/2005-November/005151.html) however, it did not give specific solution. So I post here again, and hope there are someone could help me. I'm using JDK1.5.0_05, Biojava 1.4, Biosql 1.41, and Mysql 5.0 with My_connectJ 3.1 I was following the demo source that provide by biojava-in-anger except for the database connection the exceptions were listed in following: In first connection there would be a connection error *** Importing a core ontology -- hope this is okay *** Importing terms Exception in thread "main" org.biojava.bio.BioException: Error connecting to BioSQL database: Connection is closed. at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:276) at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.(BioSQLSequenceDB.java:194) at genevote.BioSQLTest.loadSeq(BioSQLTest.java:31) at genevote.BioSQLTest.main(BioSQLTest.java:70) Caused by: java.sql.SQLException: Connection is closed. at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:219) at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.createStatement(PoolingDataSource.java:248) at org.biojava.bio.seq.db.biosql.MySQLDBHelper.getInsertID(MySQLDBHelper.java:68) at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:268) ... 
3 more

Then I tried again and it worked: I loaded all the sequences in GenBank format into the biosql db without error. But when I tried to extract sequences, an exception occurred again.

org.biojava.bio.BioException: Error loading ontology terms
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:444)
    at org.biojava.bio.seq.db.biosql.OntologySQL.getOntology(OntologySQL.java:116)
    at org.biojava.bio.seq.db.biosql.OntologySQL.<init>(OntologySQL.java:413)
    at org.biojava.bio.seq.db.biosql.OntologySQL.getOntologySQL(OntologySQL.java:72)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:240)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
    at genevote.test.loadSeq(test.java:25)
    at genevote.test.main(test.java:76)
Caused by: java.sql.SQLException: Unknown column 'name' in 'field list'
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2851)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1534)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1625)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2297)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2226)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1812)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1657)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadTerms(OntologySQL.java:339)
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:441)
    ...
7 more From s.rayner at att.net Mon May 15 06:13:06 2006 From: s.rayner at att.net (s.rayner at att.net) Date: Mon, 15 May 2006 06:13:06 +0000 Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql Message-ID: <051520060613.9311.44681BF0000A2A7A0000245F21604666489D0A02970E9DD29C@att.net> Hello, I have been trying to upload the current release of uniprot (version 49.6) into MySQL using the most current version of load_seqdatabase.pl from CVS (# $Id: load_seqdatabase.pl,v 1.24 2006/01/19 21:34:29 lapp Exp $) I have tested the script on subsets of uniprot and it loads without problem, but when i attempt to load the full dataset, i end up with the follow error.... biowiv:/usr/lib/perl5/bioperl-db/scripts/biosql # perl load_seqdatabase.pl --dbname uniprot --dbuser XXXX --dbpass XXXX --format swiss /var/downloads/sequence/uniprot_sprot.dat Loading /var/downloads/sequence/uniprot_sprot.dat ... -------------------- WARNING --------------------- MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.","CRC-E7973FEA4B5611DC","","","") FKs ( I found where the script is hiccuping.... The Uniprot release contains lines with identical annotation for the RL keyword for two different sequences. ___________________ First occurence... ___________________ ID 1433T_PONPY STANDARD; PRT; 245 AA. AC Q5RFJ2; Q5RDK2; DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2005, sequence version 2. DT 18-APR-2006, entry version 13. DE 14-3-3 protein theta. GN Name=YWHAQ; OS Pongo pygmaeus (Orangutan). OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; OC Catarrhini; Hominidae; Pongo. OX NCBI_TaxID=9600; RN [1] RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. RC TISSUE=Brain cortex, and Kidney; RG The German cDNA consortium; RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. 
<====== Not Unique ___________________ Second occurence... ___________________ ID 1433G_PONPY STANDARD; PRT; 246 AA. AC Q5RC20; DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2005, sequence version 2. DT 18-APR-2006, entry version 13. DE 14-3-3 protein gamma. GN Name=YWHAG; OS Pongo pygmaeus (Orangutan). OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; OC Catarrhini; Hominidae; Pongo. OX NCBI_TaxID=9600; RN [1] RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. RC TISSUE=Heart; RG The German cDNA consortium; RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. <====== Not Unique in these two cases the generated CRC key is identical and so MySQL throws a wobbly. if i look at the MySQL entry in the REFERENCE table for the first sequence ------+-------+---------+----------------------+ | 139 | NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC | +--------------+-----------+---------------------------------------------------- and the error when the script choked was MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.","CRC-E7973FEA4B5611DC","","","") FKs ( References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net> Message-ID: You found the right instance. Unfortunately with the way the bioperl swissprot parser works the group (RG) isn't promoted to author if there is no author in addition (in fact you may debate whether that would even be the best way of doing things), so it doesn't find it on second occurrence by unique key. If you can live without this entry, or any other entry that causes a hiccup, just supply the flag --safe and it will gracefully move on to the next entry. 
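
As a rough sketch, that would be the command line from the original report with the flag added. Everything except --safe is copied from the report (XXXX stands for the redacted credentials, as there), and where the flag sits among the other options is an assumption:

```shell
# Same invocation as in the original report, with --safe added so that an
# entry that fails to load is skipped and loading continues with the next one.
# Assumption: the position of --safe among the other options does not matter.
perl load_seqdatabase.pl --safe \
    --dbname uniprot --dbuser XXXX --dbpass XXXX \
    --format swiss /var/downloads/sequence/uniprot_sprot.dat
```

Entries skipped this way are simply not loaded, so it is worth keeping the warnings from the run to see which ones were left out.
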
Fixing the issue would require either fixing the bioperl swissprot parser (or Bio::Annotation::Reference) to stick the RG group into the author slot if there is no author, or extending Bioperl's Bio::Annotation::Reference to also feature a group and changing biosql to use it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql) should just use that in place of a missing author? The downside is that upon round-tripping an entry, the RG annotation line will become an RA annotation line. How bad would that be? Any thoughts from anyone?

-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY STANDARD; PRT; 245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.
> <====== Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY STANDARD; PRT; 246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; > OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; > OC Catarrhini; Hominidae; Pongo. > OX NCBI_TaxID=9600; > RN [1] > RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA]. > RC TISSUE=Heart; > RG The German cDNA consortium; > RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. > <====== Not Unique > > > > in these two cases the generated CRC key is identical and so MySQL > throws a wobbly. > > if i look at the MySQL entry in the REFERENCE table for the first > sequence > ------+-------+---------+----------------------+ > | 139 | NULL | Submitted (NOV-2004) to the EMBL/ > GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC | > +--------------+----------- > +---------------------------------------------------- > > and the error when the script choked was > > MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, > values were > ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ > databases.","CRC-E7973FEA4B5611DC","","","") FKs ( Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3 > > hence the problem. > > I'm guessing i'm not the first person to encounter this, but dont > see any hints for an easy way around this. > > any suggestions....? > > ta > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From simon.rayner.cn at gmail.com Tue May 16 00:46:00 2006 From: simon.rayner.cn at gmail.com (simon rayner) Date: Tue, 16 May 2006 00:46:00 +0000 Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql In-Reply-To: 051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net Message-ID: <1147740360.3338.1.camel@biowiv.wivbio> thanks for the help. 
one way i was thinking of getting around it before i got your email about the --safe flag was to append an extra character to the offending string. So, in my case i would have "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases." and "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2" which would presumably create different keys. However, i realise the downside of this is that i have now modified the data source... From hlapp at gmx.net Thu May 18 14:38:33 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 18 May 2006 10:38:33 -0400 Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql In-Reply-To: <1147740360.3338.1.camel@biowiv.wivbio> References: <1147740360.3338.1.camel@biowiv.wivbio> Message-ID: Yes, you have. Eventually I think this needs to be fixed by how the RG field is dealt with in either bioperl or bioperl-db. -hilmar On May 15, 2006, at 8:46 PM, simon rayner wrote: > thanks for the help. > > one way i was thinking of getting around it before i got your email > about the --safe flag was to append an extra character to the > offending > string. So, in my case i would have > > "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases." > > and > > "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2" > > which would presumably create different keys. > > However, i realise the downside of this is that i have now > modified the data source... 
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From darin.london at duke.edu Mon May 22 15:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [BioSQL-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil. In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006. We look forward to a great conference this year.

Please consult The Official BOSC 2006 Website at:
http://www.open-bio.org/wiki/BOSC_2006
for more details and information.

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:
http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB
calendar set up with all BOSC related deadlines.
http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006
Website:
http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.
From Gerben.Menschaert at UGent.be Tue May 23 16:02:12 2006
From: Gerben.Menschaert at UGent.be (Gerben Menschaert)
Date: Tue, 23 May 2006 18:02:12 +0200
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
Message-ID: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>

Hello all,

I'm trying to load genbank accession number Q99ML8 as a genpept file with
the load_seqdatabase.pl script. It fails on the DBSOURCE part:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
("","UniGene:Mm.388865","0","") FKs ()
ORA-01400: cannot insert NULL into ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO dbxref
(dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
---------------------------------------------------

The DBSOURCE block from the genpept file looks like this:

DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
            class: standard.
            created: May 10, 2002.
            sequence updated: Jun 1, 2001.
            annotation updated: May 16, 2006.
            xrefs: AF331517.1, AAK16157.1
            xrefs (non-sequence databases): UniGene:Mm.388865,
            Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:0001664,
            GO:0006171, GO:0007586, GO:0006950

How is this parsed? Could anybody point me in the right direction?

Regards,
Gerben

From hlapp at gmx.net Tue May 23 17:03:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 23 May 2006 13:03:30 -0400
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
In-Reply-To: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
References: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
Message-ID:

This is in reality a Uniprot entry. The Genbank parser apparently
doesn't succeed in picking apart accession and namespace prefix
(dbname). If at all possible I'd recommend loading Uniprot in Uniprot
(swissprot) format. Would that work for you?
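
The diagnosis above (the namespace prefix not being separated from the accession) can be sketched roughly as follows. This is only an illustration under the assumption that each cross-reference arrives as a plain "dbname:accession" string; the helper split_xref is hypothetical and is not part of BioPerl or bioperl-db.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT a BioPerl function): split a cross-reference
# such as "UniGene:Mm.388865" into a dbname and an accession.
sub split_xref {
    my ($xref) = @_;
    # Split on the first colon only; tolerate whitespace after the colon,
    # which can appear when a long xrefs line has been re-wrapped.
    if ($xref =~ /^([^:\s]+):\s*(\S+)$/) {
        return ($1, $2);
    }
    # No prefix present: dbname stays undefined, which is the situation
    # the NOT NULL constraint on dbxref.dbname rejects.
    return (undef, $xref);
}

for my $xref ('UniGene:Mm.388865', 'MGI:2176375', 'GO:0005576', 'AF331517.1') {
    my ($dbname, $accession) = split_xref($xref);
    printf "%-20s dbname=%s accession=%s\n",
        $xref, defined $dbname ? $dbname : '(none)', $accession;
}
```

An xref without a prefix, such as AF331517.1, still comes back without a dbname, so a loader would presumably need a default namespace for those.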
-hilmar On May 23, 2006, at 12:02 PM, Gerben Menschaert wrote: > Hello all, > > I'm trying to load genbank accession number Q99ML8 as a genpept > file with > the load_seqdatabase.pl script. It fails on the DB_SOURCE part: > > -------------------- WARNING --------------------- > MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, > values were > ("","UniGene:Mm.388865","0","") FKs () > ORA-01400: cannot insert NULL into > ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD > ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO > dbxref > (dbname, accession, version) VALUES (:<*>p1, :p2, :p3)') > --------------------------------------------------- > > The DBSOURCE block from the genpept file looks like this: > > DBSOURCE swissprot: locus UCN2_MOUSE, accession Q99ML8; > class: standard. > created: May 10, 2002. > sequence updated: Jun 1, 2001. > annotation updated: May 16, 2006. > xrefs: AF331517.1, AAK16157.1 > xrefs (non-sequence databases): UniGene:Mm.388865, > Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO: > 0001664, > GO:0006171, GO:0007586, GO:0006950 > > How is this parsed? Could anybody point me into the good direction? 
>
> Regards,
> Gerben
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From gad14 at cornell.edu Tue May 23 19:38:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 23 May 2006 15:38:31 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql
In-Reply-To:
References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
Message-ID: <447364B7.4030203@cornell.edu>

Hi Hilmar,

I apologize in advance if I'm talking about something that is well
documented somewhere but I'm still having trouble understanding exactly
what I need to do to get a biosql database loaded in such a way so that
it can interact fully with gbrowse - it seems to be half way there.

I use load_seqdatabase.pl to load the genome sequence (single sequence
in fasta format) into a biosql database but I populate the db with
features using what I thought was a GFF-centric approach, not with
load_seqdatabase.pl -- see my code in #4 below.

Here is exactly what I do:

1) Create a mysql database called 'test_biosql' with correct permissions

2) Load the biosql schema:

mysql --user=xx --password=xx test_biosql < /usr/local/biosql-schema/sql/biosqldb-mysql.sql

3) Use load_seqdatabase.pl to load the single genomic dna sequence:

load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 -namespace NC_004578 -format fasta 6853.fasta

4) I then use a script I wrote to load the SeqFeatures, which are in gff
format in a file I pass in as an arg ($in).
Here is the code:

# read gff file into gff io object
my $gffio = Bio::Tools::GFF->new(-file => $in, -gff_version => 3);

# create a Bio::DB::DBAdaptorI implementing object
my $db = Bio::DB::BioDB->new(-database => $dbname,
                             -port     => $port,
                             -dbname   => $database,
                             -driver   => $driver,
                             -user     => $user,
                             -pass     => $pass,
                             );

# get appropriate object adaptor
my $adp = $db->get_object_adaptor("Bio::SeqI");

my $acc = "NC_004578"; # the genome seq id already in the db
my $seq = Bio::Seq->new(-accession_number => $acc,
                        -display_id       => $acc,
                        -primary_id       => $acc,
                        -namespace        => $acc);

# Locate entry matching the unique key attributes and populate a
# persistent object with this entry.
my $dbseq = $adp->find_by_unique_key($seq);

# insert features from gff file into database.
while (my $feat = $gffio->next_feature()) {
    $dbseq->add_SeqFeature($feat);
    $dbseq->store;
    $dbseq->commit();
}

Is there additional code I should have here? I realize you're not an
expert/user of gbrowse.. and this problem seems to be related to the
gbrowse_details cgi script, which you probably are not familiar with.
But I'm CC'ing the lists in case anyone else has some clues. I do
appreciate any insight you might have though. It would be good to know
if I'm doing all that I need to do to fully and correctly populate a
biosql db with GFF/SeqFeature.

Thanks,
Genevieve

Hilmar Lapp wrote:
> Hi Genevieve,
>
> there's a couple more regular users of BioSQL than one (about 25-30
> groups), but not many who run GBrowse off of BioSQL (and I don't count
> among those - yet).
>
> Of those who have posted before that they accomplished this, I believe
> none were using load_seqdatabase.pl to load the data. Instead, they
> loaded data through the DBGFF adaptor for BioSQL, i.e., like you would
> load data into a GFF database, just using a different adaptor.
> > load_seqdatabase.pl will load through the sequence-centric Bioperl > object model, and has no notion of GFF or GFF3 and associated > constraints (controlled vocabulary for feature type and source terms, > location types etc). > > It is probably possible to load data through load_seqdatabase.pl and > then render it through GBrowse but doing so will almost certainly > require a SeqProcessor (see --pipeline argument to load_seqdatabase.pl) > to be written that will appropriately unflatten the feature array and > in fact probably have to use SeqFeature::Annotated (where actually did > SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will need to > be fixed to be prepared for SeqFeatureI implementations that use > ontology terms for primary_tag and source_tag instead of strings. > > Is it possible for you to load your data through a GFF3 intermediary? > Bioperl has modules and in fact scripts that will write GFF3 (if I'm > not mistaken ...). > > -hilmar > > On May 18, 2006, at 5:58 PM, Lincoln Stein wrote: > >> Hi Genevieve, >> >> The problem is that none of us really knows anything about BioSQL. >> Hilmar is >> the only regular user of this database. He's now gone to NESCent (duke >> university) and may not be receiving mail sent to GNF. >> >> Lincoln >> >> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote: >> >>> Hi Scott, >>> >>> I'm still having the same problem. It might have to do with how the >>> BioSQL database is populated. I use the load_seqdatabase.pl script to >>> load the database along with bioperl-db functions for loading >>> SeqFeatures directly. I took a closer look at how the tables are >>> populated in the biosql tables. (If you're not familiar with BioSQL the >>> following may not be familiar to you -- i just want to put this >>> observation out there...). I noticed that the 'term_id' field in the >>> Location table was empty for the first gene record i had loaded. 
When I >>> set term_id to be '11', the id that corresponds with the 'gene' >>> ontology >>> term, i notice a positive change in what's displayed on the >>> gbrowse_details page for this record... the name of the gene 'dnaA' now >>> appears in the title line in large blue font, as it should. The class >>> name is still missing, as does all the detail about this gene - >>> coordinates, etc. >>> >>> Lincoln suggests that I talk directly Hilmar Lapp who is the main >>> BioSQL >>> developer. It could be that I am bumping up against things that haven't >>> been developed yet as far as the GBrowse<->BioSQL db connectivity goes. >>> I've been taking a closer look at gbrowse_details.pl, Browser.pm and >>> Util.pm in order to try to understand where the disconnect might be... >>> >>> To answer your question below.. yes GBrowse works fine for the >>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table database. >>> I'm >>> using this installation of GBrowse 1.64 for several MySQL databases >>> with >>> the default gbrowse tables... everything is working fine. My only >>> trouble with gbrowse crops up when interfacing with the biosql mysql >>> db. >>> >>> Thanks for all your help, >>> Genevieve >>> >>> Scott Cain wrote: >>> >>>> Hi Genevieve, >>>> >>>> I'm sorry this has hung out there unanswered for so long. I >>>> suppose it >>>> was because I chose not to answer it because it involved BioSQL >>>> (which I >>>> know just about nothing about) and Simon seemed to think that the >>>> MySQL >>>> adaptor was involved somehow (though it doesn't look to me like it >>>> is). >>>> >>>> Anyway, I'll try to get started answering your questions (assuming you >>>> haven't already puzzled you way to one already). See my comments >>>> below. >>>> >>>> Scott >>>> >>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote: >>>> >>>>> Hi, >>>>> I'm running gbrowse 1.64 with a biosql database on a mac with bioperl >>>>> 1.5.1. 
I successfully loaded the database with load_seqdatabase.pl >>>>> with >>>>> NC_004578.gbk from NCBI. >>>>> >>>>> The features display as they should on the main gbrowse details pane. >>>>> However, when I click on one of the features I get GBrowse Details >>>>> data >>>>> record page with ":Details" at the top in large blue font but no data >>>>> for that gene display. In smaller red font, "Requested feature not >>>>> found >>>>> in database" which is followed by the normal details page footer info >>>>> ("For the source code for this browser, see...", etc). >>>>> >>>>> I'm using the 06.biosql.conf file - with appropriate additions in >>>>> db_args for my database. I changed 'link' to >>>>> >>>>> link = AUTO >>>>> >>>>> from what was there >>>>> >>>>> link = >>>>> http://localhost/perl/gbrowse?ref=$ref;start=$start;stop= $end >>>>> >>>>> The default suggestion for 'link' is a little confusing.. why does it >>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why isn't >>>>> 'cgi-bin' in the path? >>>> >>>> >>>> I'm not sure why the suggested link is the way it is; perhaps that >>>> config file predates the gbrowse_details script and no one changed >>>> this >>>> sample config file. I changed it and it will be changed in the next >>>> release. >>>> >>>> As for the path, 'perl' is a common url convention for scripts that >>>> are >>>> running under mod_perl, so I suspect the person who wrote this sample >>>> config file was running mod_perl. >>>> >>>>> When i set link to 'AUTO' I at least get the details page. >>>>> gbrowse_details is not getting what it needs to disaply the record >>>>> info >>>>> though. The webserver error I get is: >>>>> >>>>> Subroutine Bio::SeqFeature::Generic::type redefined at >>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/ BioSQL.pm >>>>> line 126. >>>> >>>> >>>> I'm not sure this is really the problem. Let me make sure: it is >>>> after >>>> you changed link to AUTO that you see this? 
That is, the page you see >>>> now is as you described in your second paragraph, right? >>>> Unfortunately, >>>> this is the part where I become particularly useless, since I don't >>>> know >>>> anything about BioSQL. Is the details page working OK for the >>>> yeast_chr1 dataset? >>>> >>>> Scott >>>> >>>>> I took a look at line 126 in BioSQL.pm - not sure what to make of it. >>>>> >>>>> Any ideas? Am I overlooking anything? >>>>> >>>>> Thanks, >>>>> Genevieve >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------- >>>>> Using Tomcat but need to do more? Need to support web services, >>>>> security? >>>>> Get stuff done quickly with pre-integrated technology to make your >>>>> job >>>>> easier Download IBM WebSphere Application Server v.1.0.1 based on >>>>> Apache >>>>> Geronimo >>>>> http://sel.as-us.falkag.net/sel? >>>>> cmd=lnk&kid=120709&bid=263057&dat=121642 >>>>> _______________________________________________ >>>>> Gmod-gbrowse mailing list >>>>> Gmod-gbrowse at lists.sourceforge.net >>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> >> >> -- >> Lincoln D. Stein >> Cold Spring Harbor Laboratory >> 1 Bungtown Road >> Cold Spring Harbor, NY 11724 >> (516) 367-8380 (voice) >> (516) 367-8389 (fax) >> FOR URGENT MESSAGES & SCHEDULING, >> PLEASE CONTACT MY ASSISTANT, >> SANDRA MICHELSEN, AT michelse at cshl.edu >> >> >> ------------------------------------------------------- >> Using Tomcat but need to do more? Need to support web services, >> security? >> Get stuff done quickly with pre-integrated technology to make your >> job easier >> Download IBM WebSphere Application Server v.1.0.1 based on Apache >> Geronimo >> http://sel.as-us.falkag.net/sel? 
cmd=lnk&kid=120709&bid=263057&dat=121642 >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> > From hlapp at gmx.net Wed May 24 03:50:21 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 24 May 2006 04:50:21 +0100 Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql In-Reply-To: <447364B7.4030203@cornell.edu> References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> Message-ID: Hi Genevieve & Scott, see below for interspersed comments. On May 23, 2006, at 3:38 PM, Genevieve DeClerck wrote: > Hi Hilmar, > > I apologize in advance if I'm talking about something that is well > documented somewhere > but I'm still having trouble understanding exactly what I need to > do to get a biosql database loaded in such a way so that it can > interact fully with gbrowse - it seems to be half way there. > > I use load_seqdatabase.pl to load the genome sequence (single > sequence in fasta format) into a biosql database but I populate the > db with features using what I thought was a GFF-centric approach, > not with load_seqdatabase.pl -- see my code in #4 below. > > Here is exactly what I do: > > [...] > 3) Use load_seqdatabase.pl to load the single genomic dna sequence: > > load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 - > namespace NC_004578 -format fasta 6853.fasta > > 4) I then use a script I wrote to load the SeqFeatures which are in > gff format in a file i pass in as as arg ($in). 
Here is the code:
>
>
> # read gff file into gff io object
> my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);
>
> # create a Bio::DB::DBAdaptorI implementing object
> my $db = Bio::DB::BioDB->new(-database => $dbname,
>                              -port => $port,
>                              -dbname => $database,
>                              -driver => $driver,
>                              -user => $user,
>                              -pass => $pass,
>                              );
>
> # get appropriate object adaptor
> my $adp = $db->get_object_adaptor("Bio::SeqI");
>
> my $acc = "NC_004578"; # the genome seq id already in the db
> my $seq = Bio::Seq->new(-accession_number => $acc,
>                         -display_id => $acc,
>                         -primary_id => $acc,
>                         -namespace => $acc);
>
> # Locate entry matching the unique key attributes and populate a
> # persistent object with this entry.
> my $dbseq = $adp->find_by_unique_key($seq);
>
> # insert features from gff file into database.
> while (my $feat = $gffio->next_feature()) {
>     $dbseq->add_SeqFeature($feat);
>     $dbseq->store;
>     $dbseq->commit();
> }
>
>
> Is there additional code I should have here? I realize you're not a
> expert/user of gbrowse.. and this problem seems to be related to
> the gbrowse_details cgi script, which you probably are not familiar
> with.

So your use case is that you have a sequence in simple fasta format
with its annotation in another file in GFF3 format, and you want to
load both into a Biosql database and visualize in GBrowse.

It looks like I was in fact on the wrong path the whole time. The
Gbrowse Biosql adaptor that I can find is a Bio::DasI adaptor through
which you cannot load but only retrieve, so I have to assume that you
were right in using load_seqdatabase.pl. Can somebody help out here who
has been using Biosql as the underlying database and confirm or set me
straight?

If that is the procedure then your code looks alright. Also, it looks
like Bio::Tools::GFF does not return hierarchical feature graphs for v3
input (which bioperl-db wouldn't handle properly because it doesn't
support the feature_relationship table yet).

So, I'm in fact at a loss explaining why the details page doesn't work
for you, given that people reported it to work before. I'm inclined to
claim that the respective Gbrowse code has changed, either in the way
it expects the feature to be set up, or in the way it uses the DasI
interface, and broke the Biosql adaptor. Can somebody (Scott? Lincoln?)
comment on whether there were any changes in this regard?

The lines in gbrowse_details that look like they lead to the problem are

    my @features = sort {$b->length<=>$a->length}
        $CONFIG->_feature_get($db,$name,$class);
    @features = sort {$b->length<=>$a->length}
        $CONFIG->_feature_get($db,$ref,$class,$start,$end,1) unless @features;

neither of which returns any matches. I have no clue yet how those two
calls get translated into DasI to bioperl-db queries.

-hilmar

> But I'm CC'ing the lists in case anyone else has some clues. I do
> appreciate any insight you might have though. It would be good to
> know if I'm doing all that I need to do to fully and correctly
> populate a biosql db with GFF/SeqFeature.
>
> Thanks,
> Genevieve
>
>
>
> Hilmar Lapp wrote:
>> Hi Genevieve,
>> there's a couple more regular users of BioSQL than one (about
>> 25-30 groups), but not many who run GBrowse off of BioSQL (and I
>> don't count among those - yet).
>> Of those who have posted before that they accomplished this, I
>> believe none were using load_seqdatabase.pl to load the data.
>> Instead, they loaded data through the DBGFF adaptor for BioSQL,
>> i.e., like you would load data into a GFF database, just using a
>> different adaptor.
>> load_seqdatabase.pl will load through the sequence-centric
>> Bioperl object model, and has no notion of GFF or GFF3 and
>> associated constraints (controlled vocabulary for feature type
>> and source terms, location types etc).
>> It is probably possible to load data through load_seqdatabase.pl >> and then render it through GBrowse but doing so will almost >> certainly require a SeqProcessor (see --pipeline argument to >> load_seqdatabase.pl) to be written that will appropriately >> unflatten the feature array and in fact probably have to use >> SeqFeature::Annotated (where actually did >> SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will >> need to be fixed to be prepared for SeqFeatureI implementations >> that use ontology terms for primary_tag and source_tag instead of >> strings. >> Is it possible for you to load your data through a GFF3 >> intermediary? Bioperl has modules and in fact scripts that will >> write GFF3 (if I'm not mistaken ...). >> -hilmar >> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote: >>> Hi Genevieve, >>> >>> The problem is that none of us really knows anything about >>> BioSQL. Hilmar is >>> the only regular user of this database. He's now gone to NESCent >>> (duke >>> university) and may not be receiving mail sent to GNF. >>> >>> Lincoln >>> >>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote: >>> >>>> Hi Scott, >>>> >>>> I'm still having the same problem. It might have to do with how the >>>> BioSQL database is populated. I use the load_seqdatabase.pl >>>> script to >>>> load the database along with bioperl-db functions for loading >>>> SeqFeatures directly. I took a closer look at how the tables are >>>> populated in the biosql tables. (If you're not familiar with >>>> BioSQL the >>>> following may not be familiar to you -- i just want to put this >>>> observation out there...). I noticed that the 'term_id' field in >>>> the >>>> Location table was empty for the first gene record i had >>>> loaded. When I >>>> set term_id to be '11', the id that corresponds with the 'gene' >>>> ontology >>>> term, i notice a positive change in what's displayed on the >>>> gbrowse_details page for this record... 
the name of the gene >>>> 'dnaA' now >>>> appears in the title line in large blue font, as it should. The >>>> class >>>> name is still missing, as does all the detail about this gene - >>>> coordinates, etc. >>>> >>>> Lincoln suggests that I talk directly Hilmar Lapp who is the >>>> main BioSQL >>>> developer. It could be that I am bumping up against things that >>>> haven't >>>> been developed yet as far as the GBrowse<->BioSQL db >>>> connectivity goes. >>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm >>>> and >>>> Util.pm in order to try to understand where the disconnect >>>> might be... >>>> >>>> To answer your question below.. yes GBrowse works fine for the >>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table >>>> database. I'm >>>> using this installation of GBrowse 1.64 for several MySQL >>>> databases with >>>> the default gbrowse tables... everything is working fine. My only >>>> trouble with gbrowse crops up when interfacing with the biosql >>>> mysql db. >>>> >>>> Thanks for all your help, >>>> Genevieve >>>> >>>> Scott Cain wrote: >>>> >>>>> Hi Genevieve, >>>>> >>>>> I'm sorry this has hung out there unanswered for so long. I >>>>> suppose it >>>>> was because I chose not to answer it because it involved >>>>> BioSQL (which I >>>>> know just about nothing about) and Simon seemed to think that >>>>> the MySQL >>>>> adaptor was involved somehow (though it doesn't look to me >>>>> like it is). >>>>> >>>>> Anyway, I'll try to get started answering your questions >>>>> (assuming you >>>>> haven't already puzzled you way to one already). See my >>>>> comments below. >>>>> >>>>> Scott >>>>> >>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote: >>>>> >>>>>> Hi, >>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with >>>>>> bioperl >>>>>> 1.5.1. I successfully loaded the database with >>>>>> load_seqdatabase.pl with >>>>>> NC_004578.gbk from NCBI. 
>>>>>> >>>>>> The features display as they should on the main gbrowse >>>>>> details pane. >>>>>> However, when I click on one of the features I get GBrowse >>>>>> Details data >>>>>> record page with ":Details" at the top in large blue font but >>>>>> no data >>>>>> for that gene display. In smaller red font, "Requested >>>>>> feature not found >>>>>> in database" which is followed by the normal details page >>>>>> footer info >>>>>> ("For the source code for this browser, see...", etc). >>>>>> >>>>>> I'm using the 06.biosql.conf file - with appropriate additions in >>>>>> db_args for my database. I changed 'link' to >>>>>> >>>>>> link = AUTO >>>>>> >>>>>> from what was there >>>>>> >>>>>> link = http://localhost/perl/gbrowse?ref=$ref;start= >>>>>> $start;stop= $end >>>>>> >>>>>> The default suggestion for 'link' is a little confusing.. why >>>>>> does it >>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why >>>>>> isn't >>>>>> 'cgi-bin' in the path? >>>>> >>>>> >>>>> I'm not sure why the suggested link is the way it is; perhaps that >>>>> config file predates the gbrowse_details script and no one >>>>> changed this >>>>> sample config file. I changed it and it will be changed in the >>>>> next >>>>> release. >>>>> >>>>> As for the path, 'perl' is a common url convention for scripts >>>>> that are >>>>> running under mod_perl, so I suspect the person who wrote this >>>>> sample >>>>> config file was running mod_perl. >>>>> >>>>>> When i set link to 'AUTO' I at least get the details page. >>>>>> gbrowse_details is not getting what it needs to disaply the >>>>>> record info >>>>>> though. The webserver error I get is: >>>>>> >>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at >>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/ >>>>>> BioSQL.pm >>>>>> line 126. >>>>> >>>>> >>>>> I'm not sure this is really the problem. Let me make sure: it >>>>> is after >>>>> you changed link to AUTO that you see this? 
That is, the page >>>>> you see >>>>> now is as you described in your second paragraph, right? >>>>> Unfortunately, >>>>> this is the part where I become particularly useless, since I >>>>> don't know >>>>> anything about BioSQL. Is the details page working OK for the >>>>> yeast_chr1 dataset? >>>>> >>>>> Scott >>>>> >>>>>> I took a look at line 126 in BioSQL.pm - not sure what to >>>>>> make of it. >>>>>> >>>>>> Any ideas? Am I overlooking anything? >>>>>> >>>>>> Thanks, >>>>>> Genevieve >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------- >>>>>> Using Tomcat but need to do more? Need to support web >>>>>> services, security? >>>>>> Get stuff done quickly with pre-integrated technology to make >>>>>> your job >>>>>> easier Download IBM WebSphere Application Server v.1.0.1 >>>>>> based on Apache >>>>>> Geronimo >>>>>> http://sel.as-us.falkag.net/sel? >>>>>> cmd=lnk&kid=120709&bid=263057&dat=121642 >>>>>> _______________________________________________ >>>>>> Gmod-gbrowse mailing list >>>>>> Gmod-gbrowse at lists.sourceforge.net >>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> >>> >>> -- >>> Lincoln D. Stein >>> Cold Spring Harbor Laboratory >>> 1 Bungtown Road >>> Cold Spring Harbor, NY 11724 >>> (516) 367-8380 (voice) >>> (516) 367-8389 (fax) >>> FOR URGENT MESSAGES & SCHEDULING, >>> PLEASE CONTACT MY ASSISTANT, >>> SANDRA MICHELSEN, AT michelse at cshl.edu >>> >>> >>> ------------------------------------------------------- >>> Using Tomcat but need to do more? Need to support web services, >>> security? >>> Get stuff done quickly with pre-integrated technology to make >>> your job easier >>> Download IBM WebSphere Application Server v.1.0.1 based on >>> Apache Geronimo >>> http://sel.as-us.falkag.net/sel? 
>>> cmd=lnk&kid=120709&bid=263057&dat=121642 >>> _______________________________________________ >>> Gmod-gbrowse mailing list >>> Gmod-gbrowse at lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >>> > > > > ------------------------------------------------------- > All the advantages of Linux Managed Hosting--Without the Cost and > Risk! > Fully trained technicians. The highest number of Red Hat > certifications in > the hosting industry. Fanatical Support. Click to learn more > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=107521&bid=248729&dat=121642 > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mjcipriano at lbl.gov Tue May 30 22:26:24 2006 From: mjcipriano at lbl.gov (Michael Cipriano) Date: Tue, 30 May 2006 15:26:24 -0700 Subject: [BioSQL-l] Problem with add feature under BioSQL In-Reply-To: References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> Message-ID: <1149027984.3139.105.camel@alien> Hello, I have found a problem with adding features via gbrowse_img with the add=xxx tag. I am using the CVS version of GGB, bioperl-live and BioSQL schema on mysql. When using the add=xxx tag, it will produce a fatal error (with BioSQL/Das interface). 
The error shown in the error log is:

link:
/cgi-bin/gbrowse_img/bacteria/?name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999;

ERROR from apache error_log:
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't locate
object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment"
at /var/www/cgi-bin/gbrowse_img line 502.
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Premature end
of script headers: gbrowse_img

This is from this section of code at line ~506:

    unless ($segments{$refname}) {
        my @segments = map {
            eval{$_->absolute(0)}; $_  # so that rel2abs works properly later
        }
        grep { $current_segment->overlaps($_) } get_segments($db, $refname);
        return unless @segments;
        $segments{$refname} = $segments[0];
    }

The overlaps method is not defined in Bio::DB::Das::BioSQL::Segment or
in any of the classes it inherits from.

The fix was the inclusion of Bio::RangeI in the @ISA variable (shown
below) in the file Bio/DB/Das/BioSQL/Segment.pm

    #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN
    @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI);

I am not sure if this will have any other consequences other than
fixing the bug I mentioned (and possibly fixing something else).

Can anyone tell me if this will introduce any new bugs, and if not, can
someone commit this change?

Thanks,
Michael Cipriano
Developer - LBNL

From hlapp at gmx.net Wed May 31 18:33:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:33:40 -0400
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <1149027984.3139.105.camel@alien>
References: <4447A8D5.4@cornell.edu> <1147896787.2600.47.camel@localhost.localdomain> <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu> <447364B7.4030203@cornell.edu> <1149027984.3139.105.camel@alien>
Message-ID: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>

This should be a Gbrowse problem, not in Biosql or bioperl-db unless
I'm missing something?
Just trying to make sure ... -hilmar On May 30, 2006, at 6:26 PM, Michael Cipriano wrote: > Hello, > > I have found a problem with adding features via gbrowse_img with the > add=xxx tag. I am using the CVS version of GGB, bioperl-live and > BioSQL > schema on mysql. > > When using the add=xxx tag, it will produce a fatal error (with > BioSQL/Das interface). There error shown in the error log is: > link: > /cgi-bin/gbrowse_img/bacteria/? > name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999; > > ERROR from apache error_log: > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't > locate > object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment" > at /var/www/cgi-bin/gbrowse_img line 502. > [Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] > Premature end > of script headers: gbrowse_img > > This is from this section of code at line ~506: > > unless ($segments{$refname}) { > my @segments = map { > eval{$_->absolute(0)}; $_ # so that rel2abs works properly > later > } > grep { $current_segment->overlaps($_) } get_segments($db, > $refname); > return unless @segments; > $segments{$refname} = $segments[0]; > } > > > The overlaps function is not defined in the > Bio::DB::Das::BioSQL::Segment or any of the objects it inherits. > > The fix was the inclusion of Bio::RangeI in the @ISA variable (shown > below) in the file Bio/DB/Das/BioSQL/Segment.pm > > > #@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN > @ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI); > > > I am not sure if this will have any other consequences other then > fixing > the bug I mentioned (and possibly fixing something else). > > Can anyone tell me if this will introduce any new bugs, and if not, > can > someone commit this change. 
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From mjcipriano at lbl.gov  Wed May 31 18:50:53 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Wed, 31 May 2006 11:50:53 -0700
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu>
    <1147896787.2600.47.camel@localhost.localdomain>
    <446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
    <447364B7.4030203@cornell.edu> <1149027984.3139.105.camel@alien>
    <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <1149101453.3139.112.camel@alien>

Hi,

Yes, I only see the problem with gbrowse, though it could come up any
time someone wants to connect with Das using Bio::DB::Das::BioSQL and
needs the overlaps function (or other range functions) in the Segment
object.

-Michael

On Wed, 2006-05-31 at 14:33 -0400, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
>     -hilmar

From lstein at cshl.edu  Wed May 31 18:47:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:47:47 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
    <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311447.49454.lstein@cshl.edu>

I think this is a problem in the Bio::DB::Das::BioSQL::Segment module.
I will add an overlaps() method.

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
>     -hilmar

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

(516) 367-8380 (voice)
(516) 367-8389 (fax)

FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu  Wed May 31 18:56:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:56:30 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
    <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311456.32088.lstein@cshl.edu>

I've just committed the fix into CVS. This will be available in the
upcoming gbrowse release as well.

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
>     -hilmar

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

(516) 367-8380 (voice)
(516) 367-8389 (fax)

FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From d49228002 at ym.edu.tw  Wed May 31 14:39:16 2006
From: d49228002 at ym.edu.tw (Yi-Feng Chang)
Date: Wed, 31 May 2006 22:39:16 +0800
Subject: [BioSQL-l] Error loading ontology terms
Message-ID: <000001c68c97$f461ac70$6801a8c0@iannb>

Dear All,

I've checked the BioSQL archives and found a similar thread
(http://lists.open-bio.org/pipermail/biojava-l/2005-November/005151.html);
however, it did not give a specific solution. So I am posting here again
in the hope that someone can help me.

I'm using JDK 1.5.0_05, BioJava 1.4, BioSQL 1.41, and MySQL 5.0 with
MySQL Connector/J 3.1.

I was following the demo source provided by BioJava in Anger, except for
the database connection. The exceptions are listed below.

On the first connection there was a connection error:

*** Importing a core ontology -- hope this is okay
*** Importing terms
Exception in thread "main" org.biojava.bio.BioException: Error connecting to BioSQL database: Connection is closed.
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:276)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
    at genevote.BioSQLTest.loadSeq(BioSQLTest.java:31)
    at genevote.BioSQLTest.main(BioSQLTest.java:70)
Caused by: java.sql.SQLException: Connection is closed.
    at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:219)
    at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.createStatement(PoolingDataSource.java:248)
    at org.biojava.bio.seq.db.biosql.MySQLDBHelper.getInsertID(MySQLDBHelper.java:68)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:268)
    ... 3 more

Then I tried again and it worked, and I loaded all the sequences in
GenBank format into the BioSQL database without error. But when I tried
to extract sequences, an exception occurred again:

org.biojava.bio.BioException: Error loading ontology terms
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:444)
    at org.biojava.bio.seq.db.biosql.OntologySQL.getOntology(OntologySQL.java:116)
    at org.biojava.bio.seq.db.biosql.OntologySQL.<init>(OntologySQL.java:413)
    at org.biojava.bio.seq.db.biosql.OntologySQL.getOntologySQL(OntologySQL.java:72)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:240)
    at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
    at genevote.test.loadSeq(test.java:25)
    at genevote.test.main(test.java:76)
Caused by: java.sql.SQLException: Unknown column 'name' in 'field list'
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2851)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1534)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1625)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2297)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2226)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1812)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1657)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadTerms(OntologySQL.java:339)
    at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:441)
    ... 7 more