From s.rayner at att.net  Mon May 15 02:13:06 2006
From: s.rayner at att.net (s.rayner at att.net)
Date: Mon, 15 May 2006 06:13:06 +0000
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
Message-ID: <051520060613.9311.44681BF0000A2A7A0000245F21604666489D0A02970E9DD29C@att.net>

 Hello,
 
 I have been trying to upload the current release of uniprot (version 49.6) into 
 MySQL using the most current version of load_seqdatabase.pl from CVS  
 
 (# $Id: load_seqdatabase.pl,v 1.24 2006/01/19 21:34:29 lapp Exp $)
 
 I have tested the script on subsets of uniprot and they load without problem, but 
 when I attempt to load the full dataset, I end up with the following error:
 
 
 biowiv:/usr/lib/perl5/bioperl-db/scripts/biosql # perl load_seqdatabase.pl 
 --dbname uniprot --dbuser XXXX --dbpass XXXX --format swiss 
 /var/downloads/sequence/uniprot_sprot.dat
 Loading /var/downloads/sequence/uniprot_sprot.dat ...
 
 -------------------- WARNING ---------------------
 MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were 
 ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ 
 databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
 Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
 ---------------------------------------------------
 Could not store Q5RFJ2:
 ------------- EXCEPTION  -------------
 MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found 
 by unique key
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:219
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK Bio::DB::BioSQL::SeqAdaptor::store_children 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/SeqAdaptor.pm:226
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK (eval) load_seqdatabase.pl:620
 STACK toplevel load_seqdatabase.pl:602
 
 -------
 
 
 To create the biosql schema I used biosqldb-mysql.sql
 
 version info:
 -- $Id: biosqldb-mysql.sql,v 1.41 2005/04/18 05:21:38 lapp Exp $
 
 --------
 
 
 If I look at the data that has been created in the database, the first 
 1000 or so entries load successfully.  The offending record begins:
 
    ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
    AC   Q5RFJ2; Q5RDK2;
 
 The script doesn't like Accession Number "Q5RFJ2", but the only other place it 
 shows up in the file is further down the same record as a "DR" entry.
 
    DR   SMR; Q5RFJ2; 1-230.
 
 I'm new to this and still trying to figure out the structure of all the tables.  
 Does anyone have any idea what is happening? 
 
 thanks for the help!
 
 
From s.rayner at att.net  Mon May 15 08:34:15 2006
From: s.rayner at att.net (s.rayner at att.net)
Date: Mon, 15 May 2006 12:34:15 +0000
Subject: [BioSQL-l]  error loading uniprot release 49.6 into mysql
Message-ID: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>

I found where the script is hiccupping.

The Uniprot release contains identical RL (reference location) lines in the entries for two different sequences.

___________________

First occurrence...
___________________

ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
AC   Q5RFJ2; Q5RDK2;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein theta.
GN   Name=YWHAQ;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain cortex, and Kidney;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.  <======  Not Unique


___________________

Second occurrence...
___________________


ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
AC   Q5RC20;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein gamma.
GN   Name=YWHAG;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Heart;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <======  Not Unique


In these two cases the generated CRC key is identical, so MySQL throws a wobbly.

If I look at the row in the MySQL REFERENCE table for the first sequence:
+--------------+-----------+----------------------------------------------------------+-------+---------+----------------------+
|          139 |      NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
+--------------+-----------+----------------------------------------------------------+-------+---------+----------------------+

and the error when the script choked was:

 MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were 
 ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ 
 databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
 Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3

hence the problem.
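
To make the collision concrete, here is a minimal Python sketch. It is not the bioperl-db code. It assumes the key is a checksum over the authors, title and location of a reference, which fits the insert values in the warning but is an assumption; `zlib.crc32` is only a stand-in for whatever checksum bioperl-db really computes, and `reference_key` and `insert` are hypothetical helpers.

```python
import zlib


def reference_key(authors, title, location):
    """Hypothetical stand-in for the CRC key: a checksum over three fields."""
    # Assumption: empty fields add nothing that tells two references apart.
    text = "<{}><{}><{}>".format(authors or "", title or "", location or "")
    return "CRC-{:08X}".format(zlib.crc32(text.encode("utf-8")))


RL = "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

# Both entries were stored with empty author and title fields.
key_q5rc20 = reference_key("", "", RL)  # 1433G_PONPY
key_q5rfj2 = reference_key("", "", RL)  # 1433T_PONPY

stored = {}


def insert(key, owner):
    """Mimic a unique index on the key column."""
    if key in stored:
        raise ValueError("Duplicate entry '{}'".format(key))
    stored[key] = owner


insert(key_q5rc20, "Q5RC20")
try:
    # The log reports the failure for Q5RFJ2, so it is inserted second here.
    insert(key_q5rfj2, "Q5RFJ2")
    collided = False
except ValueError:
    collided = True
```

Under that assumption, any two references that differ only in their RG line end up with the same key, and the unique index rejects whichever one is inserted second.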

I'm guessing I'm not the first person to encounter this, but I don't see any hints of an easy way around it.

Any suggestions?

ta


From hlapp at gmx.net  Mon May 15 12:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.
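
As a rough illustration of that behaviour (a Python sketch, not the code in load_seqdatabase.pl), a loader in "safe" mode reports an entry that fails to store and carries on, whereas otherwise the first failure stops the run. `load_entries`, `store` and the `NEXT_ENTRY` placeholder are hypothetical names used only for this example.

```python
def load_entries(entries, store, safe=False):
    """Store each entry; in safe mode, report failures and keep going."""
    loaded, failed = [], []
    for entry_id, record in entries:
        try:
            store(record)
            loaded.append(entry_id)
        except Exception as err:
            if not safe:
                raise  # without safe mode the first failure ends the load
            print("Could not store {}: {}".format(entry_id, err))
            failed.append(entry_id)
    return loaded, failed


seen = set()


def store(record):
    """Toy store that rejects a record whose key is already present."""
    if record in seen:
        raise ValueError("duplicate key")
    seen.add(record)


# Two entries share the same reference key; the third is unrelated.
entries = [("Q5RC20", "ref-A"), ("Q5RFJ2", "ref-A"), ("NEXT_ENTRY", "ref-B")]
loaded, failed = load_entries(entries, store, safe=True)
```

The trade-off is the one stated above: the failing entry is simply absent from the database afterwards.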

Fixing the issue would require either fixing the bioperl swissprot
parser (or Bio::Annotation::Reference) to stick the RG group into the
author slot if there is no author, or fixing Bioperl
Bio::Annotation::Reference to also feature a group and biosql to use
it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?
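
A small Python sketch of that trade-off, under the assumption that the stored row has an author slot but no separate group slot. The `Reference` class and both helper functions are hypothetical and are not the Bioperl classes.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    authors: str = ""   # RA line
    group: str = ""     # RG line
    location: str = ""  # RL line


def to_stored(ref):
    """Use the group in place of a missing author before storing."""
    return {"authors": ref.authors or ref.group, "location": ref.location}


def to_swiss_lines(row):
    """Write a stored row back out; the row no longer knows it was a group."""
    lines = []
    if row["authors"]:
        lines.append("RA   {};".format(row["authors"]))
    lines.append("RL   {}".format(row["location"]))
    return lines


ref = Reference(
    group="The German cDNA consortium",
    location="Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.",
)
round_tripped = to_swiss_lines(to_stored(ref))
```

Here the consortium comes back out on an RA line, which is exactly the round-tripping change in question.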

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> [...]
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From simon.rayner.cn at gmail.com  Mon May 15 20:46:00 2006
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 16 May 2006 00:46:00 +0000
Subject: [BioSQL-l]  error loading uniprot release 49.6 into mysql
In-Reply-To: 051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net
Message-ID: <1147740360.3338.1.camel@biowiv.wivbio>

thanks for the help.

One way I was thinking of getting around it, before I got your email
about the --safe flag, was to append an extra character to the offending
string. So, in my case I would have

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

and

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"

which would presumably create different keys.

However, I realise the downside of this is that I have now 
modified the data source...


From hlapp at gmx.net  Thu May 18 10:38:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 18 May 2006 10:38:33 -0400
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <1147740360.3338.1.camel@biowiv.wivbio>
References: <1147740360.3338.1.camel@biowiv.wivbio>
Message-ID: <E29201C1-09F7-48D1-83FA-6EA3E304ACDC@gmx.net>

Yes, you have.

Eventually I think this needs to be fixed by how the RG field is  
dealt with in either bioperl or bioperl-db.

	-hilmar

On May 15, 2006, at 8:46 PM, simon rayner wrote:

> thanks for the help.
>
> one way i was thinking of getting around it before i got your email
> about the --safe flag was to append an extra character to the  
> offending
> string. So, in my case i would have
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."
>
> and
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"
>
> which would presumably create different keys.
>
> However, i realise the downside of this is that i have now
> modified the data source...
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From darin.london at duke.edu  Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [BioSQL-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brazil. In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006. We look forward to a great conference this
year. Please consult the official BOSC 2006 website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.
 

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB
calendar set up with all BOSC-related deadlines:

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From Gerben.Menschaert at UGent.be  Tue May 23 12:02:12 2006
From: Gerben.Menschaert at UGent.be (Gerben Menschaert)
Date: Tue, 23 May 2006 18:02:12 +0200
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
Message-ID: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>

Hello all,

I'm trying to load genbank accession number Q99ML8 as a genpept file with
the load_seqdatabase.pl script. It fails on the DBSOURCE part:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
("","UniGene:Mm.388865","0","") FKs ()
ORA-01400: cannot insert NULL into ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO dbxref
(dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
---------------------------------------------------

The DBSOURCE block from the genpept file looks like this:

DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
            class: standard.
            created: May 10, 2002.
            sequence updated: Jun 1, 2001.
            annotation updated: May 16, 2006.
            xrefs: AF331517.1, AAK16157.1
            xrefs (non-sequence databases): UniGene:Mm.388865,
            Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:0001664,
            GO:0006171, GO:0007586, GO:0006950

How is this parsed? Could anybody point me in the right direction?

Regards,
Gerben


From hlapp at gmx.net  Tue May 23 13:03:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 23 May 2006 13:03:30 -0400
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
In-Reply-To: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
References: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
Message-ID: <A87D0578-B589-4114-A28D-7A8CD1C47009@gmx.net>

This is in reality a Uniprot entry. The Genbank parser apparently  
doesn't succeed in picking apart accession and namespace prefix  
(dbname).
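
To show what "picking apart" means here, a short Python sketch that splits each token of the non-sequence xrefs list into a namespace prefix and an accession. This is not what the Bioperl GenBank parser does, and `split_xref` is a hypothetical helper.

```python
def split_xref(token):
    """Split 'UniGene:Mm.388865' into ('UniGene', 'Mm.388865')."""
    token = token.strip().rstrip(",")
    dbname, sep, accession = token.partition(":")
    if not sep:
        # No prefix present; the caller has to decide on a namespace.
        return None, token
    return dbname, accession


xref_text = ("UniGene:Mm.388865, Ensembl:ENSMUSG00000049699, MGI:2176375, "
             "GO:0005576, GO:0001664")
pairs = [split_xref(t) for t in xref_text.split(",") if t.strip()]
```

In the warning, the whole token `UniGene:Mm.388865` arrives as the accession and the dbname is empty, which is what the NOT NULL constraint on DBNAME rejects.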

If at all possible I'd recommend loading Uniprot in Uniprot  
(swissprot) format. Would that work for you?

	-hilmar

On May 23, 2006, at 12:02 PM, Gerben Menschaert wrote:

> [...]

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gad14 at cornell.edu  Tue May 23 15:38:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 23 May 2006 15:38:31 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with
	biosql
In-Reply-To: <D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
Message-ID: <447364B7.4030203@cornell.edu>

Hi Hilmar,

I apologize in advance if I'm talking about something that is well 
documented somewhere but I'm still having trouble understanding exactly 
what I need to do to get a biosql database loaded in such a way so that 
it can interact fully with gbrowse - it seems to be half way there.

I use load_seqdatabase.pl to load the genome sequence (single sequence 
in fasta format) into a biosql database but I populate the db with 
features using what I thought was a GFF-centric approach, not with 
load_seqdatabase.pl -- see my code in #4 below.

Here is exactly what I do:

1) Create a mysql database called 'test_biosql' with correct permissions
2) Load the biosql schema:

	mysql --user=xx --password=xx test_biosql < 
/usr/local/biosql-schema/sql/biosqldb-mysql.sql

3) Use load_seqdatabase.pl to load the single genomic dna sequence:

	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 
-namespace NC_004578 -format fasta 6853.fasta

4) I then use a script I wrote to load the SeqFeatures, which are in GFF 
format in a file I pass in as an arg ($in). Here is the code:


use Bio::Tools::GFF;
use Bio::DB::BioDB;
use Bio::Seq;

# $in, $dbname, $port, $database, $driver, $user and $pass are assumed
# to be set earlier in the script.

# read gff file into gff io object
my $gffio = Bio::Tools::GFF->new(-file => $in, -gff_version => 3);

# create a Bio::DB::DBAdaptorI implementing object
my $db = Bio::DB::BioDB->new(-database => $dbname,
                             -port     => $port,
                             -dbname   => $database,
                             -driver   => $driver,
                             -user     => $user,
                             -pass     => $pass,
                             );

# get appropriate object adaptor
my $adp = $db->get_object_adaptor("Bio::SeqI");

my $acc = "NC_004578"; # the genome seq id already in the db
my $seq = Bio::Seq->new(-accession_number => $acc,
                        -display_id       => $acc,
                        -primary_id       => $acc,
                        -namespace        => $acc);

# Locate entry matching the unique key attributes and populate a
# persistent object with this entry; stop if it is not in the db.
my $dbseq = $adp->find_by_unique_key($seq)
    or die "sequence $acc not found in the database\n";

# insert features from gff file into database.
while (my $feat = $gffio->next_feature()) {
   $dbseq->add_SeqFeature($feat);
   $dbseq->store;
   $dbseq->commit();
}


Is there additional code I should have here? I realize you're not an 
expert/user of gbrowse, and this problem seems to be related to the 
gbrowse_details cgi script, which you probably are not familiar with. 
But I'm CC'ing the lists in case anyone else has some clues. I do 
appreciate any insight you might have though. It would be good to know 
if I'm doing all that I need to do to fully and correctly populate a 
biosql db with GFF/SeqFeature.

Thanks,
Genevieve


Hilmar Lapp wrote:
> Hi Genevieve,
> 
> there's a couple more regular users of BioSQL than one (about 25-30
> groups), but not many who run GBrowse off of BioSQL (and I don't count
> among those - yet).
> 
> Of those who have posted before that they accomplished this, I believe
> none were using load_seqdatabase.pl to load the data. Instead, they
> loaded data through the DBGFF adaptor for BioSQL, i.e., like you would
> load data into a GFF database, just using a different adaptor.
> 
> load_seqdatabase.pl will load through the sequence-centric Bioperl
> object model, and has no notion of GFF or GFF3 and associated
> constraints (controlled vocabulary for feature type and source terms,
> location types etc).
> 
> It is probably possible to load data through load_seqdatabase.pl and
> then render it through GBrowse but doing so will almost certainly
> require a SeqProcessor (see --pipeline argument to load_seqdatabase.pl)
> to be written that will appropriately unflatten the feature array and
> in fact probably have to use SeqFeature::Annotated (where actually did
> SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will need to
> be fixed to be prepared for SeqFeatureI implementations that use
> ontology terms for primary_tag and source_tag instead of strings.
> 
> Is it possible for you to load your data through a GFF3 intermediary?
> Bioperl has modules and in fact scripts that will write GFF3 (if I'm
> not mistaken ...).
> 
>     -hilmar
> 
> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
> 
>> Hi Genevieve,
>>
>> The problem is that none of us really knows anything about BioSQL.
>> Hilmar is the only regular user of this database. He's now gone to
>> NESCent (Duke University) and may not be receiving mail sent to GNF.
>>
>> Lincoln
>>
>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>
>>> Hi Scott,
>>>
>>> I'm still having the same problem. It might have to do with how the
>>> BioSQL database is populated. I use the load_seqdatabase.pl script to
>>> load the database along with bioperl-db functions for loading
>>> SeqFeatures directly. I took a closer look at how the tables are
>>> populated in the biosql tables. (If you're not familiar with BioSQL the
>>> following may not be familiar to you -- I just want to put this
>>> observation out there...). I noticed that the 'term_id' field in the
>>> Location table was empty for the first gene record I had loaded. When I
>>> set term_id to be '11', the id that corresponds with the 'gene' ontology
>>> term, I notice a positive change in what's displayed on the
>>> gbrowse_details page for this record... the name of the gene 'dnaA' now
>>> appears in the title line in large blue font, as it should. The class
>>> name is still missing, as is all the detail about this gene -
>>> coordinates, etc.
>>>
>>> Lincoln suggests that I talk directly to Hilmar Lapp, who is the main
>>> BioSQL developer. It could be that I am bumping up against things that
>>> haven't been developed yet as far as the GBrowse<->BioSQL db
>>> connectivity goes. I've been taking a closer look at gbrowse_details.pl,
>>> Browser.pm and Util.pm in order to try to understand where the
>>> disconnect might be...
>>>
>>> To answer your question below.. yes GBrowse works fine for the
>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table database.
>>> I'm using this installation of GBrowse 1.64 for several MySQL databases
>>> with the default gbrowse tables... everything is working fine. My only
>>> trouble with gbrowse crops up when interfacing with the biosql mysql db.
>>>
>>> Thanks for all your help,
>>> Genevieve
>>>
>>> Scott Cain wrote:
>>>
>>>> Hi Genevieve,
>>>>
>>>> I'm sorry this has hung out there unanswered for so long.  I suppose it
>>>> was because I chose not to answer it because it involved BioSQL (which I
>>>> know just about nothing about) and Simon seemed to think that the MySQL
>>>> adaptor was involved somehow (though it doesn't look to me like it is).
>>>>
>>>> Anyway, I'll try to get started answering your questions (assuming you
>>>> haven't already puzzled your way to one already).  See my comments below.
>>>>
>>>> Scott
>>>>
>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>
>>>>> Hi,
>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with bioperl
>>>>> 1.5.1. I successfully loaded the database with load_seqdatabase.pl
>>>>> with NC_004578.gbk from NCBI.
>>>>>
>>>>> The features display as they should on the main gbrowse details pane.
>>>>> However, when I click on one of the features I get GBrowse Details
>>>>> data record page with ":Details" at the top in large blue font but no
>>>>> data for that gene display. In smaller red font, "Requested feature
>>>>> not found in database" which is followed by the normal details page
>>>>> footer info ("For the source code for this browser, see...", etc).
>>>>>
>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>> db_args for my database. I changed 'link' to
>>>>>
>>>>>     link = AUTO
>>>>>
>>>>> from what was there
>>>>>
>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>
>>>>> The default suggestion for 'link' is a little confusing.. why does it
>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why isn't
>>>>> 'cgi-bin' in the path?
>>>>
>>>>
>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>> config file predates the gbrowse_details script and no one changed
>>>> this sample config file.  I changed it and it will be changed in the
>>>> next release.
>>>>
>>>> As for the path, 'perl' is a common url convention for scripts that
>>>> are running under mod_perl, so I suspect the person who wrote this
>>>> sample config file was running mod_perl.
>>>>
>>>>> When I set link to 'AUTO' I at least get the details page.
>>>>> gbrowse_details is not getting what it needs to display the record
>>>>> info though. The webserver error I get is:
>>>>>
>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>> line 126.
>>>>
>>>>
>>>> I'm not sure this is really the problem.  Let me make sure: it is after
>>>> you changed link to AUTO that you see this?  That is, the page you see
>>>> now is as you described in your second paragraph, right?  Unfortunately,
>>>> this is the part where I become particularly useless, since I don't
>>>> know anything about BioSQL.  Is the details page working OK for the
>>>> yeast_chr1 dataset?
>>>>
>>>> Scott
>>>>
>>>>> I took a look at line 126 in BioSQL.pm - not sure what to make of it.
>>>>>
>>>>> Any ideas? Am I overlooking anything?
>>>>>
>>>>> Thanks,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>>
>>
> 


From hlapp at gmx.net  Tue May 23 23:50:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 24 May 2006 04:50:21 +0100
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with
	biosql
In-Reply-To: <447364B7.4030203@cornell.edu>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
Message-ID: <CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>

Hi Genevieve & Scott, see below for interspersed comments.

On May 23, 2006, at 3:38 PM, Genevieve DeClerck wrote:

> Hi Hilmar,
>
> I apologize in advance if I'm talking about something that is well  
> documented somewhere
> but I'm still having trouble understanding exactly what I need to  
> do to get a biosql database loaded in such a way so that it can  
> interact fully with gbrowse - it seems to be half way there.
>
> I use load_seqdatabase.pl to load the genome sequence (single  
> sequence in fasta format) into a biosql database but I populate the  
> db with features using what I thought was a GFF-centric approach,  
> not with load_seqdatabase.pl -- see my code in #4 below.
>
> Here is exactly what I do:
>
> [...]
> 3) Use load_seqdatabase.pl to load the single genomic dna sequence:
>
> 	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2
> -namespace NC_004578 -format fasta 6853.fasta
>
> 4) I then use a script I wrote to load the SeqFeatures, which are in
> GFF format in a file I pass in as an arg ($in). Here is the code:
>
>
> # read gff file into gff io object
> my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);
>
> # create a Bio::DB::DBAdaptorI implementing object
> my $db = Bio::DB::BioDB->new(-database   => $dbname,
>                              -port       => $port,
>                              -dbname     => $database,
>                              -driver     => $driver,
>                              -user       => $user,
>                              -pass       => $pass,
>                              );
>
> # get appropriate object adaptor
> my $adp = $db->get_object_adaptor("Bio::SeqI");
>
> my $acc = "NC_004578"; # the genome seq id already in the db
> my $seq = Bio::Seq->new(-accession_number => $acc,
> 			-display_id => $acc,
> 			-primary_id => $acc,
>                         -namespace => $acc);
>
> # Locate entry matching the unique key attributes and populate a
> # persistent object with this entry.
> my $dbseq = $adp->find_by_unique_key($seq);
>
> # insert features from gff file into database.
> while (my $feat = $gffio->next_feature()) {
>   $dbseq->add_SeqFeature($feat);
>   $dbseq->store;
>   $dbseq->commit();
> }
>
>
> Is there additional code I should have here? I realize you're not an  
> expert/user of GBrowse, and this problem seems to be related to  
> the gbrowse_details CGI script, which you are probably not familiar  
> with.

So your use case is that you have a sequence in simple fasta format  
with its annotation in another file in GFF3 format, and you want to  
load both into a Biosql database and visualize in GBrowse.

It looks like I was in fact on the wrong path the whole time. The  
Gbrowse Biosql adaptor that I can find is a Bio::DasI adaptor through  
which you cannot load but only retrieve, so I have to assume that you  
were right in using load_seqdatabase.pl. Can somebody help out here  
who has been using Biosql as the underlying database and confirm or  
set me straight?

If that is the procedure then your code looks alright. Also, it looks  
like Bio::Tools::GFF does not return hierarchical feature graphs for  
v3 input (which bioperl-db wouldn't handle properly because it  
doesn't support the feature_relationship table yet).

So, I'm in fact at a loss explaining why the details page doesn't  
work for you, given that people reported it to work before. I'm  
inclined to claim that the respective Gbrowse code has changed,  
either in the way it expects the feature to be set up, or in the way  
it uses the DasI interface, and broke the Biosql adaptor. Can  
somebody (Scott? Lincoln?) comment on whether there were any changes  
in this regard?

The lines in gbrowse_details that appear to lead to the problem are

my @features = sort {$b->length<=>$a->length}
    $CONFIG->_feature_get($db,$name,$class);
@features    = sort {$b->length<=>$a->length}
    $CONFIG->_feature_get($db,$ref,$class,$start,$end,1)
    unless @features;

neither of which returns any matches.
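
The control flow of those two calls can be sketched in plain Perl. This
is only an illustration: by_name(), by_range() and the @db records are
hypothetical stand-ins, and the sketch says nothing about how
_feature_get itself queries the Das/BioSQL adaptor. It only shows the
"look up by name, fall back to coordinates, longest feature first"
pattern.

```perl
use strict;
use warnings;

# Hypothetical in-memory "database": each feature has a name, a
# reference sequence, and start/end coordinates.
my @db = (
    { name => 'dnaA', ref => 'NC_004578', start => 1,    end => 1536 },
    { name => 'dnaN', ref => 'NC_004578', start => 1700, end => 2800 },
);

sub feature_length { $_[0]{end} - $_[0]{start} + 1 }

# First attempt: look the feature up by name (hypothetical helper).
sub by_name {
    my ($name) = @_;
    return grep { defined $name && $_->{name} eq $name } @db;
}

# Fallback: look it up by reference sequence and coordinates
# (hypothetical helper).
sub by_range {
    my ($ref, $start, $end) = @_;
    return grep {
        $_->{ref} eq $ref && $_->{start} <= $end && $_->{end} >= $start
    } @db;
}

# Same shape as the two statements quoted above: the second lookup
# only runs when the first one returned nothing.
sub find_features {
    my ($name, $ref, $start, $end) = @_;
    my @features = sort { feature_length($b) <=> feature_length($a) }
        by_name($name);
    @features = sort { feature_length($b) <=> feature_length($a) }
        by_range($ref, $start, $end)
        unless @features;
    return @features;
}

my @found = find_features('no_such_name', 'NC_004578', 1, 2000);
print join(' ', map { $_->{name} } @found), "\n";
```

If both lookups come back empty, the caller has nothing to display,
which would match the "Requested feature not found in database"
message reported earlier in this thread.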

I have no clue yet how those two calls get translated into DasI to  
bioperl-db queries.

	-hilmar

> But I'm CC'ing the lists in case anyone else has some clues. I do  
> appreciate any insight you might have though. It would be good to  
> know if I'm doing all that I need to do to fully and correctly  
> populate a biosql db with GFF/SeqFeature.
>
> Thanks,
> Genevieve
>
>
>
> Hilmar Lapp wrote:
>> Hi Genevieve,
>> there's a couple more regular users of BioSQL than one (about  
>> 25-30  groups), but not many who run GBrowse off of BioSQL (and I  
>> don't  count among those - yet).
>> Of those who have posted before that they accomplished this, I   
>> believe none were using load_seqdatabase.pl to load the data.   
>> Instead, they loaded data through the DBGFF adaptor for BioSQL,  
>> i.e.,  like you would load data into a GFF database, just using a  
>> different  adaptor.
>> load_seqdatabase.pl will load through the sequence-centric  
>> Bioperl  object model, and has no notion of GFF or GFF3 and  
>> associated  constraints (controlled vocabulary for feature type  
>> and source terms,  location types etc).
>> It is probably possible to load data through load_seqdatabase.pl  
>> and  then render it through GBrowse but doing so will almost  
>> certainly  require a SeqProcessor (see --pipeline argument to   
>> load_seqdatabase.pl) to be written that will appropriately  
>> unflatten  the feature array and in fact probably have to use   
>> SeqFeature::Annotated (where actually did  
>> SeqFeature::TypedSeqFeature  go?). In parallel, bioperl-db will  
>> need to be fixed to be prepared  for SeqFeatureI implementations  
>> that use ontology terms for  primary_tag and source_tag instead of  
>> strings.
>> Is it possible for you to load your data through a GFF3  
>> intermediary?  Bioperl has modules and in fact scripts that will  
>> write GFF3 (if I'm  not mistaken ...).
>>     -hilmar
>> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
>>> Hi Genevieve,
>>>
>>> The problem is that none of us really knows anything about  
>>> BioSQL.  Hilmar is
>>> the only regular user of this database. He's now gone to NESCent  
>>> (duke
>>> university) and may not be receiving mail sent to GNF.
>>>
>>> Lincoln
>>>
>>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> I'm still having the same problem. It might have to do with how the
>>>> BioSQL database is populated. I use the load_seqdatabase.pl  
>>>> script to
>>>> load the database along with bioperl-db functions for loading
>>>> SeqFeatures directly. I took a closer look at how the tables are
>>>> populated in the biosql tables. (If you're not familiar with   
>>>> BioSQL the
>>>> following may not be familiar to you -- i just want to put this
>>>> observation out there...). I noticed that the 'term_id' field in  
>>>> the
>>>> Location table was empty for the first gene record i had  
>>>> loaded.  When I
>>>> set term_id to be '11', the id that corresponds with the 'gene'   
>>>> ontology
>>>> term, i notice a positive change in what's displayed on the
>>>> gbrowse_details page for this record... the name of the gene   
>>>> 'dnaA' now
>>>> appears in the title line in large blue font, as it should. The class
>>>> name is still missing, as is all the detail about this gene
>>>> (coordinates, etc.).
>>>>
>>>> Lincoln suggests that I talk directly to Hilmar Lapp, who is the main
>>>> BioSQL
>>>> developer. It could be that I am bumping up against things that   
>>>> haven't
>>>> been developed yet as far as the GBrowse<->BioSQL db  
>>>> connectivity  goes.
>>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm  
>>>> and
>>>> Util.pm in order to try to understand where the disconnect  
>>>> might  be...
>>>>
>>>> To answer your question below.. yes GBrowse works fine for the
>>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table   
>>>> database. I'm
>>>> using this installation of GBrowse 1.64 for several MySQL   
>>>> databases with
>>>> the default gbrowse tables... everything is working fine. My only
>>>> trouble with gbrowse crops up when interfacing with the biosql   
>>>> mysql db.
>>>>
>>>> Thanks for all your help,
>>>> Genevieve
>>>>
>>>> Scott Cain wrote:
>>>>
>>>>> Hi Genevieve,
>>>>>
>>>>> I'm sorry this has hung out there unanswered for so long.  I   
>>>>> suppose it
>>>>> was because I chose not to answer it because it involved  
>>>>> BioSQL  (which I
>>>>> know just about nothing about) and Simon seemed to think that  
>>>>> the  MySQL
>>>>> adaptor was involved somehow (though it doesn't look to me  
>>>>> like  it is).
>>>>>
>>>>> Anyway, I'll try to get started answering your questions (assuming you
>>>>> haven't already puzzled your way to an answer).  See my comments below.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with   
>>>>>> bioperl
>>>>>> 1.5.1. I successfully loaded the database with   
>>>>>> load_seqdatabase.pl with
>>>>>> NC_004578.gbk from NCBI.
>>>>>>
>>>>>> The features display as they should on the main gbrowse  
>>>>>> details  pane.
>>>>>> However, when I click on one of the features I get GBrowse   
>>>>>> Details data
>>>>>> record page with ":Details" at the top in large blue font but  
>>>>>> no  data
>>>>>> for that gene display. In smaller red font, "Requested  
>>>>>> feature  not found
>>>>>> in database" which is followed by the normal details page  
>>>>>> footer  info
>>>>>> ("For the source code for this browser, see...", etc).
>>>>>>
>>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>>> db_args for my database. I changed 'link' to
>>>>>>
>>>>>>     link = AUTO
>>>>>>
>>>>>> from what was there
>>>>>>
>>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>>
>>>>>> The default suggestion for 'link' is a little confusing.. why   
>>>>>> does it
>>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why  
>>>>>> isn't
>>>>>> 'cgi-bin' in the path?
>>>>>
>>>>>
>>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>>> config file predates the gbrowse_details script and no one   
>>>>> changed this
>>>>> sample config file.  I changed it and it will be changed in the  
>>>>> next
>>>>> release.
>>>>>
>>>>> As for the path, 'perl' is a common url convention for scripts   
>>>>> that are
>>>>> running under mod_perl, so I suspect the person who wrote this   
>>>>> sample
>>>>> config file was running mod_perl.
>>>>>
>>>>>> When i set link to 'AUTO' I at least get the details page.
>>>>>> gbrowse_details is not getting what it needs to display the record info
>>>>>> though. The webserver error I get is:
>>>>>>
>>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>>> line 126.
>>>>>
>>>>>
>>>>> I'm not sure this is really the problem.  Let me make sure: it  
>>>>> is  after
>>>>> you changed link to AUTO that you see this?  That is, the page   
>>>>> you see
>>>>> now is as you described in your second paragraph, right?    
>>>>> Unfortunately,
>>>>> this is the part where I become particularly useless, since I   
>>>>> don't know
>>>>> anything about BioSQL.  Is the details page working OK for the
>>>>> yeast_chr1 dataset?
>>>>>
>>>>> Scott
>>>>>
>>>>>> I took a look at line 126 in BioSQL.pm - not sure what to  
>>>>>> make  of it.
>>>>>>
>>>>>> Any ideas? Am I overlooking anything?
>>>>>>
>>>>>> Thanks,
>>>>>> Genevieve
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjcipriano at lbl.gov  Tue May 30 18:26:24 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Tue, 30 May 2006 15:26:24 -0700
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
Message-ID: <1149027984.3139.105.camel@alien>

Hello,

I have found a problem with adding features via gbrowse_img with the
add=xxx tag. I am using the CVS version of GGB, bioperl-live and BioSQL
schema on mysql.

When using the add=xxx tag, it produces a fatal error (with the
BioSQL/Das interface). The link that triggers it is:
/cgi-bin/gbrowse_img/bacteria/?name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999;

ERROR from apache error_log:
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't locate
object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment"
at /var/www/cgi-bin/gbrowse_img line 502.
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Premature end
of script headers: gbrowse_img

This is from this section of code at line ~506:

    unless ($segments{$refname}) {
      my @segments = map {
        eval{$_->absolute(0)}; $_  # so that rel2abs works properly later
      }
        grep { $current_segment->overlaps($_) } get_segments($db, $refname);
      return unless @segments;
      $segments{$refname} = $segments[0];
    }


The overlaps method is not defined in Bio::DB::Das::BioSQL::Segment
or in any of the classes it inherits from.

The fix was the inclusion of Bio::RangeI in the @ISA variable (shown
below) in the file Bio/DB/Das/BioSQL/Segment.pm


#@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN
@ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI);
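
For illustration, here is a minimal self-contained sketch of why adding
a class to @ISA makes the method call resolve. The Demo::* packages are
hypothetical stand-ins, not the real Bio::RangeI or Segment code; the
sketch assumes a range interface whose overlaps() is written purely in
terms of start() and end(), so any class that supplies those two
accessors can inherit it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a range interface: overlaps() relies only
# on the start() and end() accessors of both objects.
package Demo::RangeI;
sub overlaps {
    my ($self, $other) = @_;
    return ($self->start <= $other->end && $other->start <= $self->end)
        ? 1 : 0;
}

# Hypothetical segment class. It defines start() and end() itself and
# picks up overlaps() through @ISA.
package Demo::Segment;
our @ISA = ('Demo::RangeI');
sub new   { my ($class, %arg) = @_; return bless { %arg }, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package main;

my $view = Demo::Segment->new(start => 1,    end => 2000);
my $hit  = Demo::Segment->new(start => 9,    end => 999);
my $far  = Demo::Segment->new(start => 5000, end => 6000);

print $view->overlaps($hit) ? "overlap\n" : "no overlap\n";
print $view->overlaps($far) ? "overlap\n" : "no overlap\n";
```

Without the entry in @ISA, the same calls would die with a "Can't
locate object method" error like the one in the log above.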


I am not sure whether this will have any consequences other than fixing
the bug I mentioned (and possibly fixing something else).

Can anyone tell me if this will introduce any new bugs and, if not, can
someone commit this change?

Thanks,
Michael Cipriano
Developer - LBNL


From hlapp at gmx.net  Wed May 31 14:33:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:33:40 -0400
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <1149027984.3139.105.camel@alien>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
	<1149027984.3139.105.camel@alien>
Message-ID: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>

This should be a Gbrowse problem, not in Biosql or bioperl-db unless  
I'm missing something? Just trying to make sure ...

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjcipriano at lbl.gov  Wed May 31 14:50:53 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Wed, 31 May 2006 11:50:53 -0700
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
	<1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <1149101453.3139.112.camel@alien>

Hi,

Yes, I only see the problem with gbrowse, though it could come up
anytime someone wants to connect with Das using Bio::DB::Das::BioSQL and
needs the overlaps function (or other range functions) in the Segment
object.
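
One way to see in advance which range-style methods a class can
resolve is Perl's built-in can() method. The sketch below is
self-contained and uses a hypothetical Demo::PartialSegment class; the
list of method names is an assumption about what a caller might need,
not a statement about what the real Segment class provides.

```perl
use strict;
use warnings;

# Hypothetical segment class that implements only part of a range API.
package Demo::PartialSegment;
sub new      { return bless {}, shift }
sub start    { 1 }
sub end      { 2000 }
sub overlaps { 1 }

package main;

# Range-style method names to probe (assumed for this example).
my @wanted = qw(start end overlaps contains equals);

my $segment = Demo::PartialSegment->new;

# can() returns false when neither the class nor anything in its @ISA
# chain defines the method.
my @missing = grep { !$segment->can($_) } @wanted;

print "missing: @missing\n";
```

The same can() call works on a class name, so after loading the real
module one could check 'overlaps' the same way.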

-Michael

On Wed, 2006-05-31 at 14:33 -0400, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless  
> I'm missing something? Just trying to make sure ...
> 
> 	-hilmar


From lstein at cshl.edu  Wed May 31 14:47:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:47:47 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under
	BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311447.49454.lstein@cshl.edu>

I think this is a problem in the Bio::DB::Das::BioSQL::Segment module. I will 
add an overlaps() method.

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu  Wed May 31 14:56:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:56:30 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under
	BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311456.32088.lstein@cshl.edu>

I've just committed the fix into CVS. This will be available in the upcoming 
gbrowse release as well.

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From d49228002 at ym.edu.tw  Wed May 31 10:39:16 2006
From: d49228002 at ym.edu.tw (Yi-Feng Chang)
Date: Wed, 31 May 2006 22:39:16 +0800
Subject: [BioSQL-l] Error loading ontology terms
Message-ID: <000001c68c97$f461ac70$6801a8c0@iannb>

Dear All,
I've checked the BioSQL archives and found a similar thread (http://lists.open-bio.org/pipermail/biojava-l/2005-November/005151.html); however, it did not give a specific solution. So I am posting here again in the hope that someone can help me.
I'm using JDK 1.5.0_05, BioJava 1.4, BioSQL 1.41, and MySQL 5.0 with MySQL Connector/J 3.1.

I was following the demo source provided by BioJava in Anger, except for the database connection.
The exceptions are listed below.

On the first connection attempt there was a connection error:

*** Importing a core ontology -- hope this is okay
*** Importing terms
Exception in thread "main" org.biojava.bio.BioException: Error connecting to BioSQL database: Connection is closed.
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:276)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
 at genevote.BioSQLTest.loadSeq(BioSQLTest.java:31)
 at genevote.BioSQLTest.main(BioSQLTest.java:70)
Caused by: java.sql.SQLException: Connection is closed.
 at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:219)
 at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.createStatement(PoolingDataSource.java:248)
 at org.biojava.bio.seq.db.biosql.MySQLDBHelper.getInsertID(MySQLDBHelper.java:68)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:268)
 ... 3 more

When I tried again it worked, and I loaded all sequences in GenBank format into the BioSQL db without error.
But when I tried to extract sequences, another exception was thrown:

org.biojava.bio.BioException: Error loading ontology terms
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:444)
 at org.biojava.bio.seq.db.biosql.OntologySQL.getOntology(OntologySQL.java:116)
 at org.biojava.bio.seq.db.biosql.OntologySQL.<init>(OntologySQL.java:413)
 at org.biojava.bio.seq.db.biosql.OntologySQL.getOntologySQL(OntologySQL.java:72)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:240)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
 at genevote.test.loadSeq(test.java:25)
 at genevote.test.main(test.java:76)
Caused by: java.sql.SQLException: Unknown column 'name' in 'field list'
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2851)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1534)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1625)
 at com.mysql.jdbc.Connection.execSQL(Connection.java:2297)
 at com.mysql.jdbc.Connection.execSQL(Connection.java:2226)
 at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1812)
 at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1657)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadTerms(OntologySQL.java:339)
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:441)
 ... 7 more


From s.rayner at att.net  Mon May 15 06:13:06 2006
From: s.rayner at att.net (s.rayner at att.net)
Date: Mon, 15 May 2006 06:13:06 +0000
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
Message-ID: <051520060613.9311.44681BF0000A2A7A0000245F21604666489D0A02970E9DD29C@att.net>

 Hello,
 
 I have been trying to upload the current release of uniprot (version 49.6) into 
 MySQL using the most current version of load_seqdatabase.pl from CVS  
 
 (# $Id: load_seqdatabase.pl,v 1.24 2006/01/19 21:34:29 lapp Exp $)
 
 I have tested the script on subsets of uniprot and it loads without problem, but 
 when I attempt to load the full dataset, I end up with the following error....
 
 
 biowiv:/usr/lib/perl5/bioperl-db/scripts/biosql # perl load_seqdatabase.pl 
 --dbname uniprot --dbuser XXXX --dbpass XXXX --format swiss 
 /var/downloads/sequence/uniprot_sprot.dat
 Loading /var/downloads/sequence/uniprot_sprot.dat ...
 
 -------------------- WARNING ---------------------
 MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were 
 ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ 
 databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
 Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
 ---------------------------------------------------
 Could not store Q5RFJ2:
 ------------- EXCEPTION  -------------
 MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found 
 by unique key
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:219
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK Bio::DB::BioSQL::SeqAdaptor::store_children 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/SeqAdaptor.pm:226
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
 STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
 STACK Bio::DB::Persistent::PersistentObject::store 
 /usr/lib/perl5/site_perl/5.8.7/Bio/DB/Persistent/PersistentObject.pm:272
 STACK (eval) load_seqdatabase.pl:620
 STACK toplevel load_seqdatabase.pl:602
 
 -------
 
 
 To create the biosql schema i used biosqldb-mysql.sql
 
 version info;
 -- $Id: biosqldb-mysql.sql,v 1.41 2005/04/18 05:21:38 lapp Exp $
 
 --------
 
 
 If i look at the data that has been created in the database, the first 
 1000 or so entries load successfully.  The offending record begins..
 
    ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
    AC   Q5RFJ2; Q5RDK2;
 
 The script doesn't like Accession Number "Q5RFJ2", but the only other place it 
 shows up in the file is further down the same record as a "DR" entry.
 
    DR   SMR; Q5RFJ2; 1-230.
 
 I'm new to this and still trying to figure out the structure of all the tables.  
 Does anyone have any idea what is happening? 
 
 thanks for the help!
 
 
From s.rayner at att.net  Mon May 15 12:34:15 2006
From: s.rayner at att.net (s.rayner at att.net)
Date: Mon, 15 May 2006 12:34:15 +0000
Subject: [BioSQL-l]  error loading uniprot release 49.6 into mysql
Message-ID: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>

I found where the script is hiccuping....

The Uniprot release contains lines with identical annotation for the RL keyword for two different sequences.

___________________

First occurrence...
___________________

ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
AC   Q5RFJ2; Q5RDK2;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein theta.
GN   Name=YWHAQ;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain cortex, and Kidney;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.  <======  Not Unique


___________________

Second occurrence...
___________________


ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
AC   Q5RC20;
DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2005, sequence version 2.
DT   18-APR-2006, entry version 13.
DE   14-3-3 protein gamma.
GN   Name=YWHAG;
OS   Pongo pygmaeus (Orangutan).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Pongo.
OX   NCBI_TaxID=9600;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Heart;
RG   The German cDNA consortium;
RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   <======  Not Unique


in these two cases the generated CRC key is identical and so MySQL throws a wobbly.

if i look at the MySQL entry in the REFERENCE table for the first sequence
------+-------+---------+----------------------+
|          139 |      NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
+--------------+-----------+----------------------------------------------------

and the error when the script choked was 

 MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were 
 ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ 
 databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
 Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3

hence the problem.
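
To get a feel for how many other entries would trip over the same thing, one could scan the flat file for references that have no RA line and share an identical RL line. The following is only a rough sketch in Python, assuming the usual two-letter line codes with the text starting at column 6. The function name is made up for illustration, and none of this is part of bioperl or bioperl-db.

```python
from collections import defaultdict


def find_authorless_duplicate_refs(lines):
    """Map each RL text to the entry IDs that cite it without any RA line."""
    by_location = defaultdict(list)
    # Parser state: current entry ID and the reference being collected.
    state = {"entry": None, "has_author": False, "location": []}

    def flush():
        # Record the reference collected so far, if it has RL text but no RA.
        if state["entry"] and state["location"] and not state["has_author"]:
            ids = by_location[" ".join(state["location"])]
            if state["entry"] not in ids:
                ids.append(state["entry"])
        state["has_author"] = False
        state["location"] = []

    for line in lines:
        code, text = line[:2], line[5:].strip()
        if code == "ID":
            flush()
            state["entry"] = text.split()[0]
        elif code == "RN":      # a new reference block starts
            flush()
        elif code == "RA":      # reference has authors, so it is not affected
            state["has_author"] = True
        elif code == "RL":
            state["location"].append(text)
        elif code == "//":      # end of entry
            flush()
            state["entry"] = None
    flush()
    # Only RL texts shared by more than one entry are of interest.
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}
```

Calling it as find_authorless_duplicate_refs(open("uniprot_sprot.dat")) would return a dict keyed by the shared RL text. Whether every hit actually fails on load depends on how the reference key is computed, which this sketch does not try to reproduce.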

I'm guessing I'm not the first person to encounter this, but I don't see any hints for an easy way around it.

any suggestions....?

ta


From hlapp at gmx.net  Mon May 15 16:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.
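
To illustrate what moving on to the next entry amounts to, here is a minimal sketch in Python of a load loop that either stops at the first failure or records it and continues. This is not the code of load_seqdatabase.pl; the function name, the store callback and the safe argument are hypothetical and only mirror the behaviour described above.

```python
def load_entries(entries, store, safe=False):
    """Store each (entry_id, record) pair; with safe=True, skip failures."""
    stored, skipped = [], []
    for entry_id, record in entries:
        try:
            store(record)
            stored.append(entry_id)
        except Exception as exc:
            if not safe:
                # Default behaviour: the first failing entry aborts the load.
                raise
            # Safe behaviour: remember what failed and carry on.
            skipped.append((entry_id, str(exc)))
    return stored, skipped
```

The trade-off is the one stated above: entries that fail are simply absent from the database afterwards, so the list of skipped IDs is worth keeping.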

Fixing the issue would require either fixing the bioperl swissprot  
parser (or Bio::Annotation::Reference) to stick the RG group into the  
author slot if there is no author, or changing Bioperl's  
Bio::Annotation::Reference to also feature a group and biosql to use  
it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?
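
To make the round-tripping concern concrete, here is a small sketch in Python. It assumes a simplified reference with just authors, group and location; the class and function names are hypothetical and do not correspond to the Bioperl or Bioperl-db API.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    authors: str = ""   # text of the RA line(s), empty if absent
    group: str = ""     # text of the RG line, empty if absent
    location: str = ""  # text of the RL line


def stored_authors(ref):
    """Value that would go into the author slot: RA if present, else RG."""
    return ref.authors or ref.group


def write_reference(authors, location):
    """Re-emit a reference from what was stored; the RA/RG origin is lost."""
    lines = []
    if authors:
        lines.append(f"RA   {authors}")
    lines.append(f"RL   {location}")
    return lines
```

With the reference from the failing entry, stored_authors() returns "The German cDNA consortium;", and writing it back out produces an RA line where the original entry had an RG line.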

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for  
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   
> <======  Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.    
> <======  Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL  
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first  
> sequence
> ------+-------+---------+----------------------+
> |          139 |      NULL | Submitted (NOV-2004) to the EMBL/ 
> GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
> +--------------+----------- 
> +----------------------------------------------------
>
> and the error when the script choked was
>
>  MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were
>  ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
>  databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
>  Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but dont  
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From simon.rayner.cn at gmail.com  Tue May 16 00:46:00 2006
From: simon.rayner.cn at gmail.com (simon rayner)
Date: Tue, 16 May 2006 00:46:00 +0000
Subject: [BioSQL-l]  error loading uniprot release 49.6 into mysql
In-Reply-To: 051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net
Message-ID: <1147740360.3338.1.camel@biowiv.wivbio>

thanks for the help.

one way i was thinking of getting around it before i got your email
about the --safe flag was to append an extra character to the offending
string. So, in my case i would have

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

and

"Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"

which would presumably create different keys.

However, i realise the downside of this is that i have now 
modified the data source...


From hlapp at gmx.net  Thu May 18 14:38:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 18 May 2006 10:38:33 -0400
Subject: [BioSQL-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <1147740360.3338.1.camel@biowiv.wivbio>
References: <1147740360.3338.1.camel@biowiv.wivbio>
Message-ID: <E29201C1-09F7-48D1-83FA-6EA3E304ACDC@gmx.net>

Yes, you have.

Eventually I think this needs to be fixed by how the RG field is  
dealt with in either bioperl or bioperl-db.

	-hilmar

On May 15, 2006, at 8:46 PM, simon rayner wrote:

> thanks for the help.
>
> one way i was thinking of getting around it before i got your email
> about the --safe flag was to append an extra character to the  
> offending
> string. So, in my case i would have
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."
>
> and
>
> "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.2"
>
> which would presumably create different keys.
>
> However, i realise the downside of this is that i have now
> modified the data source...
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From darin.london at duke.edu  Mon May 22 15:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [BioSQL-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.
 

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From Gerben.Menschaert at UGent.be  Tue May 23 16:02:12 2006
From: Gerben.Menschaert at UGent.be (Gerben Menschaert)
Date: Tue, 23 May 2006 18:02:12 +0200
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
Message-ID: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>

Hello all,

I'm trying to load genbank accession number Q99ML8 as a genpept file with
the load_seqdatabase.pl script. It fails on the DBSOURCE part:

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
("","UniGene:Mm.388865","0","") FKs ()
ORA-01400: cannot insert NULL into ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO dbxref
(dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
---------------------------------------------------

The DBSOURCE block from the genpept file looks like this:

DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
            class: standard.
            created: May 10, 2002.
            sequence updated: Jun 1, 2001.
            annotation updated: May 16, 2006.
            xrefs: AF331517.1, AAK16157.1
            xrefs (non-sequence databases): UniGene:Mm.388865,
            Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:0001664,
            GO:0006171, GO:0007586, GO:0006950

How is this parsed? Could anybody point me in the right direction?

Regards,
Gerben


From hlapp at gmx.net  Tue May 23 17:03:30 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 23 May 2006 13:03:30 -0400
Subject: [BioSQL-l] load_seqdatabase.pl error due to bad DBSOURCE parsing
In-Reply-To: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
References: <20060523160212.1BF9814A0D6F@tarzan.ugent.be>
Message-ID: <A87D0578-B589-4114-A28D-7A8CD1C47009@gmx.net>

This is in reality a Uniprot entry. The Genbank parser apparently  
doesn't succeed in picking apart accession and namespace prefix  
(dbname).

If at all possible I'd recommend loading Uniprot in Uniprot  
(swissprot) format. Would that work for you?
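
For illustration only, picking apart the namespace prefix and the accession for the tokens in the "xrefs (non-sequence databases)" line could look like the sketch below. It is written in Python, it is not the Bioperl Genbank parser, and the function names are hypothetical. It assumes that the first colon separates dbname from accession and that tokens without a colon carry no dbname.

```python
def split_xref(token):
    """Split 'UniGene:Mm.388865' into ('UniGene', 'Mm.388865')."""
    token = token.strip().rstrip(",")
    dbname, sep, accession = token.partition(":")
    if not sep:
        # No namespace prefix, e.g. 'AF331517.1': the dbname is unknown.
        return None, token
    return dbname, accession


def parse_xrefs(text):
    """Split a comma-separated xrefs value into (dbname, accession) pairs."""
    return [split_xref(tok) for tok in text.split(",") if tok.strip()]
```

A dbxref row built from such a pair would have a non-empty dbname for UniGene:Mm.388865, which is the value the failing insert was missing.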

	-hilmar

On May 23, 2006, at 12:02 PM, Gerben Menschaert wrote:

> Hello all,
>
> I'm trying to load genbank accession number Q99ML8 as a genpept  
> file with
> the load_seqdatabase.pl script. It fails on the DB_SOURCE part:
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> ("","UniGene:Mm.388865","0","") FKs ()
> ORA-01400: cannot insert NULL into ("TEST_BIOSQL"."SG_DBXREF"."DBNAME") (DBD
> ERROR: error possibly near <*> indicator at char 57 in 'INSERT INTO dbxref
> (dbname, accession, version) VALUES (:<*>p1, :p2, :p3)')
> ---------------------------------------------------
>
> The DBSOURCE block from the genpept file looks like this:
>
> DBSOURCE    swissprot: locus UCN2_MOUSE, accession Q99ML8;
>             class: standard.
>             created: May 10, 2002.
>             sequence updated: Jun 1, 2001.
>             annotation updated: May 16, 2006.
>             xrefs: AF331517.1, AAK16157.1
>             xrefs (non-sequence databases): UniGene:Mm.388865,
>             Ensembl:ENSMUSG00000049699, MGI:2176375, GO:0005576, GO:0001664,
>             GO:0006171, GO:0007586, GO:0006950
>
> How is this parsed? Could anybody point me into the good direction?
>
> Regards,
> Gerben
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gad14 at cornell.edu  Tue May 23 19:38:31 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 23 May 2006 15:38:31 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with
	biosql
In-Reply-To: <D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
Message-ID: <447364B7.4030203@cornell.edu>

Hi Hilmar,

I apologize in advance if I'm talking about something that is well 
documented somewhere but I'm still having trouble understanding exactly 
what I need to do to get a biosql database loaded in such a way so that 
it can interact fully with gbrowse - it seems to be half way there.

I use load_seqdatabase.pl to load the genome sequence (single sequence 
in fasta format) into a biosql database but I populate the db with 
features using what I thought was a GFF-centric approach, not with 
load_seqdatabase.pl -- see my code in #4 below.

Here is exactly what I do:

1) Create a mysql database called 'test_biosql' with correct permissions
2) Load the biosql schema:

	mysql --user=xx --password=xx test_biosql < 
/usr/local/biosql-schema/sql/biosqldb-mysql.sql

3) Use load_seqdatabase.pl to load the single genomic dna sequence:

	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 
-namespace NC_004578 -format fasta 6853.fasta

4) I then use a script I wrote to load the SeqFeatures, which are in GFF 
format in a file I pass in as an arg ($in). Here is the code:


# read gff file into gff io object
my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);

# create a Bio::DB::DBAdaptorI implementing object
my $db = Bio::DB::BioDB->new(-database   => $dbname,
                              -port       => $port,
                              -dbname     => $database,
                              -driver     => $driver,
                              -user       => $user,
                              -pass       => $pass,
                              );

# get appropriate object adaptor
my $adp = $db->get_object_adaptor("Bio::SeqI");

my $acc = "NC_004578"; # the genome seq id already in the db
my $seq = Bio::Seq->new(-accession_number => $acc,
			-display_id => $acc,
			-primary_id => $acc,
                         -namespace => $acc);

# Locate entry matching the unique key attributes and populate a
# persistent object with this entry.
my $dbseq = $adp->find_by_unique_key($seq);

# insert features from gff file into database.
while (my $feat = $gffio->next_feature()) {
   $dbseq->add_SeqFeature($feat);
   $dbseq->store;
   $dbseq->commit();
}


Is there additional code I should have here? I realize you're not an 
expert/user of gbrowse, and this problem seems to be related to the 
gbrowse_details cgi script, which you probably are not familiar with. 
But I'm CC'ing the lists in case anyone else has some clues. I do 
appreciate any insight you might have though. It would be good to know 
if I'm doing all that I need to do to fully and correctly populate a 
biosql db with GFF/SeqFeature.

Thanks,
Genevieve


Hilmar Lapp wrote:
> Hi Genevieve,
> 
> there's a couple more regular users of BioSQL than one (about 25-30  
> groups), but not many who run GBrowse off of BioSQL (and I don't  count 
> among those - yet).
> 
> Of those who have posted before that they accomplished this, I  believe 
> none were using load_seqdatabase.pl to load the data.  Instead, they 
> loaded data through the DBGFF adaptor for BioSQL, i.e.,  like you would 
> load data into a GFF database, just using a different  adaptor.
> 
> load_seqdatabase.pl will load through the sequence-centric Bioperl  
> object model, and has no notion of GFF or GFF3 and associated  
> constraints (controlled vocabulary for feature type and source terms,  
> location types etc).
> 
> It is probably possible to load data through load_seqdatabase.pl and  
> then render it through GBrowse but doing so will almost certainly  
> require a SeqProcessor (see --pipeline argument to  load_seqdatabase.pl) 
> to be written that will appropriately unflatten  the feature array and 
> in fact probably have to use  SeqFeature::Annotated (where actually did 
> SeqFeature::TypedSeqFeature  go?). In parallel, bioperl-db will need to 
> be fixed to be prepared  for SeqFeatureI implementations that use 
> ontology terms for  primary_tag and source_tag instead of strings.
> 
> Is it possible for you to load your data through a GFF3 intermediary?  
> Bioperl has modules and in fact scripts that will write GFF3 (if I'm  
> not mistaken ...).
> 
>     -hilmar
> 
> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
> 
>> Hi Genevieve,
>>
>> The problem is that none of us really knows anything about BioSQL.  
>> Hilmar is
>> the only regular user of this database. He's now gone to NESCent (duke
>> university) and may not be receiving mail sent to GNF.
>>
>> Lincoln
>>
>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>
>>> Hi Scott,
>>>
>>> I'm still having the same problem. It might have to do with how the
>>> BioSQL database is populated. I use the load_seqdatabase.pl script to
>>> load the database along with bioperl-db functions for loading
>>> SeqFeatures directly. I took a closer look at how the tables are
>>> populated in the biosql tables. (If you're not familiar with  BioSQL the
>>> following may not be familiar to you -- i just want to put this
>>> observation out there...). I noticed that the 'term_id' field in the
>>> Location table was empty for the first gene record i had loaded.  When I
>>> set term_id to be '11', the id that corresponds with the 'gene'  
>>> ontology
>>> term, i notice a positive change in what's displayed on the
>>> gbrowse_details page for this record... the name of the gene  'dnaA' now
>>> appears in the title line in large blue font, as it should. The class
>>> name is still missing, as does all the detail about this gene -
>>> coordinates, etc.
>>>
>>> Lincoln suggests that I talk directly Hilmar Lapp who is the main  
>>> BioSQL
>>> developer. It could be that I am bumping up against things that  haven't
>>> been developed yet as far as the GBrowse<->BioSQL db connectivity  goes.
>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm and
>>> Util.pm in order to try to understand where the disconnect might  be...
>>>
>>> To answer your question below.. yes GBrowse works fine for the
>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table  database. 
>>> I'm
>>> using this installation of GBrowse 1.64 for several MySQL  databases 
>>> with
>>> the default gbrowse tables... everything is working fine. My only
>>> trouble with gbrowse crops up when interfacing with the biosql  mysql 
>>> db.
>>>
>>> Thanks for all your help,
>>> Genevieve
>>>
>>> Scott Cain wrote:
>>>
>>>> Hi Genevieve,
>>>>
>>>> I'm sorry this has hung out there unanswered for so long.  I  
>>>> suppose it
>>>> was because I chose not to answer it because it involved BioSQL  
>>>> (which I
>>>> know just about nothing about) and Simon seemed to think that the  
>>>> MySQL
>>>> adaptor was involved somehow (though it doesn't look to me like  it 
>>>> is).
>>>>
>>>> Anyway, I'll try to get started answering your questions  (assuming you
>>>> haven't already puzzled you way to one already).  See my comments  
>>>> below.
>>>>
>>>> Scott
>>>>
>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>
>>>>> Hi,
>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with  bioperl
>>>>> 1.5.1. I successfully loaded the database with  load_seqdatabase.pl 
>>>>> with
>>>>> NC_004578.gbk from NCBI.
>>>>>
>>>>> The features display as they should on the main gbrowse details  pane.
>>>>> However, when I click on one of the features I get GBrowse  Details 
>>>>> data
>>>>> record page with ":Details" at the top in large blue font but no  data
>>>>> for that gene display. In smaller red font, "Requested feature  not 
>>>>> found
>>>>> in database" which is followed by the normal details page footer  info
>>>>> ("For the source code for this browser, see...", etc).
>>>>>
>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>> db_args for my database. I changed 'link' to
>>>>>
>>>>>     link = AUTO
>>>>>
>>>>> from what was there
>>>>>
>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>
>>>>> The default suggestion for 'link' is a little confusing.. why  does it
>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why isn't
>>>>> 'cgi-bin' in the path?
>>>>
>>>>
>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>> config file predates the gbrowse_details script and no one  changed 
>>>> this
>>>> sample config file.  I changed it and it will be changed in the next
>>>> release.
>>>>
>>>> As for the path, 'perl' is a common url convention for scripts  that 
>>>> are
>>>> running under mod_perl, so I suspect the person who wrote this  sample
>>>> config file was running mod_perl.
>>>>
>>>>> When i set link to 'AUTO' I at least get the details page.
>>>>> gbrowse_details is not getting what it needs to display the record info
>>>>> though. The webserver error I get is:
>>>>>
>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>> line 126.
>>>>
>>>>
>>>> I'm not sure this is really the problem.  Let me make sure: it is  
>>>> after
>>>> you changed link to AUTO that you see this?  That is, the page  you see
>>>> now is as you described in your second paragraph, right?   
>>>> Unfortunately,
>>>> this is the part where I become particularly useless, since I  don't 
>>>> know
>>>> anything about BioSQL.  Is the details page working OK for the
>>>> yeast_chr1 dataset?
>>>>
>>>> Scott
>>>>
>>>>> I took a look at line 126 in BioSQL.pm - not sure what to make  of it.
>>>>>
>>>>> Any ideas? Am I overlooking anything?
>>>>>
>>>>> Thanks,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>>
>>
> 


From hlapp at gmx.net  Wed May 24 03:50:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 24 May 2006 04:50:21 +0100
Subject: [BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with
	biosql
In-Reply-To: <447364B7.4030203@cornell.edu>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
Message-ID: <CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>

Hi Genevieve & Scott, see below for interspersed comments.

On May 23, 2006, at 3:38 PM, Genevieve DeClerck wrote:

> Hi Hilmar,
>
> I apologize in advance if I'm talking about something that is well  
> documented somewhere
> but I'm still having trouble understanding exactly what I need to  
> do to get a biosql database loaded in such a way so that it can  
> interact fully with gbrowse - it seems to be half way there.
>
> I use load_seqdatabase.pl to load the genome sequence (single  
> sequence in fasta format) into a biosql database but I populate the  
> db with features using what I thought was a GFF-centric approach,  
> not with load_seqdatabase.pl -- see my code in #4 below.
>
> Here is exactly what I do:
>
> [...]
> 3) Use load_seqdatabase.pl to load the single genomic dna sequence:
>
> 	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2
> -namespace NC_004578 -format fasta 6853.fasta
>
> 4) I then use a script I wrote to load the SeqFeatures which are in  
> gff format in a file i pass in as as arg ($in). Here is the code:
>
>
> # read gff file into gff io object
> my $gffio = Bio::Tools::GFF->new(-file=> $in, -gff_version => 3);
>
> # create a Bio::DB::DBAdaptorI implementing object
> my $db = Bio::DB::BioDB->new(-database   => $dbname,
>                              -port       => $port,
>                              -dbname     => $database,
>                              -driver     => $driver,
>                              -user       => $user,
>                              -pass       => $pass,
>                              );
>
> # get appropriate object adaptor
> my $adp = $db->get_object_adaptor("Bio::SeqI");
>
> my $acc = "NC_004578"; # the genome seq id already in the db
> my $seq = Bio::Seq->new(-accession_number => $acc,
> 			-display_id => $acc,
> 			-primary_id => $acc,
>                         -namespace => $acc);
>
> # Locate entry matching the unique key attributes and populate a
> # persistent object with this entry.
> my $dbseq = $adp->find_by_unique_key($seq);
>
> # insert features from gff file into database.
> while (my $feat = $gffio->next_feature()) {
>   $dbseq->add_SeqFeature($feat);
>   $dbseq->store;
>   $dbseq->commit();
> }
>
>
> Is there additional code I should have here? I realize you're not a  
> expert/user of gbrowse.. and this problem seems to be related to  
> the gbrowse_details cgi script, which you probably are not familiar  
> with.

So your use case is that you have a sequence in simple FASTA format
with its annotation in another file in GFF3 format, and you want to
load both into a BioSQL database and visualize them in GBrowse.

It looks like I was in fact on the wrong path the whole time. The
GBrowse BioSQL adaptor that I can find is a Bio::DasI adaptor, through
which you can only retrieve but not load, so I have to assume that you
were right in using load_seqdatabase.pl. Can somebody who has been
using BioSQL as the underlying database help out here and confirm or
set me straight?

If that is the procedure then your code looks alright. Also, it looks
like Bio::Tools::GFF does not return hierarchical feature graphs for
v3 input (which bioperl-db wouldn't handle properly anyway, because it
doesn't support the seqfeature_relationship table yet).

So I'm in fact at a loss to explain why the details page doesn't
work for you, given that people have reported it to work before. I'm
inclined to think that the respective GBrowse code has changed,
either in the way it expects the feature to be set up or in the way
it uses the DasI interface, and that this broke the BioSQL adaptor. Can
somebody (Scott? Lincoln?) comment on whether there were any changes
in this regard?

The lines in gbrowse_details that appear to lead to the problem are

my @features = sort {$b->length<=>$a->length}
               $CONFIG->_feature_get($db,$name,$class);
@features    = sort {$b->length<=>$a->length}
               $CONFIG->_feature_get($db,$ref,$class,$start,$end,1)
   unless @features;

neither of which returns any matches.
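
To illustrate the fallback logic in those two lines, here is a minimal,
self-contained sketch. It assumes nothing about GBrowse internals:
by_name, by_range, length_of, find_features and the @db records are
hypothetical stand-ins, not the real _feature_get. It only shows that the
second lookup runs solely when the first one returns nothing, and that
each result list is sorted longest-first.

```perl
use strict;
use warnings;

# Hypothetical in-memory "database" of features (illustration only).
my @db = (
    { name => 'dnaA', class => 'gene', ref => 'NC_004578', start => 1,    end => 1400 },
    { name => 'dnaN', class => 'gene', ref => 'NC_004578', start => 1500, end => 3200 },
);

# Hypothetical stand-in for the name/class lookup (first call).
sub by_name {
    my ($name, $class) = @_;
    return grep { $_->{name} eq $name && $_->{class} eq $class } @db;
}

# Hypothetical stand-in for the coordinate lookup (second call).
sub by_range {
    my ($ref, $start, $end) = @_;
    return grep {
        $_->{ref} eq $ref && $_->{start} <= $end && $_->{end} >= $start
    } @db;
}

# Length of a feature record in bases.
sub length_of { return $_[0]{end} - $_[0]{start} + 1 }

# Same shape as the two quoted lines: try by name first, and fall back
# to the coordinate lookup only if nothing was found.
sub find_features {
    my ($name, $class, $ref, $start, $end) = @_;
    my @features = sort { length_of($b) <=> length_of($a) } by_name($name, $class);
    @features = sort { length_of($b) <=> length_of($a) } by_range($ref, $start, $end)
        unless @features;
    return @features;
}

# A name that matches is found by the first lookup.
my @hit = find_features('dnaA', 'gene', 'NC_004578', 1, 1400);

# An empty name matches nothing, so the coordinate lookup takes over.
my @fallback = find_features('', 'gene', 'NC_004578', 1, 3500);

print scalar(@hit), " found by name, ", scalar(@fallback), " found by range\n";
```

If both stand-in lookups come back empty, @features stays empty, which
is presumably the situation behind the "Requested feature not found in
database" message.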

I have no clue yet how those two calls get translated, via the DasI
adaptor, into bioperl-db queries.

	-hilmar

> But I'm CC'ing the lists in case anyone else has some clues. I do  
> appreciate any insight you might have though. It would be good to  
> know if I'm doing all that I need to do to fully and correctly  
> populate a biosql db with GFF/SeqFeature.
>
> Thanks,
> Genevieve
>
>
>
> Hilmar Lapp wrote:
>> Hi Genevieve,
>>
>> there are a few more regular users of BioSQL than one (about
>> 25-30 groups), but not many who run GBrowse off of BioSQL (and I
>> don't count among those - yet).
>>
>> Of those who have posted before that they accomplished this, I
>> believe none were using load_seqdatabase.pl to load the data.
>> Instead, they loaded data through the DBGFF adaptor for BioSQL,
>> i.e., like you would load data into a GFF database, just using a
>> different adaptor.
>>
>> load_seqdatabase.pl will load through the sequence-centric
>> Bioperl object model, and has no notion of GFF or GFF3 and the
>> associated constraints (controlled vocabulary for feature type
>> and source terms, location types etc).
>>
>> It is probably possible to load data through load_seqdatabase.pl
>> and then render it through GBrowse, but doing so will almost
>> certainly require a SeqProcessor (see the --pipeline argument to
>> load_seqdatabase.pl) to be written that will appropriately
>> unflatten the feature array, and it will in fact probably have to
>> use SeqFeature::Annotated (where did SeqFeature::TypedSeqFeature
>> actually go?). In parallel, bioperl-db will need to be fixed to be
>> prepared for SeqFeatureI implementations that use ontology terms
>> for primary_tag and source_tag instead of strings.
>>
>> Is it possible for you to load your data through a GFF3
>> intermediary? Bioperl has modules and in fact scripts that will
>> write GFF3 (if I'm not mistaken ...).
>>
>>     -hilmar
>>
>> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
>>> Hi Genevieve,
>>>
>>> The problem is that none of us really knows anything about BioSQL.
>>> Hilmar is the only regular user of this database. He's now gone to
>>> NESCent (Duke University) and may not be receiving mail sent to GNF.
>>>
>>> Lincoln
>>>
>>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> I'm still having the same problem. It might have to do with how the
>>>> BioSQL database is populated. I use the load_seqdatabase.pl script
>>>> to load the database, along with bioperl-db functions for loading
>>>> SeqFeatures directly. I took a closer look at how the BioSQL tables
>>>> are populated. (If you're not familiar with BioSQL the following
>>>> may not mean much to you -- I just want to put this observation
>>>> out there...). I noticed that the 'term_id' field in the Location
>>>> table was empty for the first gene record I had loaded. When I set
>>>> term_id to '11', the id that corresponds to the 'gene' ontology
>>>> term, I noticed a positive change in what's displayed on the
>>>> gbrowse_details page for this record: the name of the gene, 'dnaA',
>>>> now appears in the title line in large blue font, as it should. The
>>>> class name is still missing, as is all the detail about this gene
>>>> (coordinates, etc.).
>>>>
>>>> Lincoln suggests that I talk directly to Hilmar Lapp, who is the
>>>> main BioSQL developer. It could be that I am bumping up against
>>>> things that haven't been developed yet as far as the
>>>> GBrowse<->BioSQL db connectivity goes. I've been taking a closer
>>>> look at gbrowse_details.pl, Browser.pm and Util.pm in order to try
>>>> to understand where the disconnect might be...
>>>>
>>>> To answer your question below: yes, GBrowse works fine for the
>>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table
>>>> database. I'm using this installation of GBrowse 1.64 for several
>>>> MySQL databases with the default gbrowse tables, and everything is
>>>> working fine. My only trouble with gbrowse crops up when
>>>> interfacing with the biosql mysql db.
>>>>
>>>> Thanks for all your help,
>>>> Genevieve
>>>>
>>>> Scott Cain wrote:
>>>>
>>>>> Hi Genevieve,
>>>>>
>>>>> I'm sorry this has hung out there unanswered for so long. I
>>>>> suppose it was because I chose not to answer it, since it involved
>>>>> BioSQL (which I know just about nothing about) and Simon seemed to
>>>>> think that the MySQL adaptor was involved somehow (though it
>>>>> doesn't look to me like it is).
>>>>>
>>>>> Anyway, I'll try to get started answering your questions (assuming
>>>>> you haven't already puzzled your way to an answer). See my
>>>>> comments below.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with
>>>>>> bioperl 1.5.1. I successfully loaded the database with
>>>>>> load_seqdatabase.pl with NC_004578.gbk from NCBI.
>>>>>>
>>>>>> The features display as they should on the main gbrowse details
>>>>>> pane. However, when I click on one of the features I get the
>>>>>> GBrowse Details data record page with ":Details" at the top in
>>>>>> large blue font, but no data for that gene is displayed. In
>>>>>> smaller red font is "Requested feature not found in database",
>>>>>> which is followed by the normal details page footer info ("For
>>>>>> the source code for this browser, see...", etc).
>>>>>>
>>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>>> db_args for my database. I changed 'link' to
>>>>>>
>>>>>>     link = AUTO
>>>>>>
>>>>>> from what was there
>>>>>>
>>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>>
>>>>>> The default suggestion for 'link' is a little confusing. Why does
>>>>>> it link to 'gbrowse' and not the 'gbrowse_details' script? Also,
>>>>>> why isn't 'cgi-bin' in the path?
>>>>>
>>>>>
>>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>>> config file predates the gbrowse_details script and no one changed
>>>>> this sample config file. I changed it and it will be changed in
>>>>> the next release.
>>>>>
>>>>> As for the path, 'perl' is a common URL convention for scripts
>>>>> that are running under mod_perl, so I suspect the person who wrote
>>>>> this sample config file was running mod_perl.
>>>>>
>>>>>> When I set link to 'AUTO' I at least get the details page.
>>>>>> gbrowse_details is not getting what it needs to display the
>>>>>> record info though. The webserver error I get is:
>>>>>>
>>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>>> line 126.
>>>>>
>>>>>
>>>>> I'm not sure this is really the problem. Let me make sure: it is
>>>>> after you changed link to AUTO that you see this? That is, the
>>>>> page you see now is as you described in your second paragraph,
>>>>> right? Unfortunately, this is the part where I become particularly
>>>>> useless, since I don't know anything about BioSQL. Is the details
>>>>> page working OK for the yeast_chr1 dataset?
>>>>>
>>>>> Scott
>>>>>
>>>>>> I took a look at line 126 in BioSQL.pm - not sure what to make of
>>>>>> it.
>>>>>>
>>>>>> Any ideas? Am I overlooking anything?
>>>>>>
>>>>>> Thanks,
>>>>>> Genevieve
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjcipriano at lbl.gov  Tue May 30 22:26:24 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Tue, 30 May 2006 15:26:24 -0700
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
Message-ID: <1149027984.3139.105.camel@alien>

Hello,

I have found a problem with adding features via gbrowse_img with the
add=xxx tag. I am using the CVS version of GGB, bioperl-live and BioSQL
schema on mysql.

When using the add=xxx tag, it produces a fatal error (with the
BioSQL/Das interface). The request link and the error shown in the
error log are:

link:
/cgi-bin/gbrowse_img/bacteria/?name=NC_000964:1..2000;width=600;type=CDS;add=NC_000964+myhit+9..999;

ERROR from apache error_log:
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Can't locate
object method "overlaps" via package "Bio::DB::Das::BioSQL::Segment"
at /var/www/cgi-bin/gbrowse_img line 502.
[Tue May 30 12:17:16 2006] [error] [client 131.243.56.104] Premature end
of script headers: gbrowse_img

This is from this section of code at line ~506:

    unless ($segments{$refname}) {
      my @segments = map {
        eval{$_->absolute(0)}; $_  # so that rel2abs works properly later
      }
        grep { $current_segment->overlaps($_) } get_segments($db, $refname);
      return unless @segments;
      $segments{$refname} = $segments[0];
    }


The overlaps function is not defined in Bio::DB::Das::BioSQL::Segment
or in any of the classes it inherits from.

The fix was the inclusion of Bio::RangeI in the @ISA variable (shown
below) in the file Bio/DB/Das/BioSQL/Segment.pm


#@ISA = qw(Bio::Root::Root Bio::SeqI Bio::Das::SegmentI); #OLD BROKEN
@ISA = qw(Bio::Root::Root Bio::RangeI Bio::SeqI Bio::Das::SegmentI);
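
For anyone unfamiliar with how @ISA affects method lookup, the effect of
this change can be sketched without BioPerl. The package names below
(My::RangeI, My::SegmentOld, My::SegmentNew) are made up for
illustration, and the real Bio::RangeI is richer than this stand-in. The
sketch only shows that a method defined in a class listed in @ISA
becomes callable on the object, provided the object supplies the start
and end accessors that method relies on.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a range interface: implements overlaps()
# purely in terms of start() and end(), which the consumer must provide.
package My::RangeI;
sub overlaps {
    my ($self, $other) = @_;
    return !($self->end < $other->start || $self->start > $other->end);
}

# Hypothetical segment class WITHOUT the range interface in @ISA.
package My::SegmentOld;
our @ISA = ();
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

# The same class WITH the range interface added to @ISA.
package My::SegmentNew;
our @ISA = ('My::SegmentOld', 'My::RangeI');

package main;

my $old = My::SegmentOld->new(start => 1, end => 2000);
my $new = My::SegmentNew->new(start => 1, end => 2000);
my $hit = My::SegmentNew->new(start => 9, end => 999);

# can() walks @ISA, so it reports whether the method would resolve.
print "old resolves overlaps: ", ($old->can('overlaps') ? "yes" : "no"), "\n";
print "new resolves overlaps: ", ($new->can('overlaps') ? "yes" : "no"), "\n";
print "new overlaps hit: ",      ($new->overlaps($hit)  ? "yes" : "no"), "\n";
```

Calling $old->overlaps($hit) in this sketch dies with a "Can't locate
object method" error of the same kind as the one in the error log above.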


I am not sure whether this will have any consequences other than fixing
the bug I mentioned (and possibly fixing something else).

Can anyone tell me if this will introduce any new bugs and, if not, can
someone commit this change?

Thanks,
Michael Cipriano
Developer - LBNL


From hlapp at gmx.net  Wed May 31 18:33:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:33:40 -0400
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <1149027984.3139.105.camel@alien>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
	<1149027984.3139.105.camel@alien>
Message-ID: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>

This should be a GBrowse problem, not one in BioSQL or bioperl-db, unless
I'm missing something? Just trying to make sure ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjcipriano at lbl.gov  Wed May 31 18:50:53 2006
From: mjcipriano at lbl.gov (Michael Cipriano)
Date: Wed, 31 May 2006 11:50:53 -0700
Subject: [BioSQL-l] Problem with add feature under BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu>
	<1147896787.2600.47.camel@localhost.localdomain>
	<446CE4A3.6070004@cornell.edu> <200605181758.54965.lstein@cshl.edu>
	<D3D24947-79DA-443C-87C2-CA0AD3895F48@gmx.net>
	<447364B7.4030203@cornell.edu>
	<CC11E88F-5091-4D93-9FDF-9CE1BF916124@gmx.net>
	<1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <1149101453.3139.112.camel@alien>

Hi,

Yes, I only see the problem with gbrowse, though it could come up
anytime someone wants to connect with Das using Bio::DB::Das::BioSQL
and needs the overlaps function (or other range functions) in the
Segment object.

-Michael

On Wed, 2006-05-31 at 14:33 -0400, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless  
> I'm missing something? Just trying to make sure ...
> 
> 	-hilmar


From lstein at cshl.edu  Wed May 31 18:47:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:47:47 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under
	BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311447.49454.lstein@cshl.edu>

I think this is a problem in the Bio::DB::Das::BioSQL::Segment module. I will 
add an overlaps() method.
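
A standalone overlaps() method of the kind mentioned here might look
roughly like the following sketch. This is not the code that went into
CVS; My::Segment and its seq_id/start/end accessors are hypothetical, and
the check for matching reference sequences is an assumption about what a
segment comparison would want.

```perl
use strict;
use warnings;

package My::Segment;    # hypothetical stand-in, not the real module

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub seq_id { return $_[0]{seq_id} }
sub start  { return $_[0]{start} }
sub end    { return $_[0]{end} }

# True (1) when both segments lie on the same reference sequence and
# their closed intervals [start, end] share at least one position.
sub overlaps {
    my ($self, $other) = @_;
    return 0 unless $self->seq_id eq $other->seq_id;
    return ($self->start <= $other->end && $other->start <= $self->end) ? 1 : 0;
}

package main;

# Coordinates borrowed from the request in the report (1..2000, 9..999).
my $view  = My::Segment->new(seq_id => 'NC_000964', start => 1,    end => 2000);
my $hit   = My::Segment->new(seq_id => 'NC_000964', start => 9,    end => 999);
my $far   = My::Segment->new(seq_id => 'NC_000964', start => 2001, end => 3000);
my $other = My::Segment->new(seq_id => 'NC_004578', start => 9,    end => 999);

print "view/hit: ",   $view->overlaps($hit),   "\n";
print "view/far: ",   $view->overlaps($far),   "\n";
print "view/other: ", $view->overlaps($other), "\n";
```

Defining the method directly keeps the change local to the Segment
class, whereas inheriting from a range interface also brings in whatever
other range methods that interface provides.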

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed May 31 18:56:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:56:30 -0400
Subject: [BioSQL-l] [Gmod-gbrowse] Re: Problem with add feature under
	BioSQL
In-Reply-To: <717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
References: <4447A8D5.4@cornell.edu> <1149027984.3139.105.camel@alien>
	<717A539D-0B2E-4EB7-A74D-3FE0B4D1F146@gmx.net>
Message-ID: <200605311456.32088.lstein@cshl.edu>

I've just committed the fix into CVS. This will be available in the upcoming 
gbrowse release as well.

Lincoln

On Wednesday 31 May 2006 14:33, Hilmar Lapp wrote:
> This should be a Gbrowse problem, not in Biosql or bioperl-db unless
> I'm missing something? Just trying to make sure ...
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From d49228002 at ym.edu.tw  Wed May 31 14:39:16 2006
From: d49228002 at ym.edu.tw (Yi-Feng Chang)
Date: Wed, 31 May 2006 22:39:16 +0800
Subject: [BioSQL-l] Error loading ontology terms
Message-ID: <000001c68c97$f461ac70$6801a8c0@iannb>

Dear All,
I've checked the BioSQL archives and found a similar thread (http://lists.open-bio.org/pipermail/biojava-l/2005-November/005151.html); however, it did not give a specific solution. So I am posting here again, and hope someone can help me.
I'm using JDK1.5.0_05, Biojava 1.4, Biosql 1.41, and MySQL 5.0 with MySQL Connector/J 3.1.

I was following the demo source provided by biojava-in-anger, except for the database connection.
The exceptions are listed below.

On the first connection attempt there is a connection error:

*** Importing a core ontology -- hope this is okay
*** Importing terms
Exception in thread "main" org.biojava.bio.BioException: Error connecting to BioSQL database: Connection is closed.
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:276)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
 at genevote.BioSQLTest.loadSeq(BioSQLTest.java:31)
 at genevote.BioSQLTest.main(BioSQLTest.java:70)
Caused by: java.sql.SQLException: Connection is closed.
 at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:219)
 at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.createStatement(PoolingDataSource.java:248)
 at org.biojava.bio.seq.db.biosql.MySQLDBHelper.getInsertID(MySQLDBHelper.java:68)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:268)
 ... 3 more

When I tried again it worked, and I loaded all the sequences in GenBank format into the BioSQL db without error.
But when I tried to extract sequences, an exception was thrown again:

org.biojava.bio.BioException: Error loading ontology terms
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:444)
 at org.biojava.bio.seq.db.biosql.OntologySQL.getOntology(OntologySQL.java:116)
 at org.biojava.bio.seq.db.biosql.OntologySQL.<init>(OntologySQL.java:413)
 at org.biojava.bio.seq.db.biosql.OntologySQL.getOntologySQL(OntologySQL.java:72)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:240)
 at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:194)
 at genevote.test.loadSeq(test.java:25)
 at genevote.test.main(test.java:76)
Caused by: java.sql.SQLException: Unknown column 'name' in 'field list'
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2851)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1534)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1625)
 at com.mysql.jdbc.Connection.execSQL(Connection.java:2297)
 at com.mysql.jdbc.Connection.execSQL(Connection.java:2226)
 at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1812)
 at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1657)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205)
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadTerms(OntologySQL.java:339)
 at org.biojava.bio.seq.db.biosql.OntologySQL.loadOntology(OntologySQL.java:441)
 ... 7 more