[BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql

Genevieve DeClerck gad14 at cornell.edu
Tue May 23 15:38:31 EDT 2006


Hi Hilmar,

I apologize in advance if this is well documented somewhere, but I'm
still having trouble understanding exactly what I need to do to load a
BioSQL database so that it can interact fully with GBrowse. It seems to
be halfway there.

I use load_seqdatabase.pl to load the genome sequence (single sequence 
in fasta format) into a biosql database but I populate the db with 
features using what I thought was a GFF-centric approach, not with 
load_seqdatabase.pl -- see my code in #4 below.

Here is exactly what I do:

1) Create a mysql database called 'test_biosql' with correct permissions
2) Load the biosql schema:

	mysql --user=xx --password=xx test_biosql \
	    < /usr/local/biosql-schema/sql/biosqldb-mysql.sql

3) Use load_seqdatabase.pl to load the single genomic dna sequence:

	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 \
	    -namespace NC_004578 -format fasta 6853.fasta

4) I then use a script I wrote to load the SeqFeatures, which are in GFF
format in a file I pass in as an argument ($in). Here is the code:


use Bio::Seq;
use Bio::Tools::GFF;
use Bio::DB::BioDB;

# $in, $dbname, $port, $database, $driver, $user and $pass are set
# earlier in the script (e.g. from the command line).

# read gff file into gff io object
my $gffio = Bio::Tools::GFF->new(-file => $in, -gff_version => 3);

# create a Bio::DB::DBAdaptorI implementing object
# (-database selects the schema type, e.g. 'biosql'; -dbname is the
# name of the actual database)
my $db = Bio::DB::BioDB->new(-database => $dbname,
                             -port     => $port,
                             -dbname   => $database,
                             -driver   => $driver,
                             -user     => $user,
                             -pass     => $pass,
                             );

# get appropriate object adaptor
my $adp = $db->get_object_adaptor("Bio::SeqI");

my $acc = "NC_004578"; # the genome seq id already in the db
my $seq = Bio::Seq->new(-accession_number => $acc,
                        -display_id       => $acc,
                        -primary_id       => $acc,
                        -namespace        => $acc);

# Locate entry matching the unique key attributes and populate a
# persistent object with this entry.
my $dbseq = $adp->find_by_unique_key($seq);

# insert features from gff file into database, storing and committing
# after each feature is attached.
while (my $feat = $gffio->next_feature()) {
    $dbseq->add_SeqFeature($feat);
    $dbseq->store;
    $dbseq->commit();
}


Is there additional code I should have here? I realize you're not an
expert/user of GBrowse, and this problem seems to be related to the
gbrowse_details cgi script, which you probably are not familiar with.
But I'm CC'ing the lists in case anyone else has some clues. I do
appreciate any insight you might have though. It would be good to know
if I'm doing all that I need to do to fully and correctly populate a
biosql db with GFF/SeqFeature.
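
One quick sanity check on the GFF side, before anything goes into the
database, is to tally which source/type pairs the file actually
contains, since Bio::Tools::GFF hands those two columns to each feature
as its source and primary tags. The following is only a minimal sketch
in plain Perl, assuming a tab-delimited 9-column GFF3 file;
tally_gff3_types is a hypothetical helper, not part of BioPerl or
bioperl-db:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count the "source/type"
# pairs found in columns 2 and 3 of a GFF3 stream.
sub tally_gff3_types {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;                  # embedded sequence follows
        next if $line =~ /^#/ || $line !~ /\S/;       # directives, comments, blanks
        my @col = split /\t/, $line, -1;
        next unless @col == 9;                        # not a well-formed feature line
        $count{"$col[1]/$col[2]"}++;
    }
    return \%count;
}

# When a file name is given on the command line, print the tally.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    my $count = tally_gff3_types($fh);
    close $fh;
    printf "%-30s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

Run as "perl tally_gff3.pl features.gff" (both file names are
placeholders); the resulting list can then be compared by eye against
the names in the BioSQL term table.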

Thanks,
Genevieve



Hilmar Lapp wrote:
> Hi Genevieve,
> 
> there's a couple more regular users of BioSQL than one (about 25-30
> groups), but not many who run GBrowse off of BioSQL (and I don't count
> among those - yet).
>
> Of those who have posted before that they accomplished this, I believe
> none were using load_seqdatabase.pl to load the data. Instead, they
> loaded data through the DBGFF adaptor for BioSQL, i.e., like you would
> load data into a GFF database, just using a different adaptor.
>
> load_seqdatabase.pl will load through the sequence-centric Bioperl
> object model, and has no notion of GFF or GFF3 and associated
> constraints (controlled vocabulary for feature type and source terms,
> location types etc).
>
> It is probably possible to load data through load_seqdatabase.pl and
> then render it through GBrowse but doing so will almost certainly
> require a SeqProcessor (see --pipeline argument to load_seqdatabase.pl)
> to be written that will appropriately unflatten the feature array and
> in fact probably have to use SeqFeature::Annotated (where actually did
> SeqFeature::TypedSeqFeature go?). In parallel, bioperl-db will need to
> be fixed to be prepared for SeqFeatureI implementations that use
> ontology terms for primary_tag and source_tag instead of strings.
>
> Is it possible for you to load your data through a GFF3 intermediary?
> Bioperl has modules and in fact scripts that will write GFF3 (if I'm
> not mistaken ...).
> 
>     -hilmar
> 
> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
> 
>> Hi Genevieve,
>>
>> The problem is that none of us really knows anything about BioSQL.
>> Hilmar is the only regular user of this database. He's now gone to
>> NESCent (Duke University) and may not be receiving mail sent to GNF.
>>
>> Lincoln
>>
>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>
>>> Hi Scott,
>>>
>>> I'm still having the same problem. It might have to do with how the
>>> BioSQL database is populated. I use the load_seqdatabase.pl script to
>>> load the database along with bioperl-db functions for loading
>>> SeqFeatures directly. I took a closer look at how the BioSQL tables
>>> are populated. (If you're not familiar with BioSQL the following may
>>> not mean much to you -- I just want to put this observation out
>>> there...). I noticed that the 'term_id' field in the Location table
>>> was empty for the first gene record I had loaded. When I set term_id
>>> to '11', the id that corresponds to the 'gene' ontology term, I
>>> noticed a positive change in what's displayed on the gbrowse_details
>>> page for this record: the name of the gene, 'dnaA', now appears in
>>> the title line in large blue font, as it should. The class name is
>>> still missing, as is all the detail about this gene (coordinates,
>>> etc.).
>>>
>>> Lincoln suggests that I talk directly to Hilmar Lapp, who is the main
>>> BioSQL developer. It could be that I am bumping up against things
>>> that haven't been developed yet as far as the GBrowse<->BioSQL db
>>> connectivity goes. I've been taking a closer look at
>>> gbrowse_details.pl, Browser.pm and Util.pm in order to try to
>>> understand where the disconnect might be...
>>>
>>> To answer your question below.. yes GBrowse works fine for the
>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table database.
>>> I'm using this installation of GBrowse 1.64 for several MySQL
>>> databases with the default gbrowse tables... everything is working
>>> fine. My only trouble with gbrowse crops up when interfacing with the
>>> biosql mysql db.
>>>
>>> Thanks for all your help,
>>> Genevieve
>>>
>>> Scott Cain wrote:
>>>
>>>> Hi Genevieve,
>>>>
>>>> I'm sorry this has hung out there unanswered for so long. I suppose
>>>> it was because I chose not to answer it because it involved BioSQL
>>>> (which I know just about nothing about) and Simon seemed to think
>>>> that the MySQL adaptor was involved somehow (though it doesn't look
>>>> to me like it is).
>>>>
>>>> Anyway, I'll try to get started answering your questions (assuming
>>>> you haven't already puzzled your way to an answer). See my comments
>>>> below.
>>>>
>>>> Scott
>>>>
>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>
>>>>> Hi,
>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with
>>>>> bioperl 1.5.1. I successfully loaded the database with
>>>>> load_seqdatabase.pl with NC_004578.gbk from NCBI.
>>>>>
>>>>> The features display as they should on the main gbrowse details pane.
>>>>> However, when I click on one of the features I get the GBrowse
>>>>> Details record page with ":Details" at the top in large blue font
>>>>> but no data displayed for that gene. In smaller red font is
>>>>> "Requested feature not found in database", which is followed by the
>>>>> normal details page footer info ("For the source code for this
>>>>> browser, see...", etc).
>>>>>
>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>> db_args for my database. I changed 'link' to
>>>>>
>>>>>     link = AUTO
>>>>>
>>>>> from what was there
>>>>>
>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>
>>>>> The default suggestion for 'link' is a little confusing.. why does it
>>>>> link to 'gbrowse' and not the 'gbrowse_details' script? Also, why
>>>>> isn't 'cgi-bin' in the path?
>>>>
>>>>
>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>> config file predates the gbrowse_details script and no one changed
>>>> this sample config file. I changed it and it will be changed in the
>>>> next release.
>>>>
>>>> As for the path, 'perl' is a common url convention for scripts that
>>>> are running under mod_perl, so I suspect the person who wrote this
>>>> sample config file was running mod_perl.
>>>>
>>>>> When I set link to 'AUTO' I at least get the details page.
>>>>> gbrowse_details is not getting what it needs to display the record
>>>>> info though. The webserver error I get is:
>>>>>
>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>> line 126.
>>>>
>>>>
>>>> I'm not sure this is really the problem. Let me make sure: it is
>>>> after you changed link to AUTO that you see this? That is, the page
>>>> you see now is as you described in your second paragraph, right?
>>>> Unfortunately, this is the part where I become particularly useless,
>>>> since I don't know anything about BioSQL. Is the details page
>>>> working OK for the yeast_chr1 dataset?
>>>>
>>>> Scott
>>>>
>>>>> I took a look at line 126 in BioSQL.pm - not sure what to make of it.
>>>>>
>>>>> Any ideas? Am I overlooking anything?
>>>>>
>>>>> Thanks,
>>>>> Genevieve
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>>
>> -- 
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
>>
>>
> 


