[BioSQL-l] [Gmod-gbrowse] gbrowse details/record view with biosql

Tue May 23 23:50:21 EDT 2006

Hi Genevieve & Scott, see below for interspersed comments.

On May 23, 2006, at 3:38 PM, Genevieve DeClerck wrote:

> Hi Hilmar,
>
> I apologize in advance if I'm talking about something that is well  
> documented somewhere
> but I'm still having trouble understanding exactly what I need to  
> do to get a biosql database loaded in such a way so that it can  
> interact fully with gbrowse - it seems to be half way there.
>
> I use load_seqdatabase.pl to load the genome sequence (single  
> sequence in fasta format) into a biosql database but I populate the  
> db with features using what I thought was a GFF-centric approach,  
> not with load_seqdatabase.pl -- see my code in #4 below.
>
> Here is exactly what I do:
>
> [...]
> 3) Use load_seqdatabase.pl to load the single genomic dna sequence:
>
> 	load_seqdatabase.pl -dbuser=xx -dbpass=xx -dbname test_biosql2 -namespace NC_004578 -format fasta 6853.fasta
>
> 4) I then use a script I wrote to load the SeqFeatures, which are in
> GFF format in a file I pass in as an arg ($in). Here is the code:
>
>
> use Bio::Seq;
> use Bio::Tools::GFF;
> use Bio::DB::BioDB;
>
> # read gff file into gff io object
> my $gffio = Bio::Tools::GFF->new(-file => $in, -gff_version => 3);
>
> # create a Bio::DB::DBAdaptorI implementing object;
> # -database names the schema flavour (e.g. 'biosql'), -dbname the
> # actual database to connect to
> my $db = Bio::DB::BioDB->new(-database   => $dbname,
>                              -port       => $port,
>                              -dbname     => $database,
>                              -driver     => $driver,
>                              -user       => $user,
>                              -pass       => $pass,
>                              );
>
> # get appropriate object adaptor
> my $adp = $db->get_object_adaptor("Bio::SeqI");
>
> my $acc = "NC_004578"; # the genome seq id already in the db
> my $seq = Bio::Seq->new(-accession_number => $acc,
>                         -display_id       => $acc,
>                         -primary_id       => $acc,
>                         -namespace        => $acc);
>
> # Locate entry matching the unique key attributes and populate a
> # persistent object with this entry (undef if there is no match).
> my $dbseq = $adp->find_by_unique_key($seq)
>     or die "sequence $acc not found in namespace $acc\n";
>
> # insert features from gff file into database.
> while (my $feat = $gffio->next_feature()) {
>   $dbseq->add_SeqFeature($feat);
>   $dbseq->store;
>   $dbseq->commit();
> }
>
>
> Is there additional code I should have here? I realize you're not an
> expert/user of gbrowse.. and this problem seems to be related to
> the gbrowse_details cgi script, which you probably are not familiar  
> with.

So your use case is that you have a sequence in simple fasta format  
with its annotation in another file in GFF3 format, and you want to  
load both into a Biosql database and visualize in GBrowse.

It looks like I was in fact on the wrong path the whole time. The  
Gbrowse Biosql adaptor that I can find is a Bio::DasI adaptor through  
which you cannot load but only retrieve, so I have to assume that you  
were right in using load_seqdatabase.pl. Can somebody help out here  
who has been using Biosql as the underlying database and confirm or  
set me straight?

If that is the procedure then your code looks alright. Also, it looks  
like Bio::Tools::GFF does not return hierarchical feature graphs for  
v3 input (which bioperl-db wouldn't handle properly because it  
doesn't support the feature_relationship table yet).
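
To illustrate what I mean by a hierarchical feature graph, here is a
minimal, self-contained sketch in plain Perl. It does not use
Bio::Tools::GFF or bioperl-db. The sample records, the 'example' source
and all coordinates are made up for illustration, and
parse_attributes() and children_by_parent() are hypothetical helper
names. It only shows how the ID and Parent attributes in column 9 of
GFF3 link features together, which is the information that is lost when
features come back as a flat list.

```perl
use strict;
use warnings;

# Made-up GFF3 records (tab-separated, nine columns), illustration only.
my @gff3_lines = (
    join("\t", 'NC_004578', 'example', 'gene', 1,    1400, '.', '+', '.', 'ID=gene1;Name=dnaA'),
    join("\t", 'NC_004578', 'example', 'CDS',  1,    1400, '.', '+', '0', 'ID=cds1;Parent=gene1'),
    join("\t", 'NC_004578', 'example', 'gene', 1500, 2600, '.', '+', '.', 'ID=gene2;Name=dnaN'),
    join("\t", 'NC_004578', 'example', 'CDS',  1500, 2600, '.', '+', '0', 'ID=cds2;Parent=gene2'),
);

# Parse column 9 ("key=value;key=value") into a hash reference.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }
    return \%attr;
}

# Map each parent ID to the IDs of its direct children.
sub children_by_parent {
    my (@lines) = @_;
    my %children;
    for my $line (@lines) {
        next if $line =~ /^#/;          # skip comments and directives
        my @col = split /\t/, $line;
        next unless @col == 9;          # skip malformed records
        my $attr = parse_attributes($col[8]);
        next unless defined $attr->{ID} && defined $attr->{Parent};
        # A feature may list several parents, separated by commas.
        push @{ $children{$_} }, $attr->{ID} for split /,/, $attr->{Parent};
    }
    return \%children;
}

my $tree = children_by_parent(@gff3_lines);
print "$_ -> @{ $tree->{$_} }\n" for sort keys %$tree;
```

A loader that wanted to preserve this structure would have to carry
these parent/child links along with the features; the sketch only
extracts them and says nothing about how they would be stored.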

So, I'm in fact at a loss explaining why the details page doesn't  
work for you, given that people reported it to work before. I'm  
inclined to claim that the respective Gbrowse code has changed,  
either in the way it expects the feature to be set up, or in the way  
it uses the DasI interface, and broke the Biosql adaptor. Can  
somebody (Scott? Lincoln?) comment on whether there were any changes  
in this regard?

The lines in gbrowse_details that appear to lead to the problem are

my @features = sort {$b->length<=>$a->length} $CONFIG->_feature_get($db,$name,$class);
@features    = sort {$b->length<=>$a->length} $CONFIG->_feature_get($db,$ref,$class,$start,$end,1)
   unless @features;

neither of which returns any matches.

I have no clue yet how those two calls get translated, via the DasI
interface, into bioperl-db queries.
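
To make the control flow of those two statements concrete, here is a
minimal sketch in plain Perl with an in-memory list standing in for the
database. None of this is the actual GBrowse or Bio::DasI code:
feature_get(), find_details() and feature_length() are hypothetical
names, and the feature coordinates are invented. It shows the pattern
(look up by name and class, then fall back to a lookup by region) and,
purely as an illustration, one way both lookups can come back empty,
namely when the class passed in does not match what is stored.

```perl
use strict;
use warnings;

# Made-up stand-in for the database; NOT the GBrowse or Bio::DasI API.
# The coordinates are invented for illustration.
my @FEATURES = (
    { name => 'dnaA', class => 'gene', ref => 'NC_004578', start => 1,    end => 1400 },
    { name => 'dnaN', class => 'gene', ref => 'NC_004578', start => 1500, end => 2600 },
);

sub feature_length {
    my ($f) = @_;
    return $f->{end} - $f->{start} + 1;
}

# Hypothetical lookup: by name and class with three arguments, by
# overlapping region and class when start and end are also given.
sub feature_get {
    my ($features, $key, $class, $start, $end) = @_;
    $key   = '' unless defined $key;
    $class = '' unless defined $class;
    if (defined $start && defined $end) {
        return grep {
            $_->{ref} eq $key
                && $_->{class} eq $class
                && $_->{start} <= $end
                && $_->{end} >= $start
        } @$features;
    }
    return grep { $_->{name} eq $key && $_->{class} eq $class } @$features;
}

# Same shape as the two statements in gbrowse_details: try name/class
# first, fall back to the region only if that found nothing.
sub find_details {
    my ($features, %q) = @_;
    my @hits = sort { feature_length($b) <=> feature_length($a) }
        feature_get($features, $q{name}, $q{class});
    @hits = sort { feature_length($b) <=> feature_length($a) }
        feature_get($features, $q{ref}, $q{class}, $q{start}, $q{end})
        unless @hits;
    return @hits;
}

my @ok   = find_details(\@FEATURES, name => 'dnaA', class => 'gene');
my @none = find_details(\@FEATURES, name => 'dnaA', class => '',
                        ref => 'NC_004578', start => 1, end => 1400);
printf "with class: %d match(es); empty class: %d match(es)\n",
    scalar @ok, scalar @none;
```

Whether an empty or mismatched class is what actually happens with the
Biosql adaptor is an open question; the sketch only shows that the
fallback does not help if both lookups filter on the same class.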

	-hilmar

> But I'm CC'ing the lists in case anyone else has some clues. I do  
> appreciate any insight you might have though. It would be good to  
> know if I'm doing all that I need to do to fully and correctly  
> populate a biosql db with GFF/SeqFeature.
>
> Thanks,
> Genevieve
>
>
>
> Hilmar Lapp wrote:
>> Hi Genevieve,
>> there's a couple more regular users of BioSQL than one (about  
>> 25-30  groups), but not many who run GBrowse off of BioSQL (and I  
>> don't  count among those - yet).
>> Of those who have posted before that they accomplished this, I   
>> believe none were using load_seqdatabase.pl to load the data.   
>> Instead, they loaded data through the DBGFF adaptor for BioSQL,  
>> i.e.,  like you would load data into a GFF database, just using a  
>> different  adaptor.
>> load_seqdatabase.pl will load through the sequence-centric  
>> Bioperl  object model, and has no notion of GFF or GFF3 and  
>> associated  constraints (controlled vocabulary for feature type  
>> and source terms,  location types etc).
>> It is probably possible to load data through load_seqdatabase.pl  
>> and  then render it through GBrowse but doing so will almost  
>> certainly  require a SeqProcessor (see --pipeline argument to   
>> load_seqdatabase.pl) to be written that will appropriately  
>> unflatten  the feature array and in fact probably have to use   
>> SeqFeature::Annotated (where actually did  
>> SeqFeature::TypedSeqFeature  go?). In parallel, bioperl-db will  
>> need to be fixed to be prepared  for SeqFeatureI implementations  
>> that use ontology terms for  primary_tag and source_tag instead of  
>> strings.
>> Is it possible for you to load your data through a GFF3  
>> intermediary?  Bioperl has modules and in fact scripts that will  
>> write GFF3 (if I'm  not mistaken ...).
>>     -hilmar
>> On May 18, 2006, at 5:58 PM, Lincoln Stein wrote:
>>> Hi Genevieve,
>>>
>>> The problem is that none of us really knows anything about BioSQL.
>>> Hilmar is the only regular user of this database. He's now gone to
>>> NESCent (Duke University) and may not be receiving mail sent to GNF.
>>>
>>> Lincoln
>>>
>>> On Thursday 18 May 2006 17:18, Genevieve DeClerck wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> I'm still having the same problem. It might have to do with how the
>>>> BioSQL database is populated. I use the load_seqdatabase.pl  
>>>> script to
>>>> load the database along with bioperl-db functions for loading
>>>> SeqFeatures directly. I took a closer look at how the tables are
>>>> populated in the biosql tables. (If you're not familiar with   
>>>> BioSQL the
>>>> following may not be familiar to you -- i just want to put this
>>>> observation out there...). I noticed that the 'term_id' field in  
>>>> the
>>>> Location table was empty for the first gene record i had  
>>>> loaded.  When I
>>>> set term_id to be '11', the id that corresponds with the 'gene'   
>>>> ontology
>>>> term, i notice a positive change in what's displayed on the
>>>> gbrowse_details page for this record... the name of the gene   
>>>> 'dnaA' now
>>>> appears in the title line in large blue font, as it should. The class
>>>> name is still missing, as is all the detail about this gene
>>>> (coordinates, etc.).
>>>>
>>>> Lincoln suggests that I talk directly to Hilmar Lapp, who is the main BioSQL
>>>> developer. It could be that I am bumping up against things that   
>>>> haven't
>>>> been developed yet as far as the GBrowse<->BioSQL db  
>>>> connectivity  goes.
>>>> I've been taking a closer look at gbrowse_details.pl, Browser.pm  
>>>> and
>>>> Util.pm in order to try to understand where the disconnect  
>>>> might  be...
>>>>
>>>> To answer your question below.. yes GBrowse works fine for the
>>>> yeast_chr1 dataset when it's loaded in the gbrowse 7-table   
>>>> database. I'm
>>>> using this installation of GBrowse 1.64 for several MySQL   
>>>> databases with
>>>> the default gbrowse tables... everything is working fine. My only
>>>> trouble with gbrowse crops up when interfacing with the biosql   
>>>> mysql db.
>>>>
>>>> Thanks for all your help,
>>>> Genevieve
>>>>
>>>> Scott Cain wrote:
>>>>
>>>>> Hi Genevieve,
>>>>>
>>>>> I'm sorry this has hung out there unanswered for so long.  I   
>>>>> suppose it
>>>>> was because I chose not to answer it because it involved  
>>>>> BioSQL  (which I
>>>>> know just about nothing about) and Simon seemed to think that  
>>>>> the  MySQL
>>>>> adaptor was involved somehow (though it doesn't look to me  
>>>>> like  it is).
>>>>>
>>>>> Anyway, I'll try to get started answering your questions (assuming you
>>>>> haven't already puzzled your way to an answer). See my comments below.
>>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, 2006-04-20 at 11:29 -0400, Genevieve DeClerck wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm running gbrowse 1.64 with a biosql database on a mac with   
>>>>>> bioperl
>>>>>> 1.5.1. I successfully loaded the database with   
>>>>>> load_seqdatabase.pl with
>>>>>> NC_004578.gbk from NCBI.
>>>>>>
>>>>>> The features display as they should on the main gbrowse  
>>>>>> details  pane.
>>>>>> However, when I click on one of the features I get GBrowse   
>>>>>> Details data
>>>>>> record page with ":Details" at the top in large blue font but  
>>>>>> no  data
>>>>>> for that gene display. In smaller red font, "Requested  
>>>>>> feature  not found
>>>>>> in database" which is followed by the normal details page  
>>>>>> footer  info
>>>>>> ("For the source code for this browser, see...", etc).
>>>>>>
>>>>>> I'm using the 06.biosql.conf file - with appropriate additions in
>>>>>> db_args for my database. I changed 'link' to
>>>>>>
>>>>>>     link = AUTO
>>>>>>
>>>>>> from what was there
>>>>>>
>>>>>>     link = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
>>>>>>
>>>>>> The default suggestion for 'link' is a little confusing.. why   
>>>>>> does it
>>>>>> link to 'gbrowse' and not 'gbrowse_details' script? Also, why  
>>>>>> isn't
>>>>>> 'cgi-bin' in the path?
>>>>>
>>>>>
>>>>> I'm not sure why the suggested link is the way it is; perhaps that
>>>>> config file predates the gbrowse_details script and no one   
>>>>> changed this
>>>>> sample config file.  I changed it and it will be changed in the  
>>>>> next
>>>>> release.
>>>>>
>>>>> As for the path, 'perl' is a common url convention for scripts   
>>>>> that are
>>>>> running under mod_perl, so I suspect the person who wrote this   
>>>>> sample
>>>>> config file was running mod_perl.
>>>>>
>>>>>> When i set link to 'AUTO' I at least get the details page.
>>>>>> gbrowse_details is not getting what it needs to display the record
>>>>>> info, though. The webserver error I get is:
>>>>>>
>>>>>> Subroutine Bio::SeqFeature::Generic::type redefined at
>>>>>> /Library/Perl/5.8.1//darwin-thread-multi-2level/Bio/DB/Das/BioSQL.pm
>>>>>> line 126.
>>>>>
>>>>>
>>>>> I'm not sure this is really the problem.  Let me make sure: it  
>>>>> is  after
>>>>> you changed link to AUTO that you see this?  That is, the page   
>>>>> you see
>>>>> now is as you described in your second paragraph, right?    
>>>>> Unfortunately,
>>>>> this is the part where I become particularly useless, since I   
>>>>> don't know
>>>>> anything about BioSQL.  Is the details page working OK for the
>>>>> yeast_chr1 dataset?
>>>>>
>>>>> Scott
>>>>>
>>>>>> I took a look at line 126 in BioSQL.pm - not sure what to  
>>>>>> make  of it.
>>>>>>
>>>>>> Any ideas? Am I overlooking anything?
>>>>>>
>>>>>> Thanks,
>>>>>> Genevieve
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gmod-gbrowse mailing list
>>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>
>>>
>>> -- 
>>> Lincoln D. Stein
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724
>>> (516) 367-8380 (voice)
>>> (516) 367-8389 (fax)
>>> FOR URGENT MESSAGES & SCHEDULING,
>>> PLEASE CONTACT MY ASSISTANT,
>>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>>
>>>
>>>
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================