[Bioperl-l] Database Retrieval

Tue Aug 8 11:41:57 UTC 2006

On 8/8/06 5:21 AM, "Sendu Bala" <bix at sendu.me.uk> wrote:

> Sean Davis wrote:
>> 
>> On 8/7/06 1:53 PM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>> 
>>> Do you want to go ahead and look into making those classes for
>>> accessing the common tables? It's in my plan to make various
>>> aspects of genomic data retrieval a strength of bioperl as opposed
>>> to a surprising missing link
>>> (http://www.bioperl.org/wiki/Getting_Genomic_Sequences); I'll get
>>> to that in a few weeks but if you lay the ground work or better yet
>>> complete everything before then that would be great! :)
>> 
>> So, there is a sketch of what things would look like here:
>> 
>> http://watson.nci.nih.gov/~sdavis/Bio-DB-UCSC.tar.gz
> 
> Thanks for that.
> 
> 
>> only includes the refLink and refFlat tables so far, but adding other
>> tables is pretty straightforward, as you can see from the code.  I
>> would love to hear comments.  Basically, to use, you can do something
>> like that shown in the synopsis and output is given below:
>> 
>> NAME
>>     Bio::DB::UCSC - Access UCSC MySQL tables nicely
>> 
>> SYNOPSIS
>>     use Bio::DB::UCSC::RefLink::Manager;
>> 
>>     my $reflinks = Bio::DB::UCSC::RefLink::Manager->get_reflinks(
>>         query => [ mrnaAcc => { like => 'NM_00002%' }, ],
>>     );
> 
> I appreciate that this is due to the way Rose::DB works, but is it
> possible to hide the SQL nature of what we're doing? Is it possible to
> hide even the table names?
> 
> Ideally the interface API would survive a complete change in UCSC's
> table structures. The implementation would have to change, but user code
> would not.
> 
> Are you willing to take this on from your outline and develop a set of
> more bioperlish modules? Even if you don't have time your contribution
> so far is certainly valuable, so thank you.
> 
> I envisage that Bio::DB::UCSC.pm would be the easy-to-use starting
> point, presenting a code interface similar to the UCSC table browsing
> web interface. And while it would implement using various submodules,
> even UCSC.pm would be protected from SQL and table changes.

That is certainly possible--this is Perl, right?  I'll think about it, but I
doubt that I have the time to put together a satisfactory "grand" solution
that allows arbitrary queries without specifying SQL, returns bioperl
objects, and doesn't reflect some of the underlying schema.  If one settles
on a set of objects that one wants to return, the process will be easier,
but that limits the information that one can get from the database.

Practically, a "table-browser-like" code interface will require exposing
some of the SQL schema, since column names and table names will need to
come into it.  Taking such an approach, whether based on RDBO
(Rose::DB::Object) or on hand-coded SQL management, precludes returning
bioperl-type objects.  On the other hand, if one wants only bioperl-type
objects returned, the information that can be returned is quite limited,
and the query structure (from a Perl point of view) will need to be
restricted to the set of fields that can be used to look up the
information associated with bioperl objects.  I think the
table-browser-like approach is the better place to start; let the user
deal with making bioperl objects as he/she sees fit once the data is
back.  As a second round of development, one could certainly build a
compatibility layer that uses the primary query engine to pull out
information for constructing key bioperl objects, but I think that should
be a secondary goal rather than the primary one.
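
To make the two rounds concrete, here is a rough sketch of how they could
fit together.  Everything in it is an assumption for illustration only:
query_table() and row_to_feature_args() are hypothetical names, not part of
the Bio-DB-UCSC tarball, and the rows are made-up in-memory data standing in
for a MySQL fetch from refFlat.  The first sub is the table-browser-like
engine (table and column names are exposed, plain hashes come back); the
second is the optional compatibility layer that turns a row into named
arguments of the kind one might hand to a constructor such as
Bio::SeqFeature::Generic->new.

```perl
use strict;
use warnings;

# Made-up rows standing in for the UCSC refFlat table; a real engine
# would fetch these from MySQL (via RDBO or hand-written SQL) instead.
my @REFFLAT = (
    { geneName => 'GENE_A', name => 'NM_000001', chrom => 'chr1',
      strand => '+', txStart => 100,  txEnd => 900 },
    { geneName => 'GENE_B', name => 'NM_000002', chrom => 'chr2',
      strand => '-', txStart => 5000, txEnd => 7500 },
    { geneName => 'GENE_C', name => 'NR_000003', chrom => 'chr1',
      strand => '+', txStart => 2000, txEnd => 2600 },
);

# Round one (hypothetical): a table-browser-like query.  The caller names
# the table and columns, so the schema shows through, and gets back an
# array ref of plain hashes.  A condition is either an exact string or a
# compiled regex.
sub query_table {
    my (%args) = @_;
    my %tables = ( refFlat => \@REFFLAT );
    my $rows = $tables{ $args{table} }
        or die "unknown table '$args{table}'\n";
    my %where = %{ $args{where} || {} };

    my @hits;
    ROW: for my $row (@$rows) {
        for my $col (keys %where) {
            die "unknown column '$col'\n" unless exists $row->{$col};
            my $want = $where{$col};
            if (ref $want eq 'Regexp') {
                next ROW unless $row->{$col} =~ $want;
            }
            else {
                next ROW unless $row->{$col} eq $want;
            }
        }
        push @hits, { %$row };    # copy, so callers cannot alter the table
    }
    return \@hits;
}

# Round two (hypothetical): the compatibility layer.  UCSC stores starts
# 0-based, so add 1 to get the 1-based inclusive start bioperl expects.
sub row_to_feature_args {
    my ($row) = @_;
    return (
        -seq_id       => $row->{chrom},
        -start        => $row->{txStart} + 1,
        -end          => $row->{txEnd},
        -strand       => ($row->{strand} eq '-' ? -1 : 1),
        -primary_tag  => 'transcript',
        -display_name => $row->{name},
    );
}

# Usage: RefSeq mRNAs (NM_ accessions) on chr1, printed as feature fields.
my $rows = query_table(
    table => 'refFlat',
    where => { chrom => 'chr1', name => qr/^NM_/ },
);
for my $row (@$rows) {
    my %feat = row_to_feature_args($row);
    print join("\t", @feat{qw(-seq_id -start -end -strand -display_name)}), "\n";
}
```

The point of the split is that the second sub only sees row hashes, so it
could be added later without touching the query engine, and users who want
columns that have no bioperl equivalent can stop after the first call.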

All that said, I think more discussion, with some judicious code examples
(even if WAY off track, as mine probably is), is needed before settling on
a path forward.

Sean