[Bioperl-l] Database Retrieval

Chris Fields cjfields at uiuc.edu
Tue Aug 8 13:26:46 UTC 2006


It's important to first build something capable of returning
everything UCSC has to offer, at the start as just raw data in XML or
text.  Open the floodgates, so to speak.

That was why I designed EUtilities.  It returns literally anything
from Entrez that is accessible by parameters, in the format (text or
XML) it will likely be used in; it's not limited to sequences, PubMed,
etc.  And I can access all the EUtilities (elink, efetch, epost, and
so on).

Why?  (Evil laugh.....) Because access to the data, IMHO, is more
important to get set up first, even if it only returns raw data.
Then the critical infrastructure is there in the DB class to get  
anything you want from the database.  You can then use your DB class  
as a DB handle or web agent inside another class which has a  
consistent API, like that for RandomAccessI (sequence-specific DB  
access), to get the data into the appropriate objects.   
Bio::Taxonomy::Node attempted a similar thing, correct?

Hilmar wanted to know the following, which indicates going beyond  
just sequences:

> would it be possible to return standard bioperl objects, like
> Bio::SeqI objects, or Bio::Annotation::Reference, Bio::LocationI, etc?

'Front-end' classes that return appropriate objects (SeqI, LocationI,  
etc) could be built around the DB class; the key is the consistent  
interface.  So we would need a RandomAccessI-like interface for  
LocationI, Annotation::References, etc.  If someone really wants  
references, they could build a class to get them into the appropriate  
objects using your DB class as the 'backend' to get the raw data.
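
A rough sketch of that split, in plain Perl with no BioPerl
dependency.  Everything in it is made up for illustration:
My::RawDB, My::SeqFrontEnd, get_raw and the canned record are
hypothetical names and data, not existing modules.  Only the method
name get_Seq_by_id is borrowed from RandomAccessI, and a plain hash
stands in for a real Bio::SeqI object.

```perl
use strict;
use warnings;

# Hypothetical backend: fetches raw records and does nothing else.
# A canned in-memory table stands in for the remote database.
package My::RawDB;

my %RECORDS = (
    # Fake FASTA-style record, for illustration only.
    'NM_000020' => ">NM_000020 example record\nACGTACGT\nTTGA\n",
);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Return the raw text for an ID, or undef if there is no such record.
sub get_raw {
    my ($self, $id) = @_;
    return $RECORDS{$id};
}

# Hypothetical front end: wraps any backend that offers get_raw() and
# exposes a RandomAccessI-style method that returns parsed data.
package My::SeqFrontEnd;

sub new {
    my ($class, %args) = @_;
    return bless { db => $args{-db} }, $class;
}

# Parse the raw record into a hash (a stand-in for a sequence object).
sub get_Seq_by_id {
    my ($self, $id) = @_;
    my $raw = $self->{db}->get_raw($id);
    return unless defined $raw;

    my ($header, @lines) = split /\n/, $raw;
    (my $desc = $header) =~ s/^>\S+\s*//;    # drop ">ID " from the header
    return { id => $id, desc => $desc, seq => join('', @lines) };
}

package main;

my $db    = My::RawDB->new;
my $front = My::SeqFrontEnd->new(-db => $db);
my $seq   = $front->get_Seq_by_id('NM_000020');
print "$seq->{id}: $seq->{seq}\n";
```

The front end never needs to know how the raw data was fetched, so
a second front end for references or locations could reuse the same
backend handle unchanged.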

Chris

On Aug 8, 2006, at 7:44 AM, Sendu Bala wrote:

> Sean Davis wrote:
>> That is certainly possible--this is perl, right?  I'll think about it,
>> but I doubt that I have the time to put together a satisfactory "grand"
>> solution that allows arbitrary queries without specifying SQL, returns
>> bioperl objects, and doesn't reflect some of the underlying schema.  If
>> one settles on a set of objects that one wants to return, the process
>> will be easier, but that limits the information that one can get from
>> the database.
>>
>> Practically, to have a "table-browser-like" code interface will require
>> exposing some of the SQL schema, as column names and table names will
>> need to come into it.
>
> Not necessarily. You only have to have a mapping from the conceptual
> purpose of the table to its current name (and likewise for columns). So
> instead of a module called 'refLink' because there is a table called
> 'refLink', you might have something called refseq_mrna_links which maps
> to 'refLink'. Oh, and given the sheer number of tables, I don't think it
> would be appropriate to have a module per table.
>
> How about some single module that does the selection of the relevant
> database and table given $db and $table_concept? Perhaps:
>
> # map 'human' to the possible human databases, default 'hgXX'
> my $db = Bio::DB::UCSC::Databases('human');
>
> # map 'refseq_mrna_links' to 'refLink' and return a
> # Bio::DB::UCSC::Queryable
> my $queryable = new Bio::DB::UCSC::Table($db, 'refseq_mrna_links');
>
> # map mrna_accession method and its args to
> # query => [mrnaAcc => {like => 'NM_00002%'}]
> my $row_data = $queryable->mrna_accession(-like => 'NM_00002%');
>
>
> Even that's not so hot; you still have to know some massive list of
> inflexible table-concept names like 'refseq_mrna_links'. Perhaps it
> would be even better if it was truly concept based. You say what you
> want and it figures out the correct table:
>
> my $queryable = new Bio::DB::UCSC::Table($db, 'mrna_accession',
> 'genomic_coordinates');
>
>
> Sane? Reasonable? Desirable? Possible? I'm just throwing ideas out; you
> may see a better way of achieving similar ends.
>
>
>> Taking such an approach, either based on RDBO or with hand-coded SQL
>> management, precludes returning bioperl-type objects.  On the other
>> hand, if one wants only bioperl-type objects returned, the information
>> that can be returned is quite limited and the query structure (from a
>> perl point of view) will need to be limited to a set of fields that can
>> ultimately be used to look up only the information associated with
>> bioperl objects.  I think the table-browser-like approach is the better
>> way to go to start; let the user deal with making bioperl objects as
>> he/she sees fit once the data is back.  As a second round of
>> development, one could certainly build a compatibility layer that uses
>> the primary query engine to pull out information for constructing key
>> bioperl objects, but I don't think that should be the primary goal, but
>> a secondary one.
>
> Yes, that's the way it should be done, but the interface for the primary
> query engine ought still be independent of the table structure.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





