[Bioperl-l] Walking multiple bioentries using bioperl-db

Wed Jul 19 13:43:52 UTC 2006

Howdy --

I'm using bioperl-db + biosql-schema + MySQL.

I can now successfully build a biosql-schema instance in MySQL, load 
taxonomy, then use bioperl-db to load a GenBank file from disk, committing 
the sequences I want. For a given accession number + version + namespace, 
I can tell bioperl-db to delete that entry from MySQL and it does. Yay!! 
I'll be throwing a "Using bioperl-db" document onto the wiki over the next 
week.
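
For anyone following along, the lookup-and-delete step described above can 
be sketched roughly as follows. This is a minimal sketch under stated 
assumptions, not a definitive script: the connection parameters, the 
accession number 'AB000001', and the version are hypothetical placeholders, 
and it needs a live BioSQL database to run. It assumes the usual bioperl-db 
pattern of building a template object that carries the unique key 
(accession + version + namespace) and asking the adaptor to find the 
matching persistent object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::DB::BioDB;

# Connection settings are placeholders (assumptions); adjust for your setup.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -host     => 'localhost',
    -dbname   => 'biosql',
    -user     => 'biosql_user',
    -pass     => 'secret',
);

# Template object holding only the unique key of the entry to delete.
# The accession and version here are made-up example values.
my $template = Bio::Seq->new(
    -accession_number => 'AB000001',
    -version          => 1,
    -namespace        => 'Pico',
);

# Ask bioperl-db for the adaptor that handles this kind of object,
# then look the entry up by its unique key.
my $adaptor = $db->get_object_adaptor($template);
my $pseq    = $adaptor->find_by_unique_key($template);

if ($pseq) {
    # Remove the entry from the database and make the change permanent.
    $pseq->remove();
    $pseq->commit();
    print "Deleted AB000001.1 from namespace Pico\n";
}
else {
    print "No matching entry found\n";
}
```

The point of the template object is that no SQL and no table names appear 
anywhere in the script, which is the schema-agnostic behaviour the rest of 
this message asks about.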

What I am currently baffled by:

How do I ask bioperl-db to walk over multiple bioentries in my database so 
I can do things with them? The simplest possible example: print a list of 
all bioentries in my database.

It is trivially easy to just query MySQL directly, but if I'm reading and 
understanding the documentation correctly, bioperl-db intends to be 
schema- and RDBMS-agnostic. In that case, I should use bioperl-db 
to walk my records. So, how do I do that?

Is Bio::DB::Query::BioQuery the way to do this? The only way?

If so then can someone help me understand the datacollections() and 
where() methods?

perldoc Bio::DB::Query::BioQuery

          # all mouse sequences loaded under namespace ensembl that
          # have receptor in their description
          $query->datacollections(["Bio::PrimarySeqI e",
                                 "Bio::Species=>Bio::PrimarySeqI sp",
                                 "BioNamespace=>Bio::PrimarySeqI db"]);
          $query->where(["sp.binomial like 'Mus *'",
                         "e.desc like '*receptor*'",
                         "db.namespace = 'ensembl'"]);

          # all mouse sequences loaded under namespace ensembl that
          # have receptor in their description, and that also have a
          # cross-reference with SWISS as the database
          $query->datacollections(["Bio::PrimarySeqI e",
                                 "Bio::Species=>Bio::PrimarySeqI sp",
                                 "BioNamespace=>Bio::PrimarySeqI db",
                                 "Bio::Annotation::DBLink xref",

I'm bewildered by this API. Please forgive my ignorance.

1) How do I get *all* bioentries out of my database?

2) Say I did want just the namespace 'Pico' (one of my biodatabase.name 
values). Where did

    "BioNamespace=>Bio::PrimarySeqI db"]);

come from? How was I supposed to figure out the left-hand side of that 
mapping? The right-hand side? If that line weren't sitting in that 
document, would there be a way for me to figure it out as a *user* of 
bioperl-db? Or would I need to be a *programmer* of bioperl-db reading 
source to figure this out? Where did

    "db.namespace = 'ensembl'"]);

come from? Again, do I have to read source code to know how to invoke 
that magic?

Sorry if I sound like a jerk. That is not my intention. Hopefully I can 
document the answers for future bioperl-db'ers.

Thanks in advance,

j
my current plaything: http://openlab.jays.net