[Bioperl-l] DB2 driver for BioPerl

Sat Jul 4 06:39:37 EDT 2009

Hi Florian:

On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote:

> Hi!
>
> I previously posted a message on the BioSQL mailinglist regarding a  
> BioSQL
> schema for DB2 and we are several steps closer to completion now.

Good to hear!

> We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL  
> to fill
> our DB2 database with taxonomy data

Would you mind posting to the BioSQL list which changes you had to  
make to make the script work with DB2?

More generally, is there some kind of comprehensive documentation on  
what is different in DB2 from standard SQL92? The  
load_ncbi_taxonomy.pl script should in principle work with any SQL92- 
compliant RDBMS ... Have you found that not to be the case (which  
would be a bug), or is DB2 in some ways not SQL92-compliant?

> , but loading the gene ontology with BioPerl's "load_ontology.pl" is  
> somewhat harder.

The ontology as well as the sequence loader are really just front-ends  
to the Bioperl-db object-relational mappers (ORMs). So I would start  
there, rather than looking at errors the script does or does not throw  
(you don't want to run all combinations of command line parameters  
that would exercise each and every feature of the script).

In order to create DB2 driver support in Bioperl-db, you need to add  
two things. First, you need to create a module Bio/DB/DBI/DB2.pm that  
overrides the methods from base.pm according to DB2. The fact that you  
didn't report any errors about that module not having been found  
suggests that you've done this already.

The second step is as you say to create a package Bio/DB/BioSQL/DB2  
with at least BasePersistenceAdaptorDriver.pm	as module in it, and  
starting with a copy of the existing ones is indeed the best way to  
get started on this. Unless you also created the DB2 database DDL  
scripts from the Oracle ones, I wouldn't necessarily copy from Oracle  
though, but maybe rather from Pg. And rather than looking for errors  
of one of the scripts, I'd just go systematically through the files  
and make sure the SQL in there is DB2 compliant.

> [...]
> It first ran a few minutes processing the file and then died after the
> following SQL-command was prepared and executed:
>
> "SELECT term.term_id, term.identifier, term.name, term.definition,
> term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier  
> = ?"

Could you post the full error message? It is rather difficult to  
diagnose what's going on w/o the error message and stack trace.

I'd be surprised BTW if DB2 were indeed offended by the NULL in the  
above statement - I'm pretty sure that "SELECT NULL FROM  
sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you  
sure that if you execute such a statement at a SQL prompt it results  
in an error?

Since I can hardly believe that DB2 doesn't support selecting  
constants (NULL is as much a constant as 1 is), maybe what it wants  
though is aliasing the column. So if

SELECT NULL FROM bioentry;

yields an error, does

SELECT NULL AS colAlias FROM bioentry;

work fine?

> I don't know if the "NULL" column is supposed to be there

It is. The code in BaseDriver.pm that you were looking at should not  
need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is  
supposed to override any method that needs to be adapted to DB2.) The  
way the ORM works is by trying to map all properties of a BioPerl  
object that are persistent to a column of a table in the database. If  
it can't map a property (for whatever reason) its value is simply  
always undef (or NULL in SQL). I.e., NULL columns are the placeholder  
for a column that failed to be mapped to a property. You can't simply  
remove them or all subsequent columns are shifted.

Hth,
	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================