[BioSQL-l] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava
Florian Mittag
florian.mittag at uni-tuebingen.de
Thu Jul 9 15:16:12 UTC 2009
Hi all!
I'm posting this to both the BioSQL and the BioJava-dev mailinglist because
the problem resides in both domains, I hope this is okay.
We're working on getting BioJava to run with a DB2 Express-C backend for
various reasons. We've encountered several problems during this task, but
this one seems to have no real solution.
When adapting the BioSQL schema to DB2, the official IBM conversion guide
tells us to use the data type CLOB where MySQL uses TEXT.
(Chapter 11 in
ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf)
So far, no problem. But when we tried reading some genebank files with
BioJava, the DB2 driver threw an exception:
SQL0401N The data types of the operands for the operation "=" are not
compatible. SQLSTATE=42818 SQLCODE=-401
Explanation:
The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some
Hibernate queries, of which one has the conditions:
"from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?"
All three columns "authors", "location", and "title" are of type TEXT in MySQL
and of type CLOB in DB2, so comparing them with "=" leads to the above error
message.
The way I see it, there are only two possible solutions to this problem:
1) Change the query to
"from DocRef as cr where cr.authors LIKE '?' and cr.location LIKE '?' and
cr.title LIKE '?'"
2) Change the data type to something comparable with "=", like VARCHAR.
Solution 1 is no real solution to me, because comparing values with "LIKE"
usually is slow and it seems a bit odd to change a query that works with
other databases just for DB2.
But taking a closer look, solution 2 has some problems, too:
Although VARCHARs in DB2 can have a length of theoretically 32767, in reality
they are limited by the page size of the database, which can be 32K at
maximum. Since this particular table "reference" has three columns of this
type, the sum of their lengths must not exceed 32767, so they could only be
something like VARCHAR(10000).
I have never encountered cases in which values come even close to the length
of 10000, but you can never be sure.
And that is why I post here. For me, the way to go is pretty clear, but we
intend to be as compatible as possible with the original BioSQL. Maybe you
could give me some input on how to solve this problem with as few casualties
as possible ;-)
Thanks,
Florian
More information about the BioSQL-l
mailing list