[Biopython] SQL Alchemy based BioSQL

Peter biopython at maubp.freeserve.co.uk
Thu Aug 20 20:10:05 UTC 2009


Hi Kyle,

Thanks for signing up to the mailing list to talk about this work.

On Thu, Aug 20, 2009 at 7:26 PM, Kyle Ellrott<kellrott at gmail.com> wrote:
> I've posted a git fork of biopython with a BioSQL system based on SQL
> Alchemy.  It can be found at git://github.com/kellrott/biopython.git
> It successfully completes unit tests copied from test_BioSQL and
> test_BioSQL_SeqIO.
> The unit testing runs on sqlite.  But it should abstract out to any
> database system that SQLAlchemy supports.  From the web site, the list
> includes: SQLite, Postgres, MySQL, Oracle, MS-SQL, Firebird, MaxDB, MS
> Access, Sybase, Informix, and IBM DB2.

Sounds interesting - but can you explain your motivation?

Brad Chapman had already suggested something with BioSQL
and SQLAlchemy, but I can't find the emails right now. Maybe
we talked about it in person at BOSC 2009... I forget. Brad?

But what I think I said then was that while I like SQLAlchemy,
and have used it with BioSQL as part of a web application, I
don't see that we need it for Biopython's BioSQL support. We
essentially have a niche ORM for going between the BioSQL
tables and the Biopython SeqRecord object.

I don't see more back end databases alone as a good reason
for using SQLAlchemy in Biopython's BioSQL bindings. In
most (all?) cases SQLAlchemy in turn calls something like
MySQLdb to do the real work.

You mention lots of other back ends supported by SQLAlchemy,
but very few of them have BioSQL schemas - currently these
exist only for PostgreSQL, MySQL, Oracle, HSQLDB,
and Apache Derby. As you know (because it is in your branch,
grin), Brad has done a schema for SQLite and got this working
with Biopython already, and we already support MySQL and
PostgreSQL.

That just leaves Biopython lacking support for the existing
Oracle, HSQLDB, and Apache Derby BioSQL schemas.
As long as these have a Python binding following the Python
Database API Specification v2.0, adding them shouldn't be hard.
For example, extending Biopython's BioSQL support using
cx_Oracle to talk to an Oracle database seems like a useful
incremental improvement.
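
As a rough sketch of what relying on the DB-API alone looks like,
the snippet below runs the same lookup through any DB-API 2.0 driver
module by checking its paramstyle attribute. The helper names
(placeholder, fetch_biodatabase_id) are made up for illustration and
are not part of Biopython, and the table is a cut-down stand-in for
the real BioSQL biodatabase table. sqlite3 is used only because it
ships with Python; a driver reporting another paramstyle (for
example "named") would need extra handling.

```python
import sqlite3

# Positional placeholder text for some DB-API 2.0 paramstyles.
# Hypothetical helper table for this sketch; other styles are not handled.
_PLACEHOLDERS = {"qmark": "?", "format": "%s", "numeric": ":1"}


def placeholder(driver):
    """Return the positional placeholder used by a DB-API 2.0 driver module."""
    return _PLACEHOLDERS[driver.paramstyle]


def fetch_biodatabase_id(driver, conn, name):
    """Look up a biodatabase_id by name using only DB-API 2.0 calls."""
    sql = ("SELECT biodatabase_id FROM biodatabase WHERE name = "
           + placeholder(driver))
    cursor = conn.cursor()
    cursor.execute(sql, (name,))
    row = cursor.fetchone()
    cursor.close()
    return None if row is None else row[0]


# Demo against an in-memory SQLite database (simplified table, not the
# full BioSQL schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biodatabase ("
    "biodatabase_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
conn.execute("INSERT INTO biodatabase (name) VALUES ('embl')")
conn.commit()

found_id = fetch_biodatabase_id(sqlite3, conn, "embl")
missing_id = fetch_biodatabase_id(sqlite3, conn, "no-such-db")
```

The point of passing the driver module in is that only the
placeholder syntax and the connection call differ between back
ends; the cursor/execute/fetchone calls are the same everywhere.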

[That wasn't meant to come across as negative, I'm just
wary of adding a heavyweight dependency without a good
reason]

Something I would be interested in is a set of SQLAlchemy
model definitions for the BioSQL tables (ideally database
neutral). I've got a very preliminary, partial and minimal
set done - and I think Brad has some too. This would be
useful for anyone wanting to go beyond the Biopython
SeqRecord based BioSQL support.

Peter



