[Biopython] SQL Alchemy based BioSQL

Brad Chapman chapmanb at 50mail.com
Fri Aug 21 08:46:14 EDT 2009


Hi all;

Kyle:
> > I've posted a git fork of biopython with a BioSQL system based on SQL
> > Alchemy.  It can be found at git://github.com/kellrott/biopython.git
> > It successfully completes unit tests copied from test_BioSQL and
> > test_BioSQL_SeqIO.

Awesome.

Peter:
> Brad Chapman had already suggested something with BioSQL
> and SQLAlchemy, but I can't find the emails right now. Maybe
> we talked about it in person at BOSC 2009... I forget. Brad?

Yup, I was floating this idea around. It's great to see someone
tackling it.

> But what I think I said then was that while I like SQLAlchemy,
> and have used it with BioSQL as part of a web application, I
> don't see that we need it for Biopython's BioSQL support. We
> essentially have a niche ORM for going between the BioSQL
> tables and the Biopython SeqRecord object.
> 
> I don't see more back end databases alone as a good reason
> for using SQLAlchemy in Biopython's BioSQL bindings. In
> most (all?) cases SQLAlchemy in turn calls something like
> MySQLdb to do the real work.

SQLAlchemy is a pervasive and growing part of interacting
with databases using Python. It encapsulates all of the nastiness of
dealing with individual databases and has a large community
resolving problems on more niche setups like Jython+MySQL. It also
offers a nice object layer which is an alternative to the BioSeq
interface we have built.
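
As a rough illustration of the encapsulation point, here is a minimal sketch
of backend-neutral code. It assumes SQLAlchemy 1.4 or later; the function
name load_and_list is made up for this example, and only two columns of the
BioSQL biodatabase table are included. Only the URL passed to create_engine
names the backend; the table definition and queries stay the same.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

metadata = MetaData()

# Reduced version of the BioSQL "biodatabase" table (illustrative subset).
biodatabase = Table(
    "biodatabase",
    metadata,
    Column("biodatabase_id", Integer, primary_key=True),
    Column("name", String(128), nullable=False),
)


def load_and_list(db_url, names):
    """Create the table on whichever backend db_url names, insert rows, list them."""
    engine = create_engine(db_url)
    # SQLAlchemy emits the CREATE TABLE dialect appropriate to the backend.
    metadata.create_all(engine)
    with engine.begin() as conn:
        conn.execute(biodatabase.insert(), [{"name": n} for n in names])
        rows = conn.execute(select(biodatabase.c.name).order_by(biodatabase.c.name))
        return [row[0] for row in rows]


if __name__ == "__main__":
    # In-memory SQLite needs no server; another backend would only change the URL.
    print(load_and_list("sqlite:///:memory:", ["local_db", "embl"]))
```

Pointing the same function at a MySQL or PostgreSQL URL would still need the
matching driver installed, as noted below.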

It's a lightweight install -- all Python and no external
dependencies beyond the interfaces you would already need to have
to access your database of choice.

Why do we want to be learning and implementing database-specific
things when there is code already taking care of these problems?
Kyle implemented this so it can live beside the existing code base,
which I think is a nice move. I'm +1 on including this and moving in
the direction of SQLAlchemy.

> Something I would be interested in is a set of SQLAlchemy
> model definitions for the BioSQL tables (ideally database
> neutral). I've got a very preliminary, partial and minimal
> set done - and I think Brad has some too. This would be
> useful for anyone wanting to go beyond the Biopython
> SeqRecord based BioSQL support.

Yes, this would be my only suggestion. It would be really useful to
have the BioSQL tables mapped as object definitions and have the
SQLAlchemy-based BioSQL built on these. This would open us up to other
object-based implementations like Google App Engine or document
database mappers. I pushed what I have so far in this direction on
GitHub:

http://github.com/chapmanb/bcbb/blob/master/biosql/BioSQL-SQLAlchemy_definitions.py
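
To give a flavour of what such object definitions look like, here is a
minimal sketch that is not taken from the file linked above. It assumes
SQLAlchemy 1.4 or later, maps only two BioSQL tables (biodatabase and
bioentry) with a subset of their columns, and the class names and the demo
function are illustrative choices.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Biodatabase(Base):
    """Namespace grouping a set of entries (BioSQL 'biodatabase' table)."""
    __tablename__ = "biodatabase"
    biodatabase_id = Column(Integer, primary_key=True)
    name = Column(String(128), nullable=False, unique=True)
    authority = Column(String(128))
    description = Column(Text)
    entries = relationship("Bioentry", back_populates="biodatabase")


class Bioentry(Base):
    """One sequence record (BioSQL 'bioentry' table, partial column set)."""
    __tablename__ = "bioentry"
    bioentry_id = Column(Integer, primary_key=True)
    biodatabase_id = Column(
        Integer, ForeignKey("biodatabase.biodatabase_id"), nullable=False
    )
    name = Column(String(40), nullable=False)
    accession = Column(String(128), nullable=False)
    description = Column(Text)
    version = Column(Integer, nullable=False)
    biodatabase = relationship("Biodatabase", back_populates="entries")


def demo():
    """Store one entry in an in-memory SQLite database and read it back."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        db = Biodatabase(name="example_db")
        db.entries.append(Bioentry(name="EX1", accession="X00001", version=1))
        session.add(db)
        session.commit()
        entry = session.query(Bioentry).filter_by(accession="X00001").one()
        return entry.biodatabase.name, entry.name, entry.version


if __name__ == "__main__":
    print(demo())
```

With definitions like these, code that wants more than SeqRecord loading can
query the tables directly as Python objects.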

I also implemented some of the objects in Google App Engine and
replicated the current Biopython BioSQL structure for loading and
retrieving objects:

http://github.com/chapmanb/biosqlweb/tree/master/app/lib/python/BioSQL/GAE

This is all partially finished, but please feel free to take whatever is useful.

Brad

