[BioPython] [bip] [OT] Revision control and databases

Kevin Teague kteague at bcgsc.ca
Fri Oct 24 18:32:41 UTC 2008


>
>> I think this is a big issue for bioinformatics. How is it possible  
>> that nobody
>> has never tried to implement such a functionality for databases
>
> Databases (DBMS, to be picky) are a general-purpose solution for many
> different kinds of problem.  Revision control is an inhomogeneous  
> problem
> with no optimal solution that can be implemented in many ways and  
> not only
> using DBMS.  There are plenty of revision control examples  
> implemented in
> databases, and the examples that first come to mind in Python for me  
> are
> content management systems such as Zope and Plone.  I think that BASE
> implements one, but it's a long time since I looked at it.

The default file storage for Zope Object Database (ZODB) appends all  
new database writes, keeping older transactions on disk (similar to  
the way PostgreSQL works). Back in the day (circa 2000) Zope 2 exposed  
this database-level feature at the application level in the Zope  
Management Interface (ZMI). So you could see all past writes to the
database, and try to revert to an older one if desired (using the
"undo" tab of the ZMI).
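To make the append-only idea concrete, here is a minimal toy sketch. It
is not ZODB's API or file format; the class and method names
(AppendOnlyStore, commit, state, undo_to) are made up for illustration.
Every write is appended to a log, and "undo" is itself just another
appended write that restores an older state.

```python
class AppendOnlyStore:
    """Toy append-only store (hypothetical, not ZODB's FileStorage)."""

    def __init__(self):
        # List of (txn_id, {key: value}) records; old records are never rewritten.
        self._log = []

    def commit(self, changes):
        """Append a transaction record and return its id."""
        txn_id = len(self._log)
        self._log.append((txn_id, dict(changes)))
        return txn_id

    def state(self, upto=None):
        """Replay the log (optionally only up to txn `upto`) to rebuild the state."""
        current = {}
        for txn_id, changes in self._log:
            if upto is not None and txn_id > upto:
                break
            for key, value in changes.items():
                if value is None:
                    current.pop(key, None)  # None marks a deleted key in this toy
                else:
                    current[key] = value
        return current

    def undo_to(self, txn_id):
        """Restore the state as of `txn_id` by appending a new transaction."""
        old = self.state(upto=txn_id)
        now = self.state()
        restore = {key: old.get(key) for key in set(now) | set(old)}
        return self.commit(restore)


store = AppendOnlyStore()
t0 = store.commit({"doc": "draft"})
t1 = store.commit({"doc": "final"})
store.undo_to(t0)
print(store.state())     # {'doc': 'draft'}
print(len(store._log))   # 3
```

Note that the log only ever grows: the undo shows up as a third record
rather than removing the second one.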

One problem with this approach was that running sysadmin tools against
the database could break application behaviour. For example, let's say
you had a "Document" object and a "Page Counter" object. You would want
to be able to view older versions of Documents, but only care about the
current state of the Page Counters. However, if your Page Counters are
changing like crazy, taking up tonnes of disk space and generally
slowing down queries against the history of the database, there was no
way to say "delete all outdated ephemeral Page Counter versions, but
keep Document-related transactions" (especially since a Page Counter
change and a Document change were often committed in the same
transaction). ZWiki exposed older revisions using this feature, and the
accepted practice was to put each wiki into its own database so that
other forms of database maintenance didn't accidentally blow away your
wiki history ... it wasn't so pretty :P
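A rough sketch of why selective pruning is awkward when history is kept
per transaction. The log layout and the split_history helper below are
hypothetical, not anything from Zope; the point is only that a
transaction which touched both kinds of object cannot be thrown away as
a unit.

```python
# Hypothetical transaction log: each entry lists the kinds of object it touched.
transactions = [
    {"id": 1, "touched": {"Document"}},
    {"id": 2, "touched": {"PageCounter"}},
    {"id": 3, "touched": {"Document", "PageCounter"}},
    {"id": 4, "touched": {"PageCounter"}},
    {"id": 5, "touched": {"PageCounter"}},
]


def split_history(transactions, ephemeral_kind):
    """Return (droppable ids, entangled ids) for outdated ephemeral history.

    The newest transaction touching the ephemeral kind is excluded from both
    lists, since it holds the current value.
    """
    touching = [t for t in transactions if ephemeral_kind in t["touched"]]
    latest = touching[-1]["id"] if touching else None
    droppable = [t["id"] for t in touching
                 if t["touched"] == {ephemeral_kind} and t["id"] != latest]
    entangled = [t["id"] for t in touching
                 if t["touched"] != {ephemeral_kind} and t["id"] != latest]
    return droppable, entangled


droppable, entangled = split_history(transactions, "PageCounter")
print(droppable)   # [2, 4]
print(entangled)   # [3]
```

Transaction 3 is the troublesome one: dropping it would lose Document
history, and keeping it keeps a stale Page Counter value around.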

You also had problems reverting to just one specific revision. For
example, if you were at Revision 3 and there were changes in Revision 1
that you wanted to go back to, but you'd made changes in Revision 2
that referenced Revision 1, then you first had to step back to
Revision 2 before you could revert to Revision 1, even though
Revision 2 also contained a bunch of changes that you didn't want to
revert and would then need to re-apply manually later. Ugh!
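Here is one way to picture that, as a toy sketch. It assumes (my
simplification, not a description of ZODB internals) that undo works on
whole revisions, so getting one object back to an earlier state drags
along everything else the intervening revisions touched. The data and
the undo_plan helper are made up.

```python
# Hypothetical revision history: each revision records the objects it changed.
revisions = [
    {"id": 1, "changed": {"page_a"}},
    {"id": 2, "changed": {"page_a", "page_b", "page_c"}},
    {"id": 3, "changed": {"page_d"}},
]


def undo_plan(revisions, obj, back_to):
    """Return (revision ids to undo, unrelated objects rolled back with them)."""
    to_undo = [r for r in revisions
               if r["id"] > back_to and obj in r["changed"]]
    collateral = set()
    for r in to_undo:
        collateral |= r["changed"] - {obj}
    return [r["id"] for r in to_undo], sorted(collateral)


plan = undo_plan(revisions, "page_a", back_to=1)
print(plan)   # ([2], ['page_b', 'page_c'])
```

Getting page_a back to its Revision 1 state means undoing Revision 2,
and page_b and page_c come along for the ride and have to be re-applied
by hand.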

Zope 2 also had a Version object: you could poke a button in the UI to
start a new "transaction" and then start making changes to code+content
in the database. This was just implemented as a long-running
transaction; from starting to committing, a transaction could sometimes
last for a whole month :). The problem was that when you finally wanted
to commit the transaction to roll out new features on a web site, if
there were any conflicts from changes that had happened in the meantime
you were hosed, and would end up copying those changes into a new
transaction based off the latest database version and committing that.
It wasn't pretty :(
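The failure mode is ordinary optimistic conflict detection stretched
over a very long window. A minimal sketch, assuming a per-key write
counter; Database, LongRunningVersion and ConflictError here are
made-up names, not Zope's classes:

```python
class ConflictError(Exception):
    """Raised when a key changed underneath a long-running version."""


class Database:
    def __init__(self):
        self.data = {}     # key -> current value
        self.serial = {}   # key -> number of committed writes to that key

    def write(self, key, value):
        self.data[key] = value
        self.serial[key] = self.serial.get(key, 0) + 1


class LongRunningVersion:
    def __init__(self, db):
        self.db = db
        self.seen = dict(db.serial)   # serials as they were when the version started
        self.pending = {}             # uncommitted changes made inside the version

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        # Any key written by someone else since we started is a conflict.
        conflicts = sorted(key for key in self.pending
                           if self.db.serial.get(key, 0) != self.seen.get(key, 0))
        if conflicts:
            raise ConflictError(conflicts)
        for key, value in self.pending.items():
            self.db.write(key, value)


db = Database()
db.write("index_html", "v1")

version = LongRunningVersion(db)          # start the month-long "transaction"
version.write("index_html", "redesign")
db.write("index_html", "hotfix")          # meanwhile, someone edits the live site

try:
    version.commit()
    outcome = "committed"
except ConflictError as exc:
    outcome = "conflict on %s" % exc.args[0]
print(outcome)   # conflict on ['index_html']
```

The longer the version stays open, the more likely some key in it has
been touched on the live site, and the whole commit fails at once.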

It has long since been acknowledged by Zope developers that exposing  
database level features at the application level is a Bad Thing(TM)!

Today there is a whole plethora of products for Zope that do some form
of versioning, but they are all implemented at the application level.
There is a whole plethora of products because there are many ways to
do versioning, and the choice of how versions are managed is really
best left up to the specific application. Some of these products
provide reasonable APIs for implementing specific versioning within a
specific platform, e.g. Plone has a package called plone.app.iterate
whose APIs use standard versioning terminology (checkin, checkout,
working copy). For example:


from zope.interface import Interface


# zope.interface method declarations omit `self`
class ICheckinCheckoutTool( Interface ):

    def allowCheckin( content ):
        """
        denotes whether a checkin operation can be performed on the
        content.
        """

    def allowCheckout( content ):
        """
        denotes whether a checkout operation can be performed on the
        content.
        """

    def allowCancelCheckout( content ):
        """
        denotes whether a cancel checkout operation can be performed
        on the content.
        """

    def checkin( content, checkin_messsage ):
        """
        check the working copy in, this will merge the working copy
        with the baseline
        """

    def checkout( container, content ):
        """
        """

    def cancelCheckout( content ):
        """
        """

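To show what versioning at the application level looks like in the
small, here is a toy in-memory sketch of the checkin / checkout /
working copy vocabulary. It is not plone.app.iterate's implementation;
the ToyCheckinCheckout class, its method names and its storage are all
hypothetical.

```python
import copy


class ToyCheckinCheckout:
    """In-memory stand-in (hypothetical, not plone.app.iterate)."""

    def __init__(self):
        self.baselines = {}   # name -> published content
        self.working = {}     # name -> working copy currently checked out
        self.log = []         # (name, checkin message) pairs

    def allow_checkout(self, name):
        return name in self.baselines and name not in self.working

    def allow_checkin(self, name):
        return name in self.working

    def checkout(self, name):
        if not self.allow_checkout(name):
            raise ValueError("cannot check out %r" % name)
        # The working copy is an independent copy; the baseline stays untouched.
        self.working[name] = copy.deepcopy(self.baselines[name])
        return self.working[name]

    def checkin(self, name, message):
        if not self.allow_checkin(name):
            raise ValueError("nothing checked out for %r" % name)
        # "Merging" here is simply replacing the baseline with the working copy.
        self.baselines[name] = self.working.pop(name)
        self.log.append((name, message))

    def cancel_checkout(self, name):
        self.working.pop(name, None)


tool = ToyCheckinCheckout()
tool.baselines["faq"] = {"body": "old text"}

copy_of_faq = tool.checkout("faq")
copy_of_faq["body"] = "new text"
before = tool.baselines["faq"]["body"]    # still "old text" while checked out

tool.checkin("faq", "Reword the FAQ")
after = tool.baselines["faq"]["body"]     # "new text" once checked in
print(before, "->", after)
```

The application decides what counts as a version and when, and nothing
here depends on how the database underneath stores its transactions.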



