[BioPython] [bip] [OT] Revision control and databases

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Fri Oct 24 09:08:54 UTC 2008


On Thu, Oct 23, 2008 at 3:55 PM, Bruce Southey <bsouthey at gmail.com> wrote:

> Giovanni Marco Dall'Olio wrote:
>
>> Hi,
>> I have a question (well, it's not directly related to biopython or pygr,
>> but to scientific computing).
>>
>> I always used flat files to store results and data for my bioinformatics
>> analyses, but now (as I was saying in another thread) I would like to start
>> using a database to do that.
>>
> Of course Biopython's BioSQL interface may provide a starting point.


The problem is that BioSQL doesn't yet support Population Genetics records
(see another thread on the Biopython mailing list), so I would have to implement
something like that in BioSQL or wait for the developers to do it.
Maybe I will do this later, but right now I don't have the time.


>
>> The problem is I don't know if databases do Revision Control.
>> When I used flat files, I used to save all the results in a git
>> repository and, every time something was changed or calculated again, I
>> committed it.
>> Do you know how to do this with databases? Does MySQL provide support for
>> revision control?
>> Thanks :)
>>
> I think you are asking the wrong questions because it depends on what you
> want to do and what you actually store. There are a number of questions that
> you need to ask yourself about what you really need to do (knowing you have
> used git helps refine these). Examples include:
> How often do you use the old versions in your git repository?
> How do you use the old revisions in your git repository?
> Do you even use the information of an older version if a newer version
> exists?
> How many users can make changes?
> How often do you have conflicts?
> Are the conflicts hard to solve?


These are all very good questions.
The problem is that I consider revision control a 'good practice': I
remember that when I did not keep a history of the changes to my
data, it was a mess. I would like to have at least a 'version' field, to
know how old my data is.
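
A minimal sketch of what such a 'version' field could look like, together with the date and comment fields suggested in the quoted reply. It assumes SQLite through Python's standard sqlite3 module; the table name allele_freq, its columns, the marker name rs123 and the helper functions save() and latest() are all hypothetical, chosen only for illustration. Rows are never overwritten: each change is stored as a new row with a higher version number.

```python
import sqlite3

# Hypothetical schema: every change is stored as a new row, never overwritten.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE allele_freq (
        marker   TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        freq     REAL    NOT NULL,
        changed  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        comment  TEXT    NOT NULL,
        PRIMARY KEY (marker, version)
    )
""")


def save(conn, marker, freq, comment):
    """Insert a new version of `marker` and return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM allele_freq WHERE marker = ?",
        (marker,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO allele_freq (marker, version, freq, comment) "
        "VALUES (?, ?, ?, ?)",
        (marker, version, freq, comment),
    )
    conn.commit()
    return version


def latest(conn, marker):
    """Return (version, freq, comment) of the newest row for `marker`."""
    return conn.execute(
        "SELECT version, freq, comment FROM allele_freq "
        "WHERE marker = ? ORDER BY version DESC LIMIT 1",
        (marker,),
    ).fetchone()


save(conn, "rs123", 0.25, "first calculation")
save(conn, "rs123", 0.27, "recalculated after adding samples")
print(latest(conn, "rs123"))
```

Older versions stay in the table, so they can still be queried by version number. This gives a history of the data, but none of the branching or merging that git provides.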

I have found this:
- http://pgfoundry.org/projects/tablelog/
which seems interesting.
I think this is a big issue for bioinformatics. How is it possible that
nobody has ever tried to implement such functionality for databases?
Version control could be difficult to implement, but not that difficult. There
must be something that I can reuse...
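
One general way to keep a change history inside the database itself is a trigger that copies the old values into a log table whenever a row is updated or deleted. The sketch below is not tablelog's own interface (that project targets PostgreSQL). It only illustrates the idea, assuming SQLite through Python's standard sqlite3 module. The table names results and results_log and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data table.
    CREATE TABLE results (
        id    INTEGER PRIMARY KEY,
        value REAL NOT NULL
    );

    -- Log table: one row per change, with the old value and a timestamp.
    CREATE TABLE results_log (
        log_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        action    TEXT    NOT NULL,
        result_id INTEGER NOT NULL,
        old_value REAL,
        changed   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Record the previous value before it is lost by an UPDATE.
    CREATE TRIGGER results_update AFTER UPDATE ON results
    BEGIN
        INSERT INTO results_log (action, result_id, old_value)
        VALUES ('UPDATE', OLD.id, OLD.value);
    END;

    -- Record the last value of a row that is being deleted.
    CREATE TRIGGER results_delete AFTER DELETE ON results
    BEGIN
        INSERT INTO results_log (action, result_id, old_value)
        VALUES ('DELETE', OLD.id, OLD.value);
    END;
""")

conn.execute("INSERT INTO results (id, value) VALUES (1, 0.10)")
conn.execute("UPDATE results SET value = 0.15 WHERE id = 1")
conn.execute("DELETE FROM results WHERE id = 1")
conn.commit()

history = conn.execute(
    "SELECT action, result_id, old_value FROM results_log ORDER BY log_id"
).fetchall()
print(history)
```

The application code does not need to remember to log anything: the database records each change on its own, whichever script or user made it.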



> Do you actually determine when 'something was changed or calculated again'
> or is this partly determined by an external source like a GenBank or UniProt
> update? (At least in a database approach you could automate this.)


Well, it could be useful to


>
>
> Revision control may be overkill for your use because it aims to
> handle many tasks and change conflicts related to multiple users rather than
> a single user.  If you don't need all these fancy features then you can use
> a database. If you just want to store and retrieve a version then you can
> use a database but you need to at least force the inclusion of date and
> comment fields for it to be useful.




Maybe there are other similar tools.
This is a big issue for bioinformatics. I think it is a good, when working
with

Unfortunately I think revision control would be very useful for me.
The data in the database will be used and uploaded by 4 or 5 people.
It will also be used to store the results from some scripts:


>
>
>
> Regards
> Bruce
>


Thank you very much for all the replies. I didn't expect so many of them.


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (Italian): http://bioinfoblog.it


