[Biopython-dev] Restriction analysis package.

fsms at users.sourceforge.net fsms at users.sourceforge.net
Thu May 27 07:20:44 EDT 2004


Hello,

Sorry about the delay; I had to go to a job interview and did not have
easy access to the net.

> Thanks for putting this together. The code looks very useful and I'd
> definitely like to see it work towards being included in Biopython,
> if that's what you'd like. A few comments on it:
> 
> 1. First, if you'd like to include this in Biopython you would
> have to be willing to license the code under the Biopython license.
> I see different references to the GPL and Python license within your
> package. I'm not at all the type of person who argues about
> licensing issues, but we just need to keep the Biopython
> distribution under one license.
> 

Obviously, I will put the code under the Biopython license. I put it
under the Python license for the time being, knowing that some people
don't like to read GPL code.


> 2. The way this is organized right now puts two different types of
> functionality together -- building the enzyme dictionary by
> downloading and parsing Rebase, and the actual enzyme dictionary
> itself. For Biopython, the public functionality you'd want to expose
> would be the enzyme dictionary and the useful functions you have
> within that. The downloading and parsing work would be something
> that you, or another developer, would do on a monthly or whatever
> basis to keep the enzyme dictionary up to date within Biopython.
> Thus I'd propose organizing the code like:
> 
> Bio/Restriction/__init__.py --> The current Restriction.py
> Bio/Restriction/Restriction_Dictionary.py --> the dictionary
> Bio/Restriction/_Update/ --> The Update, RanaConfig and 
> RestrictionCompiler code to do the updates and regenerate the
> dictionary.

Yes and no. I agree with the organisation of the code, and I would indeed
update the dictionary in Biopython, but I think it is important for end
users to be able to update the dictionary on their own machine without
downloading the full distribution, so this is also public functionality.

Something I would also like to implement, when I have the time, is a
switch in the ranacompiler.py script to pre-select the supplier(s). If an
organisation gets its enzymes from suppliers A and B, it may not wish to
search the sequences for enzymes from supplier C.
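
A minimal sketch of what such a supplier switch could do, assuming the
compiled dictionary can be reduced to a mapping from enzyme name to a set
of supplier codes. The mapping, the supplier assignments and the function
name select_by_supplier are hypothetical and only illustrate the idea;
they are not the actual layout used by ranacompiler.py.

```python
# Hypothetical data: enzyme name -> set of supplier codes.
# The supplier assignments are invented for illustration and the real
# layout of the generated dictionary may differ.
ENZYME_SUPPLIERS = {
    "EcoRI": {"A", "B"},
    "BamHI": {"B", "C"},
    "NotI": {"C"},
}


def select_by_supplier(enzyme_suppliers, wanted):
    """Return sorted names of enzymes sold by at least one wanted supplier."""
    wanted = set(wanted)
    return sorted(
        name
        for name, suppliers in enzyme_suppliers.items()
        if suppliers & wanted  # non-empty intersection: keep the enzyme
    )


if __name__ == "__main__":
    # An organisation buying only from suppliers A and B skips NotI.
    print(select_by_supplier(ENZYME_SUPPLIERS, ["A", "B"]))
```

With a filter like this applied when the dictionary is compiled, enzymes
available only from supplier C would never be searched for.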

> ranacompiler.py should exist in somewhere like Scripts/restriction
> to be run, instead of in site-packages.
> 

Yes, the organisation of the package I submitted was quick and dirty, just
to assess whether you were interested. I will write a setup.py for the
package which will allow this change.

> 3. Going along with reorganizing the code base, I'd propose changing
> the updating scripts a bit. Storing databases and things into
> site-packages is generally not a good idea, since that is meant
> for Python code, and also requires the user to mess around with
> either running scripts as root or changing permissions -- more work
> than is really necessary. What I'd do is store the Database and
> Updates information into, say, the current directory where the user
> runs the scripts. Additionally, the Restriction_Dictionary.py would
> be generated there. Then, when the updates are done everything gets
> run and you have a new Restriction_Dictionary.py to copy over and
> check into CVS.
> 

Well, as before, I am also in favour of putting the databases, scripts and
so on in a directory other than site-packages; this was a shortcut.

But I am not sure I understand what you propose.
The first question is who should do the update:

1) Should it be done in a centralised way, i.e. in Biopython, with people
	getting the update when they update from CVS? That assumes they use
	CVS for their Biopython installation; people getting Biopython from a
	release rather than from CVS will not get frequent updates of the
	enzyme dictionary.
	This might not be a problem, since Rebase does not change that
	quickly, at least for the most commonly used enzymes.

2) Another way is to propose an admin scheme: the administrator of the box
	is in charge of keeping the enzyme dictionary up to date. We must
	then provide a script to do that easily. We can install all the data
	into a centralised directory, something like
	/var/Biopython/Restriction/
	In this case, the script would be run as root when the updates are
	done, and the enzyme dictionary is installed into site-packages,
	since it is a Python script after all.

3) The third way is a scheme where everybody can do the update.
	Everything is then stored in a directory like
	/home/user/Biopython/Restriction.
	The enzyme dictionary is kept in, and run from, the user's home
	directory. There is no permission problem, since the enzyme
	dictionary will be accessible to the user. Each user runs a personal
	version of the enzyme dictionary, kept in their own directory.
	This means Restriction is installed centrally in
	site-packages/Biopython/Restriction, but the enzyme dictionary is not
	installed when Biopython is installed. The first time a new user runs
	the package, they have to update the dictionary.
	The script Restriction_Dictionary.py is never installed into
	site-packages.

4) The fourth solution is the current-directory scheme. I am personally
	not very keen on this one. My worry is that, on machines used by
	several people, this will end up installing the same information
	several times in different places. That could well be OK on *nix
	boxes, which restrict what a user can do, but on Windows...
	This will end up in a mess, and the scripts are more likely to break
	if you have several installations of the enzyme dictionary.
	Another solution here is to use temporary files.
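
To make the difference between schemes 2 and 3 concrete, here is a minimal
sketch of how an update script could decide where to store its data. The
function name pick_data_dir and the fall-back order (central directory
first, per-user directory otherwise) are assumptions made for
illustration, not something agreed in this thread; only the two example
paths come from the text above.

```python
import os


def pick_data_dir(central_dir, user_dir):
    """Return central_dir if it exists and is writable, else user_dir.

    Nothing is created here; the caller decides whether to make user_dir.
    """
    if os.path.isdir(central_dir) and os.access(central_dir, os.W_OK):
        return central_dir  # admin scheme: one shared copy of the data
    return user_dir  # per-user scheme: no special permissions needed


if __name__ == "__main__":
    central = "/var/Biopython/Restriction"
    personal = os.path.join(os.path.expanduser("~"), "Biopython", "Restriction")
    print(pick_data_dir(central, personal))
```

Run as root, this would pick the central directory when it exists; run by
an ordinary user without write access there, it would fall back to the
home directory.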

My personal preference would be a mix of the first and second solutions,
but I am open to discussing it further.

Doesn't Biopython have a standard place to keep data centralised,
something like /var/Biopython? I am sure this package is not the only one
that could benefit from such a facility.


> Hopefully these make some sense. I really like the catalyse and
> search functionality on the enzyme classes -- it's a nice interface
> design and it would be great to have in Biopython.
> Please do let me know what you think about the licensing and change
> proposals and we can keep moving forward towards getting this in
> Biopython. Thanks again for the work so far!
> 
> Brad

While I was away I had some time to test the Restriction package a bit
further. I have a class to add which allows analysis (i.e. where you can
specify things such as only blunt cutters, or enzymes which cut twice...).
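
As a rough illustration of the kind of filtering meant here, the sketch
below assumes that a search yields one record per enzyme, with a flag for
blunt cutters and a list of cut positions. The Hit record, the two
function names and the cut positions are hypothetical; this is not the
interface of the class mentioned above.

```python
from collections import namedtuple

# Hypothetical search result: 'blunt' is True for blunt cutters and
# 'sites' lists the positions where the enzyme cuts the sequence.
Hit = namedtuple("Hit", ["name", "blunt", "sites"])

# Invented cut positions, for illustration only.
EXAMPLE_HITS = [
    Hit("EcoRI", False, [120, 480]),
    Hit("EcoRV", True, [33]),
    Hit("SmaI", True, [75, 210]),
]


def only_blunt(hits):
    """Return the names of enzymes that leave blunt ends."""
    return [h.name for h in hits if h.blunt]


def cut_n_times(hits, n):
    """Return the names of enzymes that cut exactly n times."""
    return [h.name for h in hits if len(h.sites) == n]


if __name__ == "__main__":
    print(only_blunt(EXAMPLE_HITS))
    print(cut_n_times(EXAMPLE_HITS, 2))
```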

I will make the changes over the weekend (i.e. put it under the Biopython
license, for a start). The rest will need a bit more time.

Fred









