[Biopython-dev] Calling 2to3 from setup.py

Peter biopython at maubp.freeserve.co.uk
Tue Jan 4 22:43:27 UTC 2011


Hi all,

Something we've talked about before is calling lib2to3 or the 2to3 script
from within setup.py to make installing Biopython simpler on Python 3.

Also, our current arrangement, where we recommend calling 2to3 in
situ, is not very helpful from a source code control point of view:
the converted files overwrite the tracked ones, which makes working
on Python 3-specific fixes rather fiddly.

If we didn't need any special arguments for calling 2to3 we could try
this simple solution:

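try:
    # Python 3: this variant runs 2to3 over the files as they are built
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:
    # Python 2: fall back to the plain build command
    from distutils.command.build_py import build_py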

Then add this to the setup function call:

cmdclass = {'build_py': build_py}

See http://docs.python.org/py3k/distutils/apiref.html and other pages.

However, as far as I can see, that doesn't cater for passing in options
like disabling the long fixer (which we require).
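
As a rough sketch of what driving lib2to3 directly might look like
(not code from Biopython or NumPy), the fixer list can be built by
hand with the long fixer left out. The helper names select_fixers and
convert_source are made up for illustration; get_fixers_from_package
and RefactoringTool are the lib2to3 calls. Note that lib2to3 was
removed from the standard library in Python 3.13, so the conversion
step only runs on interpreters that still ship it.

```python
def select_fixers(all_fixers, skip=("long",)):
    """Return the fixer module names minus those named in skip.

    Fixer modules are named like 'lib2to3.fixes.fix_long', so the short
    name is whatever follows the final 'fix_'.
    """
    unwanted = set(skip)
    return [name for name in all_fixers
            if name.rsplit("fix_", 1)[-1] not in unwanted]


def convert_source(source, filename="<string>", skip=("long",)):
    """Run 2to3 over a source string, leaving out the skipped fixers."""
    # Imported lazily so select_fixers works even without lib2to3.
    from lib2to3.refactor import RefactoringTool, get_fixers_from_package

    fixers = select_fixers(get_fixers_from_package("lib2to3.fixes"), skip)
    tool = RefactoringTool(fixers)
    # The source string should end with a newline.
    return str(tool.refactor_string(source, filename))
```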

I then looked at how NumPy are doing this. They have a hook in
setup.py which calls their own Python script, py3tool.py, to do the
conversion, and then changes the current directory to the converted
code before calling the setup function. See:
https://github.com/numpy/numpy/blob/master/setup.py
https://github.com/numpy/numpy/blob/master/tools/py3tool.py

The NumPy py3tool.py script has some brains - it will not bother
to reconvert previously converted but unchanged files. Since 2to3
is quite slow this is important, e.g. for doing:

python3 setup.py build
python3 setup.py test
python3 setup.py install

However, this only seems to be a simple check based on the file
timestamps. I worried that you'd have to clear the converted files
to ensure a clean rebuild after switching branches, but from a brief
search online it looks like git gives modified files the current
timestamp when you do a checkout, so they should be reconverted.
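
To make the timestamp idea concrete, here is a minimal sketch of
such a check. It is not NumPy's actual code; the function name
needs_conversion is hypothetical, and it assumes one converted file
per source file.

```python
import os


def needs_conversion(src, dst):
    """Return True if dst is missing or older than src (by mtime)."""
    if not os.path.exists(dst):
        return True
    return os.path.getmtime(src) > os.path.getmtime(dst)
```

A check like this trusts the modification times completely, so a
source file that changes without getting a newer timestamp would be
skipped.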

On the following branch I've followed the same basic strategy as
NumPy - handle the 2to3 conversion with a script and then before
calling the setup function switch to the converted source tree.
The main difference is that I also track the md5 checksums of the
source files and the 2to3 converted Python scripts. Perhaps it
is over-engineered, but it seems safer than looking at the files'
timestamps.

https://github.com/peterjc/biopython/tree/py3setup
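
For illustration only, the checksum tracking could look something
like the sketch below. This is not the code on the branch: the JSON
manifest and the function names (md5_of, load_manifest,
is_up_to_date, record) are assumptions made up for the example.

```python
import hashlib
import json
import os


def md5_of(path):
    """Return the hex md5 digest of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def load_manifest(manifest_path):
    """Load the recorded checksums, or an empty dict if none exist yet."""
    if os.path.exists(manifest_path):
        with open(manifest_path) as handle:
            return json.load(handle)
    return {}


def is_up_to_date(src, dst, manifest):
    """True if neither src nor its converted copy changed since recorded."""
    if not os.path.exists(dst):
        return False
    return manifest.get(src) == [md5_of(src), md5_of(dst)]


def record(src, dst, manifest, manifest_path):
    """Store both checksums after a successful conversion."""
    manifest[src] = [md5_of(src), md5_of(dst)]
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle)
```

Checking the converted file as well as the source means a hand-edited
or half-written output file also triggers a reconversion.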

I haven't tried this yet on Windows or Mac, just Linux with
Python 3.1 for now.

Another potential issue with the NumPy code is it doesn't worry
about the Python 3.1 and 3.2 (etc) versions of 2to3 giving slightly
different results. To be safe, I'm using a separate build folder for
each. If you run setup.py under Python 3.x, it calls lib2to3 from
that Python.
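
A per-version build folder can be derived from sys.version_info, as
in this small sketch. The "build/py3.1" naming scheme and the
function name python3_build_dir are assumptions for the example, not
necessarily what the branch uses.

```python
import os
import sys


def python3_build_dir(base="build", version_info=None):
    """Return a build folder name specific to the major.minor version."""
    if version_info is None:
        version_info = sys.version_info
    return os.path.join(base, "py%i.%i" % (version_info[0], version_info[1]))
```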

Has anyone else looked at this?

Peter


