[Biopython-dev] Calling 2to3 from setup.py

Peter biopython at maubp.freeserve.co.uk
Tue Jan 4 23:30:29 UTC 2011


On Tue, Jan 4, 2011 at 10:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> Something we've talked about before is calling lib2to3 or the 2to3 script
> from within setup.py to make installing Biopython simpler on Python 3.
>
> ...
>
> I then looked at how NumPy are doing this, and they have a hook
> in setup.py which calls their own Python script called py3tool.py to
> do the conversion, ... it will not bother to reconvert previously
> converted but unchanged files. ...
>
> However, this only seems to be a simple check based on the file
> timestamps. I worry that you'd have to clear the converted files
> to ensure a clean rebuild after switching branches - but from
> a brief search online it looks like git will give modified files the
> current time stamp when you do a checkout.

For an interesting but heated discussion of this and related issues,
see this thread: http://www.spinics.net/lists/git/msg24579.html
The key point is that although some version control systems do
have an option to restore time stamps, git does not. Thus if you
switch branches, any changed file gets the current timestamp.
This is simple, and ensures simple build systems like make will
rebuild all dependencies (but may do unnecessary work).

> On the following branch I've followed the same basic strategy as
> NumPy - handle the 2to3 conversion with a script and then before
> calling the setup function switch to the converted source tree.
> The main difference is I also track the md5 checksums of the
> source files and the 2to3 converted python scripts. Perhaps it
> is over engineered, but it seems safer than looking at the files'
> time stamps?

If you are on the master branch, check out another branch, and
then check out the master branch again, the net result with git is
that any files which differed between the two branches will have
had their time stamps updated (but with no net change to their
contents). Using the NumPy setup.py script this would trigger a
needless reconversion of those files with 2to3. Using the md5
approach would not do this extra work.
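
As a rough sketch only (this is not the code on the branch; the function
names and the dictionary used as a checksum record are made up for
illustration), the md5 based decision could look something like this:

```python
import hashlib


def file_md5(path):
    """Return the hex MD5 digest of the file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_2to3_by_md5(source_path, recorded_md5s):
    """Return True if source_path changed since its last conversion.

    recorded_md5s is a hypothetical dict mapping each source path to the
    checksum recorded when it was last converted. A file with no record
    is always treated as needing conversion.
    """
    return recorded_md5s.get(source_path) != file_md5(source_path)
```

Because only the contents are compared, a file whose time stamp was
bumped by switching branches and back would not be reconverted.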

On the other hand, this example is contrived - in practice when
I change branch I want to build/install and test that code. So
on reflection, using the time stamp to decide if 2to3 needs to
be rerun is probably quite sufficient (and will be faster too).
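
For comparison, a minimal sketch of a time stamp check, assuming each
source file has a single converted counterpart (the function name and
the two-path signature are illustrative, not taken from NumPy's
py3tool.py):

```python
import os


def needs_2to3_by_mtime(source_path, converted_path):
    """Return True if the converted file is missing or older than the source."""
    if not os.path.isfile(converted_path):
        return True
    # Reconvert only when the source was modified after the last conversion.
    return os.path.getmtime(source_path) > os.path.getmtime(converted_path)
```

This avoids reading the file contents at all, which is where the speed
advantage over checksums would come from.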

Peter


