[Biopython-dev] Python 2 and 3 migration thoughts

Peter Cock p.j.a.cock at googlemail.com
Sat Sep 7 11:30:50 UTC 2013


On Fri, Sep 6, 2013 at 4:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, May 30, 2013 at 2:33 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Short term we will continue with developing using Python 2
>> syntax and running 2to3 for Python 3. As far as I know,
>> the reverse process with 3to2 is not well established. If
>> anyone wants to investigate that would be useful as
>> another option. However, dropping Python 2.5 support
>> makes things more flexible...
>>
>> Medium term I believe it would be possible to have a single
>> code base which is both valid Python 2 and 3 at the same
>> time. This may require us to target 2.7 and 3.3+ only - we'll
>> have to try it and see if Python 2.6 will hold us back.
>>
>> I've actually done this with backports.lzma, a small but
>> non-trivial module with Python and C code:
>>
>> https://pypi.python.org/pypi/backports.lzma/
>> https://github.com/peterjc/backports.lzma
>>
>> Python 3.3 reintroduces some features designed to make
>> this more straightforward, like unicode literals (missing in
>> the early versions of Python 3). This is why I'd like to drop
>> Python 3.2 as soon as possible.
>>
>> What I was thinking is we can start migrating modules on a
>> case by case basis from "Python 2 syntax" to "Dual syntax"
>> one by one, with a white-list in the do2to3.py script. That
>> way over time fewer and fewer modules need to be converted
>> via 2to3, and "python3 setup.py install" will get faster,
>> until eventually we can stop using 2to3 at all.
>>
>> This conversion could consider the code and doctests
>> separately. However, using print(example) we can
>> hopefully get most of the doctests and Tutorial examples
>> to work under both Python 2 and 3 at the same time.
>>
>> That's my current thinking anyway - and I think the fact
>> that it would be a gradual migration from writing Python 2
>> specific code to writing dual 2/3 code makes it low risk
>> (as long as we're continuing to run regular testing).
>>
>> Regards,
>>
>> Peter
>
> This branch is trying out marking individual Python files
> as dual coding (Python 2 and Python 3) or as Python 2
> only requiring conversion via 2to3 for use on Python 3:
>
> https://github.com/peterjc/biopython/tree/tag2to3
>
> Currently the tags are two special hash comment lines
> expected near the start of the file itself (rather than a
> list within the do2to3.py script). The actual text of the
> marker isn't critical - perhaps these need full stops?
>
> # This file targets both Python 2 and Python 3 at the same time
> # TODO - Targets Python 2 only (use 2to3 to run under Python 3)
>
> The first main issue thus far has been print statements,
> where we will either need to use the __future__ import or
> restrict ourselves to simple single argument calls - I have
> been using the latter. This should not be a big problem in the
> main code, and we ought to update the print-and-compare
> unit tests anyway,

e.g.
https://github.com/biopython/biopython/commit/6fa766e2348eae4e083503885f4ea5b66f531d7a
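
As a rough illustration of the single argument approach (not taken
from the commit above; the describe function is made up for this
example), build the whole message as one string first and then pass
only that string to print. Python 2 parses this as a print statement
with a parenthesised expression, Python 3 as a function call, and
both give the same output:

```python
def describe(name, length):
    # Hypothetical helper: format the full message as a single string.
    return "%s has length %i" % (name, length)

# One argument only, so this line is valid under both Python 2 and 3.
print(describe("seq1", 12))
```

With two or more comma separated arguments Python 2 would print a
tuple instead, which is where the __future__ import would be needed.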

> The next common issue is import statements, for
> example StringIO (another bytes versus unicode issue).
> That can be handled via Bio._py3k in some cases.

For StringIO,
https://github.com/biopython/biopython/commit/b09ebbf6f8c4032f874d89a91d199d8697c2d381

For commands.getoutput used in many tests,
https://github.com/biopython/biopython/commit/11a1eca60e7a1491dbe54204ad3103e013bfebc5
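
For anyone not following the commits, the general pattern is a
try/except import. This is only a sketch of the idea, and not
necessarily what Bio._py3k or the commits above actually do:

```python
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3 (text mode, unicode strings)

try:
    from commands import getoutput  # Python 2
except ImportError:
    from subprocess import getoutput  # Python 3

# Either way the calling code stays the same:
handle = StringIO(">seq1\nACGT\n")
lines = handle.read().splitlines()
```

Code handling binary data would want a bytes based handle instead,
which is the bytes versus unicode issue mentioned above.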

> A third major class of issues in the unit tests so
> far is iterators versus lists, for example dictionary
> methods and the map function's return value. These
> can be tackled on a case by case basis I think - often
> adding the occasional list(...), or using sorted(x)
> instead of x.sort(), is enough.

e.g. for sorting dictionary keys,
https://github.com/biopython/biopython/commit/b27f30012af6e66f6f143ecde719bf72609af8f2

e.g. for avoiding iterators from map function,
https://github.com/biopython/biopython/commit/730850e3f4e88a70860e56abafbb579b25414f06
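
A minimal sketch of both fixes, using made up data rather than code
from the commits above:

```python
counts = {"G": 3, "A": 5, "C": 2}

# Python 2 only: keys = counts.keys(); keys.sort()
# Under Python 3, keys() returns a view with no sort method,
# whereas sorted(...) returns a new list on both versions.
keys = sorted(counts)

# map returns a list on Python 2 but an iterator on Python 3;
# wrapping it in list(...) gives a list on both.
lengths = list(map(len, ["ACGT", "AC", ""]))
first = lengths[0]  # indexing would fail on a bare Python 3 map object
```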

> There are also quite a few instances of 'basestring'
> which might be handled via _py3k?
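
One possible shape for this, as a sketch only (the names string_types
and is_text are invented here, and how Bio._py3k would expose it is
an open question):

```python
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3: basestring no longer exists

def is_text(value):
    # Hypothetical helper replacing isinstance(value, basestring)
    return isinstance(value, string_types)
```
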
>
> As of right now, on this branch there are only 8 files under
> Tests which require conversion via 2to3:

Down to six files under Tests now if I rebase the branch
to include the recent fixes on the master.

> Having, I hope, demonstrated that this will work, I'd like
> some feedback before applying this (or a modified version
> of it) to the master branch.

I've started applying individual code fixes to the master
to improve Python 2 and 3 compatibility already.

I'm specifically looking for thoughts on how to handle
the transition period when some of our code will still
need 2to3, while other code will not.

Does the special comment line seem like a good solution?
On the plus side, it tracks any changes with the file being
updated (which wouldn't happen with a list in the do2to3.py
file).
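
To make the idea concrete, here is a rough sketch of how a script
could decide per file whether 2to3 is needed. This is not the code on
the branch; the function name, the max_lines cut-off and the choice
to treat untagged files as Python 2 only are all assumptions made for
this example:

```python
DUAL_MARKER = "# This file targets both Python 2 and Python 3"
PY2_MARKER = "# TODO - Targets Python 2 only"

def needs_2to3(lines, max_lines=20):
    """Return True if a file (given as a list of lines) needs 2to3."""
    # Only look near the start of the file for a marker comment.
    for line in lines[:max_lines]:
        if line.startswith(DUAL_MARKER):
            return False
        if line.startswith(PY2_MARKER):
            return True
    # No marker found: assume Python 2 only, to be on the safe side.
    return True
```

Matching on the start of the line means the exact wording at the end
of the marker (full stops or not) would not matter.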

Peter


