[Biopython-dev] 2to3 ramblings

Tiago Antão tiagoantao at gmail.com
Sun Jul 4 20:24:42 UTC 2010


Hi,

Here are my findings from an attempt to convert Biopython to Python 3.
What I did:
1. Tried to convert Bio (not BioSQL)
2. No C code
3. No external apps

No external apps just because I don't have most of them around here.

Things are going much faster than expected: 52 out of 144 tests are
failing after less than 6 hours of work. With the exception of SFF
processing, I chose to tackle the most complicated problems I found
first (many of the remaining failing tests are of the easy kind).

Some general issues that I am finding that impact us:
1. import exception is no more
2. Many lists are now iterators (e.g. map results)
3. 2to3 of course is not complete. Also sometimes there are some small
mistakes (things one would expect to convert that are not)
4. sgmllib is no more. 2 options: include it (from python 2.6, which I
am doing) OR use html.parser (htmllib is also gone in python 3).
5. Slice indices [:] have to be ints (which is mildly problematic given
that division now returns a float). Thus
    myPos = len(x)/2
    x[myPos:]
has to become
    myPos = int(len(x)/2)
    x[myPos:]
6. Doctests have to be converted (2to3 does it)
7. The default open mode is now text (non-binary), so open sometimes
requires 'rb'. The file builtin is no more
8. Many ordering functions no longer accept None, e.g. max([None,1,2])
will fail
9. StringType, *Type are no more
10. sort has no cmp argument anymore
11. urllib namespace refactored
12. unit tests really help!

13!!!: The biggest problem has been bytes versus strings and
encodings. Most existing complex problems are about this
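
A few of the items above (2, 5, 8 and 10) can be sketched in code that
runs on python 3. This is only an illustration: the helper names
half_tail, max_ignoring_none and by_length_then_alpha are made up for
this example and are not Biopython code. functools.cmp_to_key is only
available from python 2.7 and 3.2 onwards.

```python
from functools import cmp_to_key

# Item 2: map() returns an iterator, so wrap it in list() when a list is needed.
squares = list(map(lambda n: n * n, [1, 2, 3]))


# Item 5: "/" is float division; use "//" (or int()) to get a slice index.
def half_tail(seq):
    """Return the second half of seq (hypothetical helper)."""
    pos = len(seq) // 2
    return seq[pos:]


# Item 8: None cannot be ordered against ints, so filter it out first.
def max_ignoring_none(values):
    """Return max() over the non-None entries (hypothetical helper)."""
    return max(v for v in values if v is not None)


# Item 10: sort() lost its cmp argument; cmp_to_key adapts an old cmp function.
def by_length_then_alpha(a, b):
    """Old-style comparison: negative, zero or positive result."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


names = sorted(["ccc", "a", "bb", "aa"], key=cmp_to_key(by_length_then_alpha))
```

Where the old cmp function is simple, a plain key function (for
example key=len) is usually shorter than going through cmp_to_key.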

Biggest issues have been with Nexus and, above all, Sff (mostly 13
above - encoding formats).
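
To make the bytes versus strings problem concrete, here is a minimal
sketch. It reads a toy 8-byte header (a 4-byte tag followed by a
big-endian integer) from an in-memory buffer that stands in for a file
opened with 'rb'. The layout is chosen for illustration only and this
is not the Biopython Sff parser.

```python
import io
import struct

# In-memory stand-in for a binary file opened with open(path, "rb").
handle = io.BytesIO(b".sff" + struct.pack(">I", 1))

# Reading from a binary handle gives bytes, not str, on python 3.
magic = handle.read(4)

# Unpack a big-endian unsigned 32-bit integer from the next 4 bytes.
(version,) = struct.unpack(">I", handle.read(4))

# bytes and str never compare equal on python 3, so decode explicitly
# before comparing against text.
magic_text = magic.decode("ascii")
```

On python 2 the comparison magic == ".sff" would be true; on python 3
it is false, and only magic == b".sff" or the decoded magic_text
matches.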

With the exception of Sff, I think I could easily sort out everything myself.

The big unknown seems to be the C code. But I will assume that
conversion is easy for the rest of the discussion. I also still have
to test the code that executes external apps.


From my point of view the conversion is not the big issue. The big
issue is the maintenance of a version that works on both 2 and 3 at
the same time (we don't want to maintain 2 codebases, correct?).
Some things are easy, but some are unknowns. It is possible to make
_some_ code (that currently works only on 2) work on both pythons with
little effort. Other code (e.g. prints) can be automatically converted
on build. But some issues are still unknown to me.

What numpy does (at least partially) is, on build: if python 3 is
detected then call 2to3 to convert a python2 codebase to python3.
Seems to work quite well. My gut feeling is that code of the form
if sys.version_info[0] == 2:
   a_version
else:
   b_version
can be almost non-existent.
But it is just a gut feeling.
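
As a rough sketch of how few such checks might be needed, the version
test can be isolated in one small compatibility module that the rest
of the code imports from. The names PY3, string_types and is_string
are hypothetical, picked for this example; they relate to items 9 and
11 above.

```python
import sys

# Single place where the interpreter version is inspected.
PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)
else:
    # basestring only exists on python 2; this branch never runs on 3.
    string_types = (basestring,)  # noqa: F821

# The urllib namespace was refactored; try the python 3 location first.
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen


def is_string(obj):
    """Replacement for checks against types.StringType (hypothetical helper)."""
    return isinstance(obj, string_types)
```

Other modules would then use is_string(x) and the imported urlopen
without any version test of their own.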

So I think the python codebase can be easily shared between python 2
and 3 with little ugliness. About the C codebase? I don't have any
idea for now.

This is not as much work as it seems. I think it is possible to have
almost everything working on python3 for BOSC (assuming the current
pace). But again, the main issue is not the conversion but maintaining
a single code base. In practice, I think the first step is to have a
build system like numpy's, which detects the python version and calls
2to3. A single code base that can be built and tested on both 2 and 3.
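
A minimal sketch of that build step, under stated assumptions: it only
decides whether conversion is needed and assembles a 2to3 command
line, it does not run anything. This is not numpy's actual build code,
and the function names needs_2to3 and two_to_three_command are made up
here. The --output-dir and -W options exist only in newer 2to3
releases, and 2to3 itself is no longer shipped with recent python
versions.

```python
import sys


def needs_2to3(version_info=sys.version_info):
    """Return True when building for python 3 (hypothetical helper)."""
    return version_info[0] >= 3


def two_to_three_command(src_dir, out_dir):
    """Build a 2to3 command that writes converted files to out_dir.

    -W writes every file, changed or not (it implies -w), and -n skips
    backup files, which is required when --output-dir is used.
    """
    return ["2to3", "--output-dir=" + out_dir, "-W", "-n", src_dir]


if needs_2to3():
    # A real build script would pass this list to subprocess; here it
    # is only assembled so that the sketch has no side effects.
    command = two_to_three_command("Bio", "build/py3/Bio")
else:
    command = None
```

The source tree stays in python 2 syntax, and the tests for python 3
would be run against the converted copy in the output directory.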


Suggested readings
http://coderazzi.net/tnotes/python/migrating2to3.html
http://diveintopython3.org/porting-code-to-python-3-with-2to3.html
http://dbaktiar.wordpress.com/2009/08/20/python-3-1-file-open-is-no-longer-binary-by-default/


Well, these are my 0.02£. I can work on putting up a github version of
this if you are interested...

-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa



