[Biopython-dev] test_AlignIO to python 3

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 5 11:01:42 UTC 2010


2010/7/5 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> test_AlignIO provides a far more interesting case (but not
> complicated, not at all).

Or just test_seq.py or test_Seq_objs.py which are more low level ;)

> The issues are as follows:
>
> 1. list sorting
> Bio.Data.CodonTable has a:
> possible.sort(_sort)
> Py3's list.sort() no longer accepts a comparison function (_sort is a
> 5-line function defined just above). One can be "forced in", but there
> is normally a simpler idiom using keyword arguments. The line above becomes:
> if sys.version_info[0] == 3:
>     possible.sort(key=lambda x: self.ambiguous_protein[x])
> else:
>     possible.sort(_sort)

I think Python 2.4 added support for the key argument, so can we
just unconditionally change it to:

possible.sort(key=lambda x:self.ambiguous_protein[x])

However, that isn't doing quite the same thing. The old sort was by
table length first to try and get the least ambiguous mapping or
something like that... we probably need some more unit tests first.
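
To illustrate the difference, here is a minimal sketch. It assumes the old
_sort compared the lengths of the mapped values first and then the letters
themselves (I have not checked this against the actual function), and the
table below is a made-up stand-in for self.ambiguous_protein. A tuple key
gives that kind of two-level ordering without a comparison function, and
the key argument works on Python 2.4+ as well as Python 3:

```python
# Hypothetical stand-in for self.ambiguous_protein: letter -> possible residues.
ambiguous_protein = {
    "X": "ACDEFGHIKLMNPQRSTVWY",
    "B": "DN",
    "Z": "EQ",
    "J": "IL",
    "A": "A",
}

possible = ["X", "Z", "A", "J", "B"]

# Sorting by the mapped value alone orders by the value strings themselves.
by_value = sorted(possible, key=lambda x: ambiguous_protein[x])

# Sorting by (length, letter) puts the least ambiguous letters first and
# breaks ties alphabetically. This is the ASSUMED behaviour of the old _sort.
by_length = sorted(possible, key=lambda x: (len(ambiguous_protein[x]), x))

print(by_value)
print(by_length)
```

With this toy table the two keys give different orders, which is the sort of
thing a unit test should pin down before we change the real code.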

> 2. Strings and bytes
> Bio.Seq requires
>    if sys.version_info[0] == 3:
>        return str.maketrans(before, after)
>    else:
>        return string.maketrans(before, after)

This is within our private _maketrans function only? That looks sensible
but I wonder why 2to3 doesn't handle this on its own.

Would moving the "import string" into the function help for clarity?

def _maketrans(complement_mapping):
    """Make a Python string translation table (PRIVATE)."""
    before = ''.join(complement_mapping.keys())
    after = ''.join(complement_mapping.values())
    before = before + before.lower()
    after = after + after.lower()
    if sys.version_info[0] == 3:
        # Python 3: maketrans is a static method on str.
        return str.maketrans(before, after)
    else:
        # Python 2: maketrans lives in the string module.
        import string
        return string.maketrans(before, after)
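
For anyone wanting to try this outside Biopython, here is a self-contained
sketch of how such a table would be used. The toy_complement dictionary is a
hypothetical four-letter mapping invented for this example, not the real
ambiguous DNA complement table:

```python
import sys


def _maketrans(complement_mapping):
    """Make a string translation table (same logic as the function above)."""
    before = "".join(complement_mapping.keys())
    after = "".join(complement_mapping.values())
    before = before + before.lower()
    after = after + after.lower()
    if sys.version_info[0] == 3:
        return str.maketrans(before, after)
    else:
        import string
        return string.maketrans(before, after)


# Hypothetical mapping covering unambiguous DNA only, for illustration.
toy_complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
table = _maketrans(toy_complement)

# str.translate applies the table character by character.
complement = "ACGTacgt".translate(table)
reverse_complement = complement[::-1]

print(complement)
print(reverse_complement)
```

The call site (some_string.translate(table)) is the same on both Python
versions, so only the table construction needs the version check.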

> The way Py3 handles strings and bytes is the biggest issue that I
> think we will face from a technical perspective.

I agree that strings vs bytes will be an issue for us (potentially from
a memory point of view for Seq objects).
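
As a small illustration of the split (a sketch for Python 3 only, with
variable names made up for the example): str and bytes are separate types,
they never compare equal, and each has its own maketrans. How much memory a
str uses per letter depends on the Python version and build, so I make no
claim about sizes here.

```python
# Python 3 only: text (str) and raw data (bytes) are distinct types.
seq_text = "ACGT"
seq_bytes = b"ACGT"

# The two never compare equal, even with identical ASCII content.
same = (seq_text == seq_bytes)

# Converting between them requires an explicit decode (or encode).
round_trip = seq_bytes.decode("ascii")

# Each type has its own translation-table constructor.
text_table = str.maketrans("ACGT", "TGCA")
bytes_table = bytes.maketrans(b"ACGT", b"TGCA")

text_complement = seq_text.translate(text_table)
bytes_complement = seq_bytes.translate(bytes_table)

print(same, round_trip, text_complement, bytes_complement)
```

So if we ever stored Seq data as bytes internally, the complement tables
would have to be built with bytes.maketrans rather than str.maketrans.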

> 3. The big one: no sgmllib in Py3.
> The obvious solution is to include it (I suppose the licenses are
> compatible?). The alternative (using htmllib) might be a more
> long-term solution, in my opinion.

A lot of the things using sgmllib are already deprecated (e.g.
Bio.NetCatch and Bio.Prosite). I think that leaves just Bio.UniGene
and Bio.InterPro - which isn't such a big issue.
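
One caveat on the alternative: htmllib is not in Python 3 either. The
standard library parser that is available there is html.parser (named
HTMLParser on Python 2). A minimal sketch of that style of parser follows;
the LinkCollector class and the sample HTML are invented for illustration
and are not taken from Bio.UniGene or Bio.InterPro:

```python
from html.parser import HTMLParser  # Python 3 module name


class LinkCollector(HTMLParser):
    """Hypothetical example class: collect href values from <a> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p><a href="http://example.org/one">one</a> <a href="/two">two</a></p>')
parser.close()
links = parser.links

print(links)
```

The handler-method style is close to what sgmllib offered, so porting the
remaining modules this way might not be too painful, but I have not tried it.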

Peter



