[Biopython-dev] Python 3 and encoding for online resources

Peter biopython at maubp.freeserve.co.uk
Mon Aug 2 10:04:49 EDT 2010


On Mon, Aug 2, 2010 at 2:50 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> Or if you just want to grab some code for a quick play,
>> I have a branch where I've been doing this on a
>> semi-regular basis:
>>
>> http://github.com/peterjc/biopython/tree/auto2to3
>
> Thanks! I used this branch to test the Bio.Entrez and Bio.SwissProt parsers.
> The Bio.Entrez Parser works as is; the Bio.SwissProt parser is really easy to
> fix (just convert each line into a plain string inside the _read function in
> Bio.SwissProt.__init__). Perhaps we can do something similar for the other
> test_SeqIO_online.py failures (the ones appearing in Bio/SeqIO/FastaIO.py)?

Maybe (I've replied in more detail below).

>> > So I'd suggest to not use File.UndoHandle (at all),
>> ...
>> I disagree. The NCBI return multiple different file
>> formats, so there are multiple different parsers that may get
>> an error page.
>>
>> Given the NCBI return HTML error pages regardless of what
>> format the request was (XML, plain text, etc), I think we
>> have to look for errors before giving the data to the
>> parser.
>
> Part of the problem solves itself when we change to Python 3. In Python
> 3, urllib.request.urlopen raises a urllib.error.HTTPError in cases where
> urllib.urlopen in Python 2 raises no exception:
>
> ...
>
> So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch
> any HTTP errors (urllib2 is translated appropriately by 2to3),

That sounds very sensible.
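
For illustration, here is a rough sketch of how catching the error might
look in the Python 3 spelling (i.e. after 2to3 has turned urllib2 into
urllib.request and urllib.error). The function name fetch and its opener
argument are made up for this example and are not the Bio.Entrez code;
raising RuntimeError is likewise just an assumption about how the error
could be reported:

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return the response body as bytes (hypothetical helper).

    An HTTP error status from the server is reported as a RuntimeError
    instead of an error page being passed on to a parser.
    """
    try:
        handle = opener(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        raise RuntimeError("HTTP error %i for %s" % (err.code, url)) from err
    try:
        return handle.read()
    finally:
        handle.close()
```

The opener argument is only there so the function can be tried out
without a network connection, e.g. by passing in something that returns
an io.BytesIO.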

> ... and to handle any bytes/utf8/ascii conversion inside the parser
> (as in Bio.SwissProt).

i.e. Make the SwissProt, FASTA, etc parsers cope with both unicode
string handles (the default from open on Python 3) and bytes handles
(network handles, or files opened in binary mode)? I think this is
probably worth doing in any case, especially for the indexing code, see:
http://lists.open-bio.org/pipermail/biopython-dev/2010-July/008011.html
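
As a minimal sketch of that idea (not the actual Bio.SwissProt or FastaIO
code), a parser could normalise each line before looking at it. The
helper name as_text_lines and the default of ASCII are assumptions made
for this example only:

```python
import io


def as_text_lines(handle, encoding="ascii"):
    """Yield each line of handle as a str (hypothetical helper).

    Lines from a bytes handle (network or binary-mode file) are decoded;
    lines from a text handle are passed through unchanged.
    """
    for line in handle:
        if isinstance(line, bytes):
            line = line.decode(encoding)
        yield line


# Both kinds of handle give the parser the same plain strings:
text_lines = list(as_text_lines(io.StringIO(">id\nACGT\n")))
bytes_lines = list(as_text_lines(io.BytesIO(b">id\nACGT\n")))
```

Doing the check per line keeps the parsers themselves unchanged, at the
cost of an isinstance call for every line read.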

Peter

