[Biopython-dev] Pickle problem on 64 bit Windows with Python 3.4
Peter Cock
p.j.a.cock at googlemail.com
Tue Apr 22 12:36:23 UTC 2014
On Tue, Apr 22, 2014 at 12:09 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
> On Tue, Apr 22, 2014 at 12:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Mon, Apr 21, 2014 at 6:45 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
>>> From what I read here, http://hg.python.org/cpython/rev/4a6b8f86b081 could be
>>> a problem related to that file. It seems to me they stripped the check for a
>>> quote that must be present, and looking at the pickle, apparently it isn't.
>>>
>>
>> OK, now things are more confusing - this seems to be working on
>> a colleague's machine, so it may be something different on your
>> setup. Are you using a self compiled Python 3.4?
>>
>> We installed the 64 bit version Python 3.4 on Windows 7 using the
>> binary installer from the website (Windows x86-64 MSI installer),
>> selecting for all users (which probably requires admin rights):
>> https://www.python.org/ftp/python/3.4.0/python-3.4.0.amd64.msi
>
> Exactly as I did. I installed the dependencies (numpy and the like)
> for Biopython using Gohlke's builds.
>
>> We manually downloaded the pickle file via the raw link on GitHub,
>> and tried the test code (as shown below), and it worked perfectly.
>
> I used the standard "git pull" command on the repository.
> Moreover, I recently formatted this machine and reinstalled Windows
> on it.
> I'm a bit lost here...
OK, I had an idea over lunch which turned out to solve this :)
First I checked that my pickle file on Linux uses Unix newlines:
$ hexdump -C acc_rep_mat.pik | head
00000000 28 64 70 31 0a 28 53 27 4c 27 0a 53 27 52 27 0a |(dp1.(S'L'.S'R'.|
00000010 74 49 31 30 39 0a 73 28 53 27 49 27 0a 53 27 49 |tI109.s(S'I'.S'I|
00000020 27 0a 74 49 31 34 35 0a 73 28 53 27 51 27 0a 53 |'.tI145.s(S'Q'.S|
00000030 27 51 27 0a 74 49 34 32 0a 73 28 53 27 53 27 0a |'Q'.tI42.s(S'S'.|
00000040 53 27 54 27 0a 74 49 31 37 32 0a 73 28 53 27 48 |S'T'.tI172.s(S'H|
00000050 27 0a 53 27 54 27 0a 74 49 36 39 0a 73 28 53 27 |'.S'T'.tI69.s(S'|
00000060 51 27 0a 53 27 59 27 0a 74 49 34 31 0a 73 28 53 |Q'.S'Y'.tI41.s(S|
00000070 27 48 27 0a 53 27 50 27 0a 74 49 32 33 0a 73 28 |'H'.S'P'.tI23.s(|
00000080 53 27 4e 27 0a 53 27 59 27 0a 74 49 37 35 0a 73 |S'N'.S'Y'.tI75.s|
00000090 28 53 27 48 27 0a 53 27 4c 27 0a 74 49 37 30 0a |(S'H'.S'L'.tI70.|
Then I converted it to DOS/Windows newlines (e.g. unix2dos
is easy if you have that, or a few lines of Python if not - see below):
$ hexdump -C acc_rep_mat.dos.pik | head
00000000 28 64 70 31 0d 0a 28 53 27 4c 27 0d 0a 53 27 52 |(dp1..(S'L'..S'R|
00000010 27 0d 0a 74 49 31 30 39 0d 0a 73 28 53 27 49 27 |'..tI109..s(S'I'|
00000020 0d 0a 53 27 49 27 0d 0a 74 49 31 34 35 0d 0a 73 |..S'I'..tI145..s|
00000030 28 53 27 51 27 0d 0a 53 27 51 27 0d 0a 74 49 34 |(S'Q'..S'Q'..tI4|
00000040 32 0d 0a 73 28 53 27 53 27 0d 0a 53 27 54 27 0d |2..s(S'S'..S'T'.|
00000050 0a 74 49 31 37 32 0d 0a 73 28 53 27 48 27 0d 0a |.tI172..s(S'H'..|
00000060 53 27 54 27 0d 0a 74 49 36 39 0d 0a 73 28 53 27 |S'T'..tI69..s(S'|
00000070 51 27 0d 0a 53 27 59 27 0d 0a 74 49 34 31 0d 0a |Q'..S'Y'..tI41..|
00000080 73 28 53 27 48 27 0d 0a 53 27 50 27 0d 0a 74 49 |s(S'H'..S'P'..tI|
00000090 32 33 0d 0a 73 28 53 27 4e 27 0d 0a 53 27 59 27 |23..s(S'N'..S'Y'|
This increases the file size from 3658 bytes to 4289 bytes.
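The size difference is a quick sanity check: each LF gains one CR byte, so 4289 - 3658 = 631 should be the number of newlines in the file. A minimal sketch of that arithmetic follows; the helper name crlf_size is made up for illustration, and it assumes the input has no carriage returns to begin with:

```python
def crlf_size(data: bytes) -> int:
    """Size of data once every LF becomes CRLF (assumes no CR is present yet)."""
    return len(data) + data.count(b"\n")


# Small stand-in for the start of the pickle, not the real file contents.
sample = b"(dp1\n(S'L'\nS'R'\ntI109\ns."
converted = sample.replace(b"\n", b"\r\n")

# The predicted size matches the size after an actual conversion.
assert crlf_size(sample) == len(converted)

# For the real file, the extra bytes equal the number of newlines.
extra_bytes = 4289 - 3658
print(extra_bytes)
```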
$ python3.4 -c "import pickle; h=open('acc_rep_mat.pik', 'rb'); m=pickle.load(h); h.close(); print(m)"
{('E', 'M'): 33, ..., ('D', 'V'): 95}
$ python3.4 -c "import pickle; h=open('acc_rep_mat.dos.pik', 'rb'); m=pickle.load(h); h.close(); print(m)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_pickle.UnpicklingError: the STRING opcode argument must be quoted
So I can get the exact same error under Linux now :)
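The same behaviour can be sketched without the file at all. The bytes below are the first record from the hexdump, with a "." STOP opcode appended (my addition) so that the fragment is a complete pickle on its own:

```python
import pickle

# First record of the hexdump, plus a trailing "." STOP opcode (added here,
# not taken from the file) so the fragment can be loaded by itself.
unix_bytes = b"(dp1\n(S'L'\nS'R'\ntI109\ns."
dos_bytes = unix_bytes.replace(b"\n", b"\r\n")

# With Unix newlines the fragment loads as a one-entry dict.
print(pickle.loads(unix_bytes))

# With DOS newlines the STRING argument ends in "\r" instead of a quote,
# which Python 3.4 and later reject.
dos_error = None
try:
    pickle.loads(dos_bytes)
except pickle.UnpicklingError as err:
    dos_error = err
    print("UnpicklingError:", err)
```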
I confirmed this on Windows where my copy of git is set up to use
Unix newlines by default (I think), and the file has Unix newlines
(and is 3658 bytes).
C:\repositories\biopython\Tests\SubsMat>c:\python34\python
Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> data = open("acc_rep_mat.pik", "rb").read()
>>> with open("acc_rep_mat.dos.pik", "wb") as h: h.write(data.replace(b"\n", b"\r\n"))
...
4289
>>> quit()
C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c "import pickle; h=open('acc_rep_mat.pik', 'rb'); m=pickle.load(h); h.close(); print(m)"
{('D', 'R'): 115, ..., ('H', 'Q'): 44}
C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c "import pickle; h=open('acc_rep_mat.dos.pik', 'rb'); m=pickle.load(h); h.close(); print(m)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
_pickle.UnpicklingError: the STRING opcode argument must be quoted
So, the upshot is that this git setting change should fix it:
https://github.com/biopython/biopython/commit/b7cc2fe199d22f794612d68e5554361413468372
Could you update your copy of the Biopython source code via git,
and see if that solves this pickle?
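If it helps, here is a rough way to check a working copy for pickle files that still have DOS newlines after updating. It is only a sketch: the function name files_with_crlf and the "*.pik" pattern are my own choices, and the test is only meaningful for text-style pickles like this one, since a binary pickle could contain the bytes "\r\n" legitimately:

```python
from pathlib import Path


def files_with_crlf(root, pattern="*.pik"):
    """Return files under root matching pattern that contain a CRLF pair."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and b"\r\n" in path.read_bytes()
    )


if __name__ == "__main__":
    # "Tests" is assumed to be the Biopython test directory of the checkout.
    for path in files_with_crlf("Tests"):
        print(path)
```

An empty result would suggest the files were checked out with Unix newlines.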
Thank you,
Peter