[Biopython-dev] Pickle problem on 64 bit Windows with Python 3.4

Manlio Calvi manlio.calvi at gmail.com
Tue Apr 22 19:09:54 UTC 2014


Uhm no, I get the same error as before...
Seems this machine doesn't like preserved vegetables.

Manlio

On Tue, Apr 22, 2014 at 2:36 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Apr 22, 2014 at 12:09 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
>> On Tue, Apr 22, 2014 at 12:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> On Mon, Apr 21, 2014 at 6:45 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
>>>> From what I read here http://hg.python.org/cpython/rev/4a6b8f86b081 this could be
>>>> a problem related to that file. Seems to me they stripped the check for a
>>>> quote that must be present, and looking at the pickle, it apparently isn't there.
>>>>
>>>
>>> OK, now things are more confusing - this seems to be working on
>>> a colleague's machine, so it may be something different on your
>>> setup. Are you using a self compiled Python 3.4?
>>>
>>> We installed the 64 bit version Python 3.4 on Windows 7 using the
>>> binary installed from the website (Windows x86-64 MSI installer),
>>> selecting for all users (which probably requires admin rights):
>>> https://www.python.org/ftp/python/3.4.0/python-3.4.0.amd64.msi
>>
>> That is exactly what I did. I installed the dependencies (numpy and
>> the like) for Biopython using Gohlke's builds.
>>
>>> We manually downloaded the pickle file via the raw link on GitHub,
>>> and tried the test code (as shown below), and it worked perfectly.
>>
>> I used the standard "git pull" command on the repository.
>> Moreover, this machine was recently formatted and Windows was
>> reinstalled on it.
>> I'm a bit lost here...
>
> OK, I had an idea over lunch which turned out to solve this :)
>
> First I checked that my pickle file on Linux uses Unix newlines:
>
> $ hexdump -C acc_rep_mat.pik | head
> 00000000  28 64 70 31 0a 28 53 27  4c 27 0a 53 27 52 27 0a  |(dp1.(S'L'.S'R'.|
> 00000010  74 49 31 30 39 0a 73 28  53 27 49 27 0a 53 27 49  |tI109.s(S'I'.S'I|
> 00000020  27 0a 74 49 31 34 35 0a  73 28 53 27 51 27 0a 53  |'.tI145.s(S'Q'.S|
> 00000030  27 51 27 0a 74 49 34 32  0a 73 28 53 27 53 27 0a  |'Q'.tI42.s(S'S'.|
> 00000040  53 27 54 27 0a 74 49 31  37 32 0a 73 28 53 27 48  |S'T'.tI172.s(S'H|
> 00000050  27 0a 53 27 54 27 0a 74  49 36 39 0a 73 28 53 27  |'.S'T'.tI69.s(S'|
> 00000060  51 27 0a 53 27 59 27 0a  74 49 34 31 0a 73 28 53  |Q'.S'Y'.tI41.s(S|
> 00000070  27 48 27 0a 53 27 50 27  0a 74 49 32 33 0a 73 28  |'H'.S'P'.tI23.s(|
> 00000080  53 27 4e 27 0a 53 27 59  27 0a 74 49 37 35 0a 73  |S'N'.S'Y'.tI75.s|
> 00000090  28 53 27 48 27 0a 53 27  4c 27 0a 74 49 37 30 0a  |(S'H'.S'L'.tI70.|
>
> Then I converted it to DOS/Windows newlines (e.g. unix2dos
> is easy if you have that, or a few lines of Python if not - see below):
>
> $ hexdump -C acc_rep_mat.dos.pik | head
> 00000000  28 64 70 31 0d 0a 28 53  27 4c 27 0d 0a 53 27 52  |(dp1..(S'L'..S'R|
> 00000010  27 0d 0a 74 49 31 30 39  0d 0a 73 28 53 27 49 27  |'..tI109..s(S'I'|
> 00000020  0d 0a 53 27 49 27 0d 0a  74 49 31 34 35 0d 0a 73  |..S'I'..tI145..s|
> 00000030  28 53 27 51 27 0d 0a 53  27 51 27 0d 0a 74 49 34  |(S'Q'..S'Q'..tI4|
> 00000040  32 0d 0a 73 28 53 27 53  27 0d 0a 53 27 54 27 0d  |2..s(S'S'..S'T'.|
> 00000050  0a 74 49 31 37 32 0d 0a  73 28 53 27 48 27 0d 0a  |.tI172..s(S'H'..|
> 00000060  53 27 54 27 0d 0a 74 49  36 39 0d 0a 73 28 53 27  |S'T'..tI69..s(S'|
> 00000070  51 27 0d 0a 53 27 59 27  0d 0a 74 49 34 31 0d 0a  |Q'..S'Y'..tI41..|
> 00000080  73 28 53 27 48 27 0d 0a  53 27 50 27 0d 0a 74 49  |s(S'H'..S'P'..tI|
> 00000090  32 33 0d 0a 73 28 53 27  4e 27 0d 0a 53 27 59 27  |23..s(S'N'..S'Y'|
>
> This increases the file size from 3658 bytes to 4289 bytes.
>
> $ python3.4 -c "import pickle; h=open('acc_rep_mat.pik', 'rb');
> m=pickle.load(h); h.close(); print(m)"
> {('E', 'M'): 33, ...,  ('D', 'V'): 95}
>
> $ python3.4 -c "import pickle; h=open('acc_rep_mat.dos.pik', 'rb');
> m=pickle.load(h); h.close(); print(m)"
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> _pickle.UnpicklingError: the STRING opcode argument must be quoted
>
> So I can get the exact same error under Linux now :)
>
> I confirmed this on Windows where my copy of git is set up to use
> Unix newlines by default (I think), and the file has Unix newlines
> (and is 3658 bytes).
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python
> Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600
> 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> data = open("acc_rep_mat.pik", "rb").read()
> >>> with open("acc_rep_mat.dos.pik", "wb") as h: h.write(data.replace(b"\n", b"\r\n"))
> ...
> 4289
> >>> quit()
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c  "import
> pickle; h=open('acc_rep_mat.pik', 'rb'); m=pickle.load(h); h.close();
> print(m)"
> {('D', 'R'): 115, ..., ('H', 'Q'): 44}
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c "import
> pickle; h=open('acc_rep_mat.dos.pik', 'rb'); m=pickle.load(h);
> h.close(); print(m)"
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> _pickle.UnpicklingError: the STRING opcode argument must be quoted
>
> So, the upshot is that this git setting change should fix it:
> https://github.com/biopython/biopython/commit/b7cc2fe199d22f794612d68e5554361413468372
>
> Could you update your copy of the Biopython source code via git,
> and see if that solves this pickle?
>
> Thank you,
>
> Peter
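
The failure Peter describes can be reproduced without the real file. The
sketch below is not from the thread: it takes the first record visible in
the hexdump above and appends a STOP opcode (".") so that it forms a
complete protocol 0 pickle, then applies the same newline replacement.
The likely mechanism is that the unpickler reads each STRING argument up
to the "\n", so after conversion the argument ends in "\r" rather than a
closing quote. The names UNIX_PICKLE, DOS_PICKLE and try_load are
illustrative only.

```python
import pickle

# First record from the hexdump in the thread, with a STOP opcode (".")
# appended so it is a complete protocol 0 pickle. The "." is our addition.
UNIX_PICKLE = b"(dp1\n(S'L'\nS'R'\ntI109\ns."

# The same bytes after the LF -> CRLF conversion described in the thread.
DOS_PICKLE = UNIX_PICKLE.replace(b"\n", b"\r\n")


def try_load(data):
    """Return the unpickled object, or the UnpicklingError that was raised."""
    try:
        return pickle.loads(data)
    except pickle.UnpicklingError as err:
        return err


print(try_load(UNIX_PICKLE))  # the dictionary with one entry
print(try_load(DOS_PICKLE))   # the exception object instead of a dictionary
```

The Unix version should load as a one-entry dictionary, while the
converted version should come back as an UnpicklingError, matching the
behaviour reported on both Linux and Windows.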

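To check whether a local copy of a text-based pickle has been converted,
counting the line endings is usually enough. The reported sizes fit this
picture: 4289 - 3658 = 631 extra bytes, which is what you would expect if
each of 631 newlines gained one carriage return. The helpers below are a
minimal sketch with made-up names (count_line_endings, to_unix_newlines),
and the blanket replacement is only reasonable for protocol 0 pickles like
this one; a binary pickle could legitimately contain the bytes "\r\n".

```python
def count_line_endings(data):
    """Return (crlf, bare_lf) counts for a bytes object."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf
    return crlf, bare_lf


def to_unix_newlines(data):
    """Undo a CRLF conversion; only safe for text-based (protocol 0) pickles."""
    return data.replace(b"\r\n", b"\n")


# A short CRLF-converted sample modelled on the start of the file.
sample = b"(dp1\r\n(S'L'\r\nS'R'\r\ntI109\r\ns."
print(count_line_endings(sample))   # (4, 0)
fixed = to_unix_newlines(sample)
print(count_line_endings(fixed))    # (0, 4)
print(len(sample) - len(fixed))     # 4
```

On a real checkout, the same functions can be applied to
open("acc_rep_mat.pik", "rb").read(); a non-zero first count would suggest
that git converted the file on checkout.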

