[Biopython-dev] Pickle problem on 64 bit Windows with Python 3.4
Manlio Calvi
manlio.calvi at gmail.com
Tue Apr 22 15:09:54 EDT 2014
Uhm no, I get the same error as before...
Seems this machine doesn't like preserved vegetables.
Manlio
On Tue, Apr 22, 2014 at 2:36 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Apr 22, 2014 at 12:09 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
>> On Tue, Apr 22, 2014 at 12:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> On Mon, Apr 21, 2014 at 6:45 PM, Manlio Calvi <manlio.calvi at gmail.com> wrote:
>>>> From what I read here http://hg.python.org/cpython/rev/4a6b8f86b081 this could be
>>>> a problem related to that file. It seems to me they stripped the check for a
>>>> quote that must be present, and looking at the pickle it apparently isn't.
>>>>
>>>
>>> OK, now things are more confusing - this seems to be working on
>>> a colleague's machine, so it may be something different on your
>>> setup. Are you using a self compiled Python 3.4?
>>>
>>> We installed the 64 bit version Python 3.4 on Windows 7 using the
>>> binary installer from the website (Windows x86-64 MSI installer),
>>> selecting for all users (which probably requires admin rights):
>>> https://www.python.org/ftp/python/3.4.0/python-3.4.0.amd64.msi
>>
>> Exactly as I did. I installed the dependencies (numpy and the like)
>> for Biopython using Gohlke's builds.
>>
>>> We manually downloaded the pickle file via the raw link on GitHub,
>>> and tried the test code (as shown below), and it worked perfectly.
>>
>> I've used the standard "git pull" command from the repository.
>> Moreover, this machine was recently formatted and Windows was
>> reinstalled on it.
>> I'm a bit lost here...
>
> OK, I had an idea over lunch which turned out to solve this :)
>
> First I checked that my pickle file on Linux uses Unix newlines:
>
> $ hexdump -C acc_rep_mat.pik | head
> 00000000 28 64 70 31 0a 28 53 27 4c 27 0a 53 27 52 27 0a |(dp1.(S'L'.S'R'.|
> 00000010 74 49 31 30 39 0a 73 28 53 27 49 27 0a 53 27 49 |tI109.s(S'I'.S'I|
> 00000020 27 0a 74 49 31 34 35 0a 73 28 53 27 51 27 0a 53 |'.tI145.s(S'Q'.S|
> 00000030 27 51 27 0a 74 49 34 32 0a 73 28 53 27 53 27 0a |'Q'.tI42.s(S'S'.|
> 00000040 53 27 54 27 0a 74 49 31 37 32 0a 73 28 53 27 48 |S'T'.tI172.s(S'H|
> 00000050 27 0a 53 27 54 27 0a 74 49 36 39 0a 73 28 53 27 |'.S'T'.tI69.s(S'|
> 00000060 51 27 0a 53 27 59 27 0a 74 49 34 31 0a 73 28 53 |Q'.S'Y'.tI41.s(S|
> 00000070 27 48 27 0a 53 27 50 27 0a 74 49 32 33 0a 73 28 |'H'.S'P'.tI23.s(|
> 00000080 53 27 4e 27 0a 53 27 59 27 0a 74 49 37 35 0a 73 |S'N'.S'Y'.tI75.s|
> 00000090 28 53 27 48 27 0a 53 27 4c 27 0a 74 49 37 30 0a |(S'H'.S'L'.tI70.|
>
> Then I converted it to DOS/Windows newlines (e.g. unix2dos
> is easy if you have that, or a few lines of Python if not - see below):
>
> $ hexdump -C acc_rep_mat.dos.pik | head
> 00000000 28 64 70 31 0d 0a 28 53 27 4c 27 0d 0a 53 27 52 |(dp1..(S'L'..S'R|
> 00000010 27 0d 0a 74 49 31 30 39 0d 0a 73 28 53 27 49 27 |'..tI109..s(S'I'|
> 00000020 0d 0a 53 27 49 27 0d 0a 74 49 31 34 35 0d 0a 73 |..S'I'..tI145..s|
> 00000030 28 53 27 51 27 0d 0a 53 27 51 27 0d 0a 74 49 34 |(S'Q'..S'Q'..tI4|
> 00000040 32 0d 0a 73 28 53 27 53 27 0d 0a 53 27 54 27 0d |2..s(S'S'..S'T'.|
> 00000050 0a 74 49 31 37 32 0d 0a 73 28 53 27 48 27 0d 0a |.tI172..s(S'H'..|
> 00000060 53 27 54 27 0d 0a 74 49 36 39 0d 0a 73 28 53 27 |S'T'..tI69..s(S'|
> 00000070 51 27 0d 0a 53 27 59 27 0d 0a 74 49 34 31 0d 0a |Q'..S'Y'..tI41..|
> 00000080 73 28 53 27 48 27 0d 0a 53 27 50 27 0d 0a 74 49 |s(S'H'..S'P'..tI|
> 00000090 32 33 0d 0a 73 28 53 27 4e 27 0d 0a 53 27 59 27 |23..s(S'N'..S'Y'|
>
> This increases the file size from 3658 bytes to 4289 bytes.
>
> $ python3.4 -c "import pickle; h=open('acc_rep_mat.pik', 'rb');
> m=pickle.load(h); h.close(); print(m)"
> {('E', 'M'): 33, ..., ('D', 'V'): 95}
>
> $ python3.4 -c "import pickle; h=open('acc_rep_mat.dos.pik', 'rb');
> m=pickle.load(h); h.close(); print(m)"
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> _pickle.UnpicklingError: the STRING opcode argument must be quoted
>
> So I can get the exact same error under Linux now :)
>
> I confirmed this on Windows where my copy of git is set up to use
> Unix newlines by default (I think), and the file has Unix newlines
> (and is 3658 bytes).
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python
> Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600
> 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> data = open("acc_rep_mat.pik", "rb").read()
> >>> with open("acc_rep_mat.dos.pik", "wb") as h: h.write(data.replace(b"\n", b"\r\n"))
> ...
> 4289
> >>> quit()
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c "import
> pickle; h=open('acc_rep_mat.pik', 'rb'); m=pickle.load(h); h.close();
> print(m)"
> {('D', 'R'): 115, ..., ('H', 'Q'): 44}
>
> C:\repositories\biopython\Tests\SubsMat>c:\python34\python -c "import
> pickle; h=open('acc_rep_mat.dos.pik', 'rb'); m=pickle.load(h);
> h.close(); print(m)"
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> _pickle.UnpicklingError: the STRING opcode argument must be quoted
>
> So, the upshot is that this git setting change should fix it:
> https://github.com/biopython/biopython/commit/b7cc2fe199d22f794612d68e5554361413468372
>
> Could you update your copy of the Biopython source code via git,
> and see if that solves this pickle?
>
> Thank you,
>
> Peter
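
The newline effect described above can be reproduced without the real file. The following is only a minimal sketch: the byte string is hand-written to mirror the first few opcodes visible in the hexdump (a dict with the single entry ('L', 'R') -> 109), and the helper names try_load and count_crlf are made up for this illustration, not part of Biopython or the pickle module.

```python
import pickle

# Hand-written protocol 0 pickle mirroring the start of the hexdump above:
# a dict holding one entry, ('L', 'R') -> 109, with Unix newlines.
# This is an illustrative stand-in, not the real acc_rep_mat.pik.
UNIX_PICKLE = b"(dp1\n(S'L'\nS'R'\ntI109\ns."

# The same bytes after a text-style LF -> CRLF conversion.
DOS_PICKLE = UNIX_PICKLE.replace(b"\n", b"\r\n")


def try_load(data):
    """Return (value, None) on success, or (None, message) if unpickling fails."""
    try:
        return pickle.loads(data), None
    except pickle.UnpicklingError as err:
        return None, str(err)


def count_crlf(data):
    """Count CRLF pairs in the raw bytes of a pickle."""
    return data.count(b"\r\n")


print(try_load(UNIX_PICKLE))   # loads cleanly
print(try_load(DOS_PICKLE))    # fails: the stray \r ends up inside the STRING argument
print(count_crlf(UNIX_PICKLE), count_crlf(DOS_PICKLE))
```

Run against the real files (opened in binary mode), count_crlf should report zero for a good checkout. For the converted copy it should match the size difference reported above, 4289 - 3658 = 631, since each converted newline adds one byte.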