[Biopython-dev] [Biopython - Bug #3403] PDBList fails to download large PDB structures
redmine at redmine.open-bio.org
Wed Jan 9 23:08:28 UTC 2013
Issue #3403 has been updated by David Cain.
(Pull request "here":https://github.com/biopython/biopython/pull/146)
----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403
Author: David Cain
Status: New
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl
The current @PDBList@ module will often fail to download large PDB files.
<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>
The cause is that the entire gzipped file is read into memory before it is written to disk locally. With large archives, the local file can be truncated prematurely, which makes gzip fail with the CRC error above on extraction.
I fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl and have opened a pull request for it.
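For illustration only, one way to avoid the failure mode described above is to write the compressed download to disk in fixed-size chunks, close that file, and only then decompress it, again in chunks. The following is a minimal sketch assuming Python 3's standard library. The function names @download_to_file@ and @gunzip_file@ and the chunk size are hypothetical; they are not taken from @PDBList@ or from the linked branch, whose actual changes may differ.

```python
import gzip
import shutil
import urllib.request

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, chosen for illustration


def download_to_file(url, dest_path, chunk_size=CHUNK_SIZE):
    """Stream the body at `url` to `dest_path` without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Both handles are closed here, so the archive on disk is complete
    # before anything tries to decompress it.


def gunzip_file(gz_path, dest_path, chunk_size=CHUNK_SIZE):
    """Decompress `gz_path` into `dest_path` one chunk at a time."""
    with gzip.open(gz_path, "rb") as gz, open(dest_path, "wb") as out:
        shutil.copyfileobj(gz, out, chunk_size)
```

Calling @download_to_file@ with the archive URL and then @gunzip_file@ on the saved file keeps memory use bounded by the chunk size, whatever the size of the structure.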