[Biopython-dev] [Biopython - Bug #3403] PDBList fails to download large PDB structures

Wed Jan 9 23:08:28 UTC 2013

Issue #3403 has been updated by David Cain.

(Pull request "here":https://github.com/biopython/biopython/pull/146)
----------------------------------------
Bug #3403: PDBList fails to download large PDB structures
https://redmine.open-bio.org/issues/3403

Author: David Cain
Status: New
Priority: High
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: https://github.com/DavidCain/biopython/tree/fix_pdb_dl

The current @PDBList@ module will often fail to download large PDB files.

<pre>
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
Downloading PDB structure '1hgg'...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/Bio/PDB/PDBList.py", line 247, in retrieve_pdb_file
    out.writelines(gz.read())
  File "/usr/lib/python2.7/gzip.py", line 249, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_eof()
  File "/usr/lib/python2.7/gzip.py", line 342, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
</pre>

The problem is that the current code reads the entire gzipped file into memory before writing it to disk locally. With large archives, the local file can be truncated, and gzip then fails its CRC check on extraction, as in the traceback above.
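
For illustration only, here is a minimal sketch of a streaming approach. It is not the code from the pull request: the function names @download_to_file@ and @gunzip_file@, the chunk size, and the use of @urlopen@ are all assumptions, and it is written for Python 3 rather than the Python 2.7 shown in the traceback. The idea is to copy data in fixed-size chunks and to close the archive before gzip reopens it.

```python
import gzip
import shutil
from urllib.request import urlopen


def download_to_file(url, archive_path, chunk_size=64 * 1024):
    """Stream the response body to disk in fixed-size chunks (hypothetical helper)."""
    with urlopen(url) as response, open(archive_path, "wb") as archive:
        shutil.copyfileobj(response, archive, chunk_size)
    # Both handles are closed on leaving the with-block, so the archive
    # is completely written before anything tries to decompress it.


def gunzip_file(archive_path, final_path, chunk_size=64 * 1024):
    """Decompress a gzip archive chunk by chunk (hypothetical helper)."""
    with gzip.open(archive_path, "rb") as gz, open(final_path, "wb") as out:
        shutil.copyfileobj(gz, out, chunk_size)
```

With this shape, memory use is bounded by @chunk_size@ regardless of the size of the structure, and a truncated archive would point to a failed download rather than an unflushed local file.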

I have fixed this issue on my "GitHub branch":https://github.com/DavidCain/biopython/tree/fix_pdb_dl and opened a pull request for it.
