[Biopython-dev] Fwd: [biopython] Fix broken downloading of large PDB structures (#146)
Peter Cock
p.j.a.cock at googlemail.com
Wed Jan 9 23:55:13 UTC 2013
FYI
---------- Forwarded message ----------
From: David Cain <notifications at github.com>
Date: Wed, Jan 9, 2013 at 10:59 PM
Subject: [biopython] Fix broken downloading of large PDB structures (#146)
To: biopython/biopython <biopython at noreply.github.com>
Summary of changes
- Fix failure to download large PDB files
- Use with statements for safer file I/O
- Remove obsolete parameters
- PEP 8 changes, update documentation
Failure to download large PDB files
(See: Redmine bug #3403 <https://redmine.open-bio.org/issues/3403>)
The current PDBList module will often fail to download large PDB files.
>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
...
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>
The source of this problem is that the entire gzipped file must be read
into memory before it's written to disk locally.
Instead of this memory-intensive approach, I changed the downloading to
use urllib.urlretrieve, which is more readable and far more efficient.
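A minimal sketch of the download-to-disk approach described above, not the code from the patch itself. It assumes Python 3, where the function lives at urllib.request.urlretrieve (urllib.urlretrieve on Python 2). The helper name fetch_and_decompress and its parameters are hypothetical.

```python
import gzip
import shutil
from urllib.request import urlretrieve  # urllib.urlretrieve on Python 2


def fetch_and_decompress(url, archive_path, final_path):
    """Download a gzipped file to disk, then decompress it in chunks.

    Hypothetical helper for illustration; not part of Bio.PDB.
    """
    # urlretrieve writes the response to archive_path block by block,
    # so the whole archive is never held in memory at once.
    urlretrieve(url, archive_path)

    # The 'with' statement closes both files even if decompression fails;
    # copyfileobj copies in fixed-size chunks rather than all at once.
    with gzip.open(archive_path, "rb") as gz_in, open(final_path, "wb") as out:
        shutil.copyfileobj(gz_in, out)
    return final_path
```

Called with the URL of a gzipped PDB entry and two local paths, this leaves the compressed archive at archive_path and the decompressed structure at final_path.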
Obsolete parameters
The long-obsolete parameters to retrieve_pdb_file() have been
removed. Formerly, the function allowed the user to specify compression
and/or a system utility to perform decompression. But all archives are
now gzipped, and PDBList uses Python's gzip module to decompress
archives. These parameters have been obsolete for over a year (they were
marked deprecated with commit 7ebf6e9
<https://github.com/biopython/biopython/commit/7ebf6e9ecb>).
------------------------------
You can merge this Pull Request by running
git pull https://github.com/DavidCain/biopython fix_pdb_dl
Or view, comment on, or merge it at:
https://github.com/biopython/biopython/pull/146
Commit Summary
- Use urlretrieve to smartly download PDB archives
- Use 'with' statement for safer file I/O
- Collapse unwieldy if-else structure
- PEP8 fixes within retrieve_pdb_file
- Remove deprecated parameters
- Update with clarifying comments
- PEP8 fixes, updated comments for file
- Use urlretrieve in other instance of save to disk
File Changes
- *M* Bio/PDB/PDBList.py (217)
Patch Links:
- https://github.com/biopython/biopython/pull/146.patch
- https://github.com/biopython/biopython/pull/146.diff