[BioPython] biopython and compressed PDB files

Robert Campbell rlc1 at post.queensu.ca
Tue Sep 21 11:11:56 EDT 2004


> Michael Sierk wrote:
> 
> >I was wondering if someone can point me to code that uncompresses PDB  
> >files in memory (assuming such a thing is possible)?  I found the 
> >gzip  python module, but apparently that only does .gz files?

On 2004-09-20 13:21 Iddo wrote:
> 
> Which version of Python are you using? Python 2.3 has the bz2 module 
> (http://docs.python.org/lib/module-bz2.html).
> 
> For zip files, there is the zipfile module:
> 
> http://docs.python.org/lib/module-zipfile.html
> 
> and the zlib:
> 
> http://docs.python.org/lib/module-zlib.html#l2h-2626
> 
> 
> That should cover it...

But actually it doesn't. :( 

The PDB (for whatever reason) unfortunately uses the UNIX compress
format ".Z" for its files. I'm not sure why, as gzip and bzip2 both
give better compression and are more widely supported. The UNIX
compress program uses the formerly-patented LZW algorithm, so as far
as I know there are no ready-to-use Python modules for dealing with
.Z files. I haven't found any yet, in any case.

See:

  http://mail.python.org/pipermail/python-list/2004-May/220197.html

and 

  http://mail.python.org/pipermail/python-list/2004-May/222565.html

So while neither the gzip nor the bz2 Python module can handle a '.Z'
file, the system gunzip program (on Linux) can. So what I'm forced to do
is either:

  os.system("gunzip %s" % compressed_filename) 

and then read the uncompressed file (same name, minus the ".Z"), or

  file_as_string = os.popen("gunzip -c %s" % compressed_filename, "r").read()
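A slightly more defensive version of the same idea, as a minimal sketch
only: it assumes Python 3 and a GNU gzip binary on the PATH (GNU gzip
can decode ".Z" files as well as ".gz"). The function name
read_compressed and its "tool" parameter are made up for illustration.
Passing the command as a list avoids the shell, so filenames with
spaces or quotes don't need escaping.

```python
import subprocess


def read_compressed(path, tool="gzip"):
    """Return the decompressed contents of `path` as bytes.

    Runs `<tool> -dc <path>` and captures standard output.
    Assumes `tool` is a gzip-compatible program found on the PATH.
    """
    # A list argument means no shell is involved, so no quoting issues.
    # check=True raises CalledProcessError if the tool exits non-zero.
    result = subprocess.run(
        [tool, "-dc", path],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Something like read_compressed("pdb1abc.ent.Z").decode("ascii") would
then give the file as a string (the filename here is just a
placeholder).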

If you are writing something to retrieve PDB files, you can use the RCSB
web site rather than the FTP site, and specify the compression:

  filename = urllib.urlretrieve('http://www.rcsb.org/pdb/cgi/export.cgi/' +
           pdbCode + '.pdb.gz?format=PDB&pdbId=' + pdbCode + '&compression=gz')[0]
  file_as_string = gzip.GzipFile(filename, 'r').read()
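
For the gzip case, once the file is on disk, reading it into a string
might look like the sketch below. It assumes Python 3 (where gzip.open
accepts a text mode) and that the file is plain ASCII; the helper name
read_gz_text is hypothetical.

```python
import gzip


def read_gz_text(path, encoding="ascii"):
    """Return the full decompressed text of a .gz file as one string."""
    # "rt" opens the compressed stream in text mode, so read() returns str.
    # The with-block closes the file even if decoding fails.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()
```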

Unfortunately for me, I keep my own PDB mirror and "download" from there
for speed's sake, so I'm stuck (for now) with "uncompressing".

Cheers,
Rob
-- 
Robert L. Campbell, Ph.D.                         <rlc1 at post.queensu.ca>
Senior Research Associate                            phone: 613-533-6821
Dept. of Biochemistry, Queen's University,             fax: 613-533-2497
Kingston, ON K7L 3N6  Canada       http://adelie.biochem.queensu.ca/~rlc
    PGP Fingerprint: 9B49 3D3F A489 05DC B35C  8E33 F238 A8F5 F635 C0E2

