[Biopython] Bio.PDB local MMCIF files

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Thu Dec 5 16:45:28 UTC 2013


João Rodrigues wrote:
> Dear Dave,
> 
> I'm not quite sure I understood your question. PDBList is used to download
> and maintain a local copy of the PDB, which would not suit you since you
> are looking for mmCIF data. It could be tweaked however to download mmCIF
> files. Is this what you are looking for?

Sorry, I didn't express myself very well. I misunderstood the purpose of
PDBList, and at the time thought it was simply a way to tell Biopython
where the local archive was. I already have access to a PDB/mmCIF
archive; I don't need to create one.

> As for mmCIF parsing and manipulation, currently the parser accepts a path
> to the file (relative paths should do) but indeed it does not handle
> compression. I think it would be up to the user to inflate the gz file
> before parsing..

I don't think that is very convenient, since the files in the archive
are normally stored compressed. Using a filename as the only
way to specify a file means that I would have to open the file in the
archive, read and uncompress it and store it in another file before
passing the name of that file to the mmCIF parser. Unless Python
supports some means to incorporate a decompression layer specification
into the 'filename'? (Sorry, I'm new to Python.)
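
For illustration, the workaround described above (inflate the archive file
into a separate file, then hand that file's name to the parser) could be
sketched as follows. This is only a sketch using the Python 3 standard
library; the helper name inflate_to_tempfile and the ".cif" suffix are
assumptions made for this example, not part of Biopython.

```python
import gzip
import shutil
import tempfile


def inflate_to_tempfile(gz_path):
    """Decompress gz_path into a temporary file and return that file's path.

    Hypothetical helper for illustration only. The caller is responsible
    for deleting the temporary file when it is no longer needed.
    """
    with gzip.open(gz_path, "rb") as src:
        # delete=False keeps the file on disk after the handle is closed,
        # so its name can be passed to a parser that only accepts filenames.
        with tempfile.NamedTemporaryFile(suffix=".cif", delete=False) as dst:
            shutil.copyfileobj(src, dst)
            return dst.name
```

The returned path could then be given to a filename-only parser, for
example MMCIFParser().get_structure(structure_id, path), and removed with
os.remove(path) afterwards.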

I would think that the 'nicest' solution would be for the parser to
recognize a compressed file and use a gzip layer to decompress it on the
fly. Alternatively, the parser could accept an open file handle as an
alternative to a filename and the caller would be responsible for
opening the file through a decompression layer.
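
A minimal sketch of the caller-side variant, assuming Python 3: the file
is checked for the two-byte gzip signature and, if it is present, opened
through a gzip layer that decompresses on the fly. The function name
open_maybe_gzipped is hypothetical. Whether the mmCIF parser will accept
the resulting handle in place of a filename is exactly the open question
here and depends on the Biopython version.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(path):
    """Return a text-mode handle for path, decompressing if it is gzipped.

    Hypothetical helper for illustration only; the caller closes the handle.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        # "rt" wraps the decompressed byte stream in a text layer.
        return gzip.open(path, "rt")
    return open(path, "r")
```

Detecting the signature rather than trusting a ".gz" extension means the
same call works for both compressed and uncompressed files in the archive.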

Since the caller is going to have to deal with prepending the library
base to the filename anyway, I suppose having it produce a decompressed
stream is not a great problem, if only it could pass the stream to the
parser!

Cheers, Dave

> Best,
> 
> João

PS: Sorry if I'm breaking threads by replying to the copy of the email
that João sent to me, but the copy from the mail server hasn't arrived
here yet, despite being visible at gmane.

