[Biopython-dev] [Biopython (old issues only) - Bug #2626] (Resolved) Bio.PDB mmCIFParser parse exceptions

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Sat Nov 12 20:08:02 UTC 2016


Issue #2626 has been updated by Lenna Peterson.

Description updated
Status changed from New to Resolved
% Done changed from 0 to 100

Still failing. Migrated to GitHub:

https://github.com/biopython/biopython/issues/990

----------------------------------------
Bug #2626: Bio.PDB mmCIFParser parse exceptions
https://redmine.open-bio.org/issues/2626#change-15366

* Author: Chris Oldfield
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Other
* Target version: 1.48
* URL: 
----------------------------------------
I recently ran the MMCIFParser object over all of the PDB's mmCIF files and found that a large number fail to parse correctly (a short script at the end demonstrates this). Of ~50k mmCIF files, 3891 failed to parse and another 1980 were missing fields in the mmCIF dictionary.

A few examples of files that failed to parse: 
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz

The problem seems to be that an error in one mmCIF table, such as an extra field, propagates through the rest of the parse.
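
To make the suspected failure mode concrete, here is a toy illustration of how one stray value in a loop_-style table can misalign every later row. It is only a sketch under the assumption that values are assigned to columns by position; `read_loop` and the sample data are hypothetical and are not Biopython's parser or real PDB content.

```python
def read_loop(column_names, tokens):
    """Group a flat token stream into rows, one value per column name."""
    width = len(column_names)
    return [
        dict(zip(column_names, tokens[start:start + width]))
        for start in range(0, len(tokens), width)
    ]


columns = ["id", "symbol", "x"]
clean = ["1", "N", "0.0", "2", "C", "1.5", "3", "O", "2.5"]
# Same data with one stray token ("EXTRA") in the second row.
broken = ["1", "N", "0.0", "2", "C", "EXTRA", "1.5", "3", "O", "2.5"]

clean_rows = read_loop(columns, clean)
broken_rows = read_loop(columns, broken)

# Rows before the stray token are intact; every row after it is shifted.
for row in broken_rows:
    print(row)
```

If the real parser assigns values in a similar positional way, a single malformed table would be enough to corrupt or abort everything read after it, which would match the behaviour described above.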

Environment: x86_64 Gentoo Linux 2008, Biopython installed from source.

__CODE__
import sys

from Bio.PDB import MMCIFParser
from Bio.PDB.MMCIF2Dict import MMCIF2Dict

if len(sys.argv) != 2:
    sys.exit("usage: mmCifParseCheck.py <structFile>")
struct_file = sys.argv[1]

results = []

# Parse to a structure object and count the non-hetero residues.
num_res = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure("test", struct_file)
    for model in structure:
        for chain in model:
            for residue in chain:
                # Hetero residues carry an "H_" prefix in the first id field.
                if not residue.id[0].startswith("H_"):
                    num_res += 1
except Exception:
    results.append("parse to structure object failed")
else:
    results.append("parse to structure object succeeded")

# Parse the whole mmCIF file to a dict.
mmcif_dict = None
try:
    mmcif_dict = MMCIF2Dict(struct_file)
except Exception:
    results.append("parse to dict failed")
else:
    results.append("parse to dict succeeded")

# Look up a required entry; this also fails if the dict parse failed above.
try:
    entry_id = mmcif_dict["_entry.id"]
except Exception:
    results.append("key lookup failed")
else:
    results.append("key lookup succeeded")

print("\n".join(results))
print("number of non-het residues " + str(num_res))
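
The script above checks one file at a time. A minimal sketch of how the archive-wide counts could be tallied follows. It is not the driver actually used for the numbers in this report: `classify` and `tally` are hypothetical names, the check is reduced to the dict parse and the `_entry.id` lookup, and it assumes MMCIF2Dict accepts an open text handle so that gzipped files can be read directly.

```python
import gzip
from collections import Counter


def classify(path):
    """Return 'parse_failed', 'missing_field', or 'ok' for one mmCIF file."""
    # Imported lazily so the tallying logic below can be reused without Biopython.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    # Assumption: MMCIF2Dict accepts an open text handle as well as a filename.
    opener = gzip.open if path.endswith(".gz") else open
    try:
        with opener(path, "rt") as handle:
            mmcif_dict = MMCIF2Dict(handle)
    except Exception:
        return "parse_failed"
    return "ok" if "_entry.id" in mmcif_dict else "missing_field"


def tally(paths, classify_one=classify):
    """Count outcomes over many files and keep the paths that were not 'ok'."""
    counts = Counter()
    failures = []
    for path in paths:
        outcome = classify_one(path)
        counts[outcome] += 1
        if outcome != "ok":
            failures.append((path, outcome))
    return counts, failures
```

For example, `counts, failures = tally(glob.glob("mmCIF/*.cif.gz"))` (with `import glob` and a hypothetical local directory) would give per-outcome totals plus a list of files worth inspecting by hand.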

---Files--------------------------------
mmCifParseCheck.py (1021 Bytes)

