[Biopython-dev] [Biopython - Bug #2626] Bio.PDB mmCIFParser parse exceptions

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Sat Apr 21 18:05:01 UTC 2012


Issue #2626 has been updated by Lenna Peterson.

File mmCifParseCheck.py added

I've attempted to rescue this code from overzealous "text formatting".

Attached version appeared to work on one test file; haven't tested the example broken files yet. 
----------------------------------------
Bug #2626: Bio.PDB mmCIFParser parse exceptions
https://redmine.open-bio.org/issues/2626

Author: Chris Oldfield
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Other
Target version: 1.48
URL: 


I recently ran the mmCIFParser object over all of PDB's mmCIF files and found that a large number of them failed to parse correctly (a short script at the end demonstrates this). Of ~50k mmCIF files, 3891 failed to parse and another 1980 were missing fields in the mmCIF dictionary.

A few examples of files that failed to parse: 
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz
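
A survey like the one described above could be tallied with a small driver along these lines. This is only a sketch, not the script used for the reported numbers: `classify` and `survey` are hypothetical helper names, and `parse` stands in for any callable that takes a path and returns a dict-like object (with Biopython installed, `Bio.PDB.MMCIF2Dict.MMCIF2Dict` would be one candidate, assuming the files have been decompressed first).

```python
from collections import Counter


def classify(path, parse, required_key="_entry.id"):
    """Return 'parse_failed', 'missing_field', or 'ok' for one file.

    `parse` is an assumed stand-in for the real parser: any callable
    that takes a path and returns a dict-like object.
    """
    try:
        data = parse(path)
    except Exception:
        # The parser raised: count the file as a parse failure.
        return "parse_failed"
    if required_key not in data:
        # Parsing finished, but a required entry is absent.
        return "missing_field"
    return "ok"


def survey(paths, parse, required_key="_entry.id"):
    """Tally the outcome of classify() over many files."""
    return Counter(classify(p, parse, required_key) for p in paths)
```

Keeping the two failure modes separate mirrors the two counts in the report: files that raise during parsing, and files that parse but lack a required key.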

The problem seems to be that an error in one mmCIF table, such as an extra field, propagates through the rest of the parse.
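
One way such propagation can arise is sketched below. This is a toy illustration only, not Biopython's actual parser: `assign_loop_values` is a hypothetical function that fills the columns of a `loop_` table round-robin from a flat token list, which is how a naive reader might work. A single stray token shifts every later value into the wrong column.

```python
def assign_loop_values(columns, values):
    """Assign a flat list of tokens to columns in round-robin order.

    Toy model of a naive loop_ reader (an assumption for illustration,
    not Biopython's implementation).
    """
    table = {name: [] for name in columns}
    for i, value in enumerate(values):
        table[columns[i % len(columns)]].append(value)
    return table


columns = ["_atom_site.id", "_atom_site.type_symbol"]

# Well-formed rows: (1, N), (2, C), (3, O)
good = assign_loop_values(columns, ["1", "N", "2", "C", "3", "O"])

# Same rows with one stray token after the first row
bad = assign_loop_values(columns, ["1", "N", "EXTRA", "2", "C", "3", "O"])

print(good)
print(bad)
```

In the second table the columns end up with different lengths and mixed contents, so the damage is not confined to the row containing the stray token.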

Environment: x86_64 Gentoo Linux 2008, Biopython installed from source.

__CODE__
import sys

from Bio.PDB import MMCIFParser
from Bio.PDB.MMCIF2Dict import MMCIF2Dict

if len(sys.argv) != 2:
    print("usage: mmCifParseCheck.py <structFile>")
    sys.exit(0)
structFile = sys.argv[1]

resultString = ""

# parse to structure object and count residues that are not hetero residues
numRes = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure('test', structFile)
    for model in structure:
        for chain in model:
            for residue in chain:
                # hetero residues carry an id flag starting with "H_"
                if residue.id[0][:2] != "H_":
                    numRes += 1
except Exception:
    resultString += "parse to structure object failed\n"
else:
    resultString += "parse to structure object succeeded\n"

# parse whole mmCIF file to dict
mmcif_dict = None
try:
    mmcif_dict = MMCIF2Dict(structFile)
except Exception:
    resultString += "parse to dict failed\n"
else:
    resultString += "parse to dict succeeded\n"

# get a required entry (fails if the dict parse above failed)
try:
    entry_id = mmcif_dict['_entry.id']
except Exception:
    resultString += "key lookup failed\n"
else:
    resultString += "key lookup succeeded\n"

print(resultString)
print("number of non-het residues " + str(numRes))

