[Biopython-dev] [Bug 2626] New: Bio.PDB mmCIFParser parse exceptions

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Oct 24 03:03:09 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2626

           Summary: Bio.PDB mmCIFParser parse exceptions
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cjoldfield at gmail.com


I recently ran MMCIFParser over all of the PDB's mmCIF files and found that a
large number of them fail to parse correctly (a short script demonstrating this
is at the end). Of ~50k mmCIF files, 3891 failed to parse and another 1980 were
missing fields in the resulting mmCIF dictionary.

A few examples of files that failed to parse: 
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz

The problem seems to be that an error in one mmCIF table, such as an extra
field, propagates through the rest of the parse.
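To illustrate the suspected failure mode, here is a toy sketch of a reader that deals loop_ values into columns purely by position. It is not Biopython's MMCIF2Dict code; split_loop, GOOD and BAD are hypothetical names, and shlex only approximates CIF quoting rules. With one extra field on the first row, every later row is shifted, and a non-zero leftover token count is one cheap way to flag the offending table.

```python
import shlex
from itertools import takewhile


def split_loop(text):
    """Toy reader for one mmCIF-style loop_ block (illustration only).

    Tokenizes with shlex (an approximation of CIF quoting), takes the
    leading "_"-prefixed tokens as column names, and deals the remaining
    tokens into rows of len(names) values each, purely by position.
    Returns (names, rows, leftover), where leftover is the number of
    surplus tokens that do not fill a complete row.
    """
    tokens = shlex.split(text)
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected a block starting with loop_")
    names = list(takewhile(lambda t: t.startswith("_"), tokens[1:]))
    if not names:
        raise ValueError("loop_ block has no column names")
    values = tokens[1 + len(names):]
    ncols = len(names)
    rows = [values[i:i + ncols] for i in range(0, len(values), ncols)]
    return names, rows, len(values) % ncols


GOOD = """loop_
_atom.id _atom.name _atom.x
1 N  10.0
2 CA 11.5
3 C  12.1
"""

# Same table with one extra field on the first row.
BAD = """loop_
_atom.id _atom.name _atom.x
1 N  10.0 extra
2 CA 11.5
3 C  12.1
"""

if __name__ == "__main__":
    for label, text in (("good", GOOD), ("bad", BAD)):
        names, rows, leftover = split_loop(text)
        print(label, rows, "leftover tokens:", leftover)
```

For BAD, the second row comes out as `['extra', '2', 'CA']` and one token is left over, so everything after the bad field is misassigned rather than just the one row.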

Environment: x86_64 Gentoo Linux (2008), Biopython installed from source.

import sys

from Bio.PDB import MMCIFParser
from Bio.PDB.MMCIF2Dict import MMCIF2Dict

if len(sys.argv) != 2:
    print("usage: mmCifParseCheck.py <structFile>")
    sys.exit(1)
structFile = sys.argv[1]

results = []

# Parse to a Structure object and count non-hetero residues.
# residue.id[0] is the hetero flag: " " for standard residues, "W" for
# water and "H_<name>" for other hetero groups, so waters are counted.
numRes = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure("test", structFile)
    for model in structure:
        for chain in model:
            for residue in chain:
                if not residue.id[0].startswith("H_"):
                    numRes += 1
except Exception:
    results.append("parse to structure object failed")
else:
    results.append("parse to structure object succeeded")

# Parse the whole mmCIF file to a dictionary.
mmcif_dict = None
try:
    mmcif_dict = MMCIF2Dict(structFile)
except Exception:
    results.append("parse to dict failed")
else:
    results.append("parse to dict succeeded")

# Look up a required entry (counted as failed if the dict parse failed).
if mmcif_dict is not None and "_entry.id" in mmcif_dict:
    results.append("key lookup succeeded")
else:
    results.append("key lookup failed")

print("\n".join(results))
print("number of non-het residues " + str(numRes))
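To repeat the whole-archive survey rather than checking one file at a time, a small driver can apply a per-file check to every mmCIF file under a directory and tally the outcomes. This is a sketch under stated assumptions: survey, iter_cif_paths and check_with_biopython are hypothetical names, it assumes a local mirror of .cif/.cif.gz files, and it decompresses .gz files to a temporary file because the check is written against plain filenames.

```python
import gzip
import os
import shutil
import tempfile
from collections import Counter


def iter_cif_paths(root):
    """Yield paths of *.cif and *.cif.gz files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".cif") or name.endswith(".cif.gz"):
                yield os.path.join(dirpath, name)


def survey(root, check):
    """Apply check(path) to every mmCIF file under root and tally results.

    check takes a plain (uncompressed) .cif path and returns a status
    string; any exception is counted as "exception" and its repr is
    recorded in failures, keyed by the original path.
    """
    tally = Counter()
    failures = {}
    for path in iter_cif_paths(root):
        tmp = None
        try:
            if path.endswith(".gz"):
                with gzip.open(path, "rb") as src:
                    with tempfile.NamedTemporaryFile(
                            suffix=".cif", delete=False) as dst:
                        tmp = dst.name
                        shutil.copyfileobj(src, dst)
                status = check(tmp)
            else:
                status = check(path)
        except Exception as exc:
            status = "exception"
            failures[path] = repr(exc)
        finally:
            if tmp is not None:
                os.remove(tmp)
        tally[status] += 1
    return tally, failures


def check_with_biopython(path):
    """Example check mirroring the script above (requires Biopython)."""
    from Bio.PDB import MMCIFParser
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    MMCIFParser().get_structure("test", path)
    if "_entry.id" not in MMCIF2Dict(path):
        return "missing _entry.id"
    return "ok"
```

Usage would look like `tally, failures = survey("/path/to/mmCIF", check_with_biopython)`, where the directory is a placeholder for your local mirror; `failures` then lists the files to inspect by hand, like the examples above.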

