[Biopython-dev] [Biopython (old issues only) - Bug #2626] (Resolved) Bio.PDB mmCIFParser parse exceptions
redmine at redmine.open-bio.org
Sat Nov 12 20:08:02 UTC 2016
Issue #2626 has been updated by Lenna Peterson.
Description updated
Status changed from New to Resolved
% Done changed from 0 to 100
Still failing. Migrated to GitHub:
https://github.com/biopython/biopython/issues/990
----------------------------------------
Bug #2626: Bio.PDB mmCIFParser parse exceptions
https://redmine.open-bio.org/issues/2626#change-15366
* Author: Chris Oldfield
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Other
* Target version: 1.48
* URL:
----------------------------------------
I recently ran the MMCIFParser object over all of the PDB's mmCIF files and found that a large number failed to parse correctly (a short script demonstrating this is at the end). Of ~50k mmCIF files, 3891 failed to parse and another 1980 were missing fields in the mmCIF dictionary.
A few examples of files that failed to parse:
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz
A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz
The problem seems to be that an error in one mmCIF table, such as an extra field, propagates through the rest of the parse.
Environment: x86_64 Gentoo Linux 2008, Biopython installed from source.
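One way to picture this kind of propagation is a toy model of a loop_ table in which values are assigned to columns purely by position. This is only an illustrative sketch: `assign_loop_values` and the sample data are hypothetical, and this is not Biopython's actual parsing code.

```python
def assign_loop_values(columns, values):
    """Assign a flat list of values to columns by position (toy model)."""
    table = {name: [] for name in columns}
    for i, value in enumerate(values):
        # The column is chosen only from the running position of the value.
        table[columns[i % len(columns)]].append(value)
    return table


columns = ["_atom_site.id", "_atom_site.label_atom_id", "_atom_site.Cartn_x"]

# Two well-formed rows of three values each.
good = ["1", "N", "10.5", "2", "CA", "11.2"]
# The same rows with one stray value after the first row.
bad = ["1", "N", "10.5", "EXTRA", "2", "CA", "11.2"]

clean = assign_loop_values(columns, good)
shifted = assign_loop_values(columns, bad)
```

In this model, every value after the stray one lands in the wrong column, so a coordinate column ends up holding an atom name and any later numeric conversion fails.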
__CODE__
import sys

from Bio.PDB.MMCIFParser import MMCIFParser
from Bio.PDB.MMCIF2Dict import MMCIF2Dict

if len(sys.argv) != 2:
    print("usage: mmCifParseCheck.py <structFile>")
    sys.exit(0)
structFile = sys.argv[1]
resultString = ""

# parse to structure object and count residues that are not hetero residues
numRes = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure('test', structFile)
    for model in structure:
        for chain in model:
            for residue in chain:
                # hetero residues have an id whose first field starts with "H_"
                if residue.id[0][:2] != "H_":
                    numRes += 1
except Exception:
    resultString += "parse to structure object failed\n"
else:
    resultString += "parse to structure object succeeded\n"

# parse whole mmCIF file to dict
mmcif_dict = None
try:
    mmcif_dict = MMCIF2Dict(structFile)
except Exception:
    resultString += "parse to dict failed\n"
else:
    resultString += "parse to dict succeeded\n"

# get a required entry (also fails if the dict parse above failed)
try:
    entry_id = mmcif_dict['_entry.id']
except Exception:
    resultString += "key lookup failed\n"
else:
    resultString += "key lookup succeeded\n"

print(resultString)
print("number of non-het residues " + str(numRes))
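The single-file check can be extended into a batch tally like the one described at the top of this report. The following is a minimal sketch, not the script that was actually used: `classify` and `tally` are hypothetical helper names, and the parser is passed in as a function so the counting logic does not depend on any particular Biopython version.

```python
def classify(path, parse_to_dict, required_key="_entry.id"):
    """Return 'parse_failed', 'missing_key', or 'ok' for one mmCIF file."""
    try:
        mmcif_dict = parse_to_dict(path)
    except Exception:
        # Any exception raised by the parser counts as a parse failure.
        return "parse_failed"
    if required_key not in mmcif_dict:
        return "missing_key"
    return "ok"


def tally(paths, parse_to_dict):
    """Count how many files fall into each outcome category."""
    counts = {"ok": 0, "parse_failed": 0, "missing_key": 0}
    for path in paths:
        counts[classify(path, parse_to_dict)] += 1
    return counts
```

With Biopython installed, `tally(paths, MMCIF2Dict)` after `from Bio.PDB.MMCIF2Dict import MMCIF2Dict` should apply the same check to a list of file paths, assuming the returned object supports the `in` operator.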
---Files--------------------------------
mmCifParseCheck.py (1021 Bytes)