[Biopython-dev] Questions on StructureBuilder, MMCIFParser, and MMCIFlex
Paul B
tallpaulinjax at yahoo.com
Sun Nov 1 19:50:31 UTC 2009
Hi,
I'm a computer science guy trying to figure out some chemistry logic to support my thesis, so bear with me! :-) To sum it up, I'm not sure MMCIFParser is handling ATOM and MODEL records correctly because of this code in MMCIFParser:
if fieldname=="HETATM":
hetatm_flag="H"
else:
hetatm_flag=" "
This causes ATOM (and potentially MODEL) records to die as seen in the exception below (I think!)
My questions are:
1. Am I correct the correct code is insufficient?
2. What additional logic beyond just recognizing whether it's a HETATM, ATOM or MODEL record needs to be added?
Thanks!
Paul
Background:
I understand MMCIFlex.py et cetera is commented out in the Windows setup.py package due to difficulties compiling it. So I re-wrote MMCIFlex strictly in Python to emulate what I THINK the original MMCIFlex did. My version processes a .cif file one line at a time (readline()) then passes tokens back to MMCIF2Dict at each call to get_token(). That seems to work fine for unit testing of my MMCIFlex and MMCIFDict which I had to slightly re-write (to ensure it handled how I passed SEMICOLONS line back etc).
However when I try and use this with MMCIFParser against the 2beg .cif file which has no HETATM records and, as I understand the definition, no disordered atoms I get:
C:\Python25\Lib\site-packages\Bio\PDB\StructureBuilder.py:85: PDBConstructionWar
ning: WARNING: Chain A is discontinuous at line 0.
PDBConstructionWarning)
C:\Python25\Lib\site-packages\Bio\PDB\StructureBuilder.py:122: PDBConstructionWa
rning: WARNING: Residue (' ', 17, ' ') redefined at line 0.
PDBConstructionWarning)
Traceback (most recent call last):
File "MMCIFParser.py", line 140, in <module>
structure=p.get_structure("test", filename)
File "MMCIFParser.py", line 23, in get_structure
self._build_structure(structure_id)
File "MMCIFParser.py", line 88, in _build_structure
icode)
File "C:\Python25\lib\site-packages\Bio\PDB\StructureBuilder.py", line 148, in
init_residue
% (resname, field, resseq, icode))
PDBExceptions.PDBConstructionException: Blank altlocs in duplicate residue LEU (
' ', 17, ' ')
Basically what I think MIGHT be happening is MMCIFParser is currently only handling HETATM records, when some other kind of record comes in (ATOM, MODEL) it is treated incorrectly. See below.
MMCIFParser.py
def _build_structure(self, structure_id):
....
fieldname_list=mmcif_dict["_atom_site.group_PDB"]
....
for i in xrange(0, len(atom_id_list)):
...
altloc=alt_list[i]
if altloc==".":
altloc=" "
...
fieldname=fieldname_list[i]
#How are ATOM and MODEL records handled?
if fieldname=="HETATM":
hetatm_flag="H"
else:
hetatm_flag=" "
if current_chain_id!=chainid:
current_chain_id=chainid
structure_builder.init_chain(current_chain_id)
current_residue_id=resseq
icode, int_resseq=self._get_icode(resseq)
#This is line 87-88 in the real file
structure_builder.init_residue(resname, hetatm_flag, int_resseq,
icode)
Class StructureBuilder:
...
def init_residue(self, resname, field, resseq, icode):
if field!=" ":
if field=="H":
# The hetero field consists of H_ + the residue name (e.g. H_FUC)
field="H_"+resname
res_id=(field, resseq, icode)
...
#This line will get executed for any non-HETATM record (ie ATOM Or MODEL)
#because in MMCIFParser, if it wasn't a HETATM, then it's a ' ' ???
if field==" ":
if self.chain.has_id(res_id):
=======>But there are no point mutations in 2beg that I know of. Shouldn't be here!
# There already is a residue with the id (field, resseq, icode).
# This only makes sense in the case of a point mutation.
if __debug__:
warnings.warn("WARNING: Residue ('%s', %i, '%s') "
"redefined at line %i."
% (field, resseq, icode, self.line_counter),
PDBConstructionWarning)
duplicate_residue=self.chain[res_id]
if duplicate_residue.is_disordered()==2:
# The residue in the chain is a DisorderedResidue object.
# So just add the last Residue object.
if duplicate_residue.disordered_has_id(resname):
# The residue was already made
self.residue=duplicate_residue
duplicate_residue.disordered_select(resname)
else:
# Make a new residue and add it to the already
# present DisorderedResidue
new_residue=Residue(res_id, resname, self.segid)
duplicate_residue.disordered_add(new_residue)
self.residue=duplicate_residue
return
else:
# Make a new DisorderedResidue object and put all
# the Residue objects with the id (field, resseq, icode) in it.
# These residues each should have non-blank altlocs for all their atoms.
# If not, the PDB file probably contains an error.
====> #This is the line throwing the exception, but we shouldn't be here!
if not self._is_completely_disordered(duplicate_residue):
# if this exception is ignored, a residue will be missing
self.residue=None
raise PDBConstructionException(\
"Blank altlocs in duplicate residue %s ('%s', %i, '%s')" \
% (resname, field, resseq, icode))
self.chain.detach_child(res_id)
new_residue=Residue(res_id, resname, self.segid)
disordered_residue=DisorderedResidue(res_id)
self.chain.add(disordered_residue)
disordered_residue.disordered_add(duplicate_residue)
disordered_residue.disordered_add(new_residue)
self.residue=disordered_residue
return
residue=Residue(res_id, resname, self.segid)
self.chain.add(residue)
self.residue=residue
More information about the Biopython-dev
mailing list