[Biopython-dev] Bug in Bio.PDB?
Morten Kjeldgaard
mok at bioxray.dk
Tue Feb 18 01:02:44 UTC 2014
Hi,
I need to remove HETATMS and water molecules from a PDB file, so I was playing around with the following code snippet (from [1]):
for model in structure:
for chain in model:
for residue in chain:
id = residue.id
if id[0] != ' ':
chain.detach_child(id)
Unfortunately, it does not work correctly. If 2 HETATM residues are found after each other, the second one is skipped. I assume the reason is that model, chain, etc. really are generators, and they get screwed up if you mess around with the datastructure while the loop is running. Here is a bit of debugging output from a run with print statements scattered appropriately in the above code:
*** D *** <Residue C het= resseq=13 icode= > (' ', 13, ' ')
*** D *** <Residue A het= resseq=14 icode= > (' ', 14, ' ')
*** D *** <Residue G het= resseq=15 icode= > (' ', 15, ' ')
*** D *** <Residue H2U het=H_H2U resseq=16 icode= > ('H_H2U', 16, ' ')
removing ('H_H2U', 16, ' ') from chain D
*** D *** <Residue G het= resseq=18 icode= > (' ', 18, ' ')
*** D *** <Residue G het= resseq=19 icode= > (' ', 19, ' ')
*** D *** <Residue G het= resseq=20 icode= > (' ', 20, ' ‘)
As you see, residue 16 is correctly identified as a HETATM residue, however, the following residue 17 is skipped (it is also a H2U residue) and so it is NOT removed from the structure.
The way to make the loop work is to squirrel away a list of HETATM residues and detach them from the chain when the loop is finished. (Another way is to keep running the snippet until no HETATMs are left.)
I am not sure whether to characterize this as a bug or a “feature”, but it is confusing and defeats the intuitive understanding of how the SMCRA hierarchy objects ought to work.
(I am using Biopython version 1.59)
Cheers,
Morten
[1] http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython
--
Morten Kjeldgaard, asc. professor, MSc, PhD
Dept. of Molecular Biology and Genetics, Aarhus University
Gustav Wieds Vej 10C, Building 3135, DK-8000 Aarhus C, Denmark.
More information about the Biopython-dev
mailing list