[Biopython-dev] Bug in Bio.PDB?

Morten Kjeldgaard mok at bioxray.dk
Tue Feb 18 01:02:44 UTC 2014


Hi,

I need to remove HETATMS and water molecules from a PDB file, so I was playing around with the following code snippet (from [1]):


for model in structure:
    for chain in model:
        for residue in chain:
            id = residue.id
            if id[0] != ' ':
                chain.detach_child(id)

Unfortunately, it does not work correctly. If 2 HETATM residues are found after each other, the second one is skipped. I assume the reason is that model, chain, etc. really are generators, and they get screwed up if you mess around with the datastructure while the loop is running. Here is a bit of debugging output from a run with print statements scattered appropriately in the above code:

*** D *** <Residue   C het=  resseq=13 icode= > (' ', 13, ' ')
*** D *** <Residue   A het=  resseq=14 icode= > (' ', 14, ' ')
*** D *** <Residue   G het=  resseq=15 icode= > (' ', 15, ' ')
*** D *** <Residue H2U het=H_H2U resseq=16 icode= > ('H_H2U', 16, ' ')
removing ('H_H2U', 16, ' ') from chain D
*** D *** <Residue   G het=  resseq=18 icode= > (' ', 18, ' ')
*** D *** <Residue   G het=  resseq=19 icode= > (' ', 19, ' ')
*** D *** <Residue   G het=  resseq=20 icode= > (' ', 20, ' ‘)

As you see, residue 16 is correctly identified as a HETATM residue, however, the following residue 17 is skipped (it is also a H2U residue) and so it is NOT removed from the structure.

The way to make the loop work is to squirrel away a list of HETATM residues and detach them from the chain when the loop is finished. (Another way is to keep running the snippet until no HETATMs are left.)

I am not sure whether to characterize this as a bug or a “feature”, but it is confusing and defeats the intuitive understanding of how the SMCRA hierarchy objects ought to work.

(I am using Biopython version 1.59)

Cheers,
Morten

[1] http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython


-- 
Morten Kjeldgaard, asc. professor, MSc, PhD
Dept. of Molecular Biology and Genetics, Aarhus University
Gustav Wieds Vej 10C, Building 3135, DK-8000 Aarhus C, Denmark.





More information about the Biopython-dev mailing list