[BioPython] PDBParser, chain iterator problem. Terminating just after TER record

Thu May 11 21:50:30 UTC 2006

Lee, Byung-chul wrote:
> Hi biopython users,
> 
> While parsing PDB files, I got a strange parsing result.
> 
> My code looks like this:
> 
> from Bio.PDB import PDBParser
> 
> p = PDBParser()
> s = p.get_structure('test', 'pdb1j1t.ent')
> m0 = s[0]              # first model
> for r in m0['A']:      # residues of chain A
>     print(r)
> 
> and got this result:
> ...
> <Residue GLU het= resseq=231 icode= >
> <Residue THR het= resseq=232 icode= >
> <Residue ASN het= resseq=233 icode= >
> <Residue CA het=H_ CA resseq=301 icode= >
> 
> But in the PDB file, the TER record comes before residue CA, as shown
> below:
> ...
> ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
> ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
> ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
> TER    1766      ASN A 233
> HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
> HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S
> 
> So I think BioPython's chain iterator should stop before <Residue CA
> het=H_ CA resseq=301 icode= >. I would like to know why this happens and
> how I can fix it.

I inferred from the filename that you are talking about the full PDB 
file for entry 1j1t; it certainly seems to match the sample lines you 
quoted.

Are HETATM records 1767 and 1768 really part of the chain? What about 
the rest of the SO4 HETATM lines (1768 to 1772) and the waters that 
follow (HETATM 1773 to 2092)?
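One way to answer that from the file itself is to look at the chain 
identifier each record declares. In the PDB format the chain ID sits in 
column 22, so plain string slicing is enough to report it. This is a 
minimal sketch, not a full parser: `record_chains` is a hypothetical 
helper, and the sample lines are the ones quoted above, laid out in the 
standard fixed PDB columns.

```python
# Minimal sketch: report the chain ID each ATOM/HETATM record declares.
# Assumes the standard fixed-column PDB layout: serial in columns 7-11,
# residue name in 18-20, chain ID in column 22 (Python indices are 0-based).

SAMPLE = [
    "ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O",
    "TER    1766      ASN A 233",
    "HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA",
    "HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S",
]


def record_chains(lines):
    """Yield (serial, residue name, chain ID) for ATOM and HETATM lines."""
    for line in lines:
        if line[0:6] in ("ATOM  ", "HETATM"):  # TER and other records are skipped
            serial = int(line[6:11])
            res_name = line[17:20].strip()
            chain_id = line[21:22]  # " " means the record has no chain ID
            yield serial, res_name, chain_id


for serial, res_name, chain_id in record_chains(SAMPLE):
    print(serial, res_name, repr(chain_id))
```

On these lines it reports chain 'A' for the calcium ion (HETATM 1767) 
and a blank chain ID for the sulfate (HETATM 1768).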

I would guess that BioPython treats a termination record (TER) as the 
end of the chain, and on the face of it that is the correct action.

If you really want to treat these atoms as part of the chain, you might 
try editing the PDB file to move the TER line a little further down.
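Conversely, if the aim is just to loop over the amino acids of chain A 
without editing the file, the hetero flag that Bio.PDB keeps in each 
residue's id (the `het=H_ CA` part of the output quoted above) can serve 
as a filter. A minimal sketch, assuming the Bio.PDB convention that 
`residue.id` is a tuple of (hetero flag, sequence number, insertion 
code), with a blank flag for standard residues, "W" for water and "H_" 
plus the residue name for other hetero groups; `standard_residues` is a 
hypothetical helper name:

```python
# Minimal sketch: skip hetero residues (ions, ligands, waters) in a chain.
# Assumes the Bio.PDB residue id convention (hetero flag, resseq, icode),
# where the flag is " " for standard residues, "W" for water and
# "H_<resname>" for other hetero groups.


def standard_residues(chain):
    """Yield only the residues of `chain` whose hetero flag is blank."""
    for residue in chain:
        if residue.id[0] == " ":
            yield residue


# Usage with the objects from the original post (needs Biopython installed):
#     for r in standard_residues(m0['A']):
#         print(r)
```

Filtering this way does not change which chain the parser assigns the 
hetero groups to; it only leaves them out of the loop.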

I'm sure Thomas Hamelryck (author of the BioPython PDB parser) will be 
along in a while with a definitive answer...

Peter