[BioPython] PDBParser, chain iterator problem. Terminating just after TER record

Fri May 12 05:17:48 UTC 2006

Thank you, Boris and Peter, for your efforts and kind replies.

My point was that I wanted to parse only the ATOM/HETATM records before
the TER record, so in my case I wanted the region from SER (resseq=6) to
ASN (resseq=233).
While looking for the cause of the problem in pdb1j1t.ent after
receiving your mails, I found that the HETATM 1767 record carries the
chain ID 'A'.
After I deleted the chain ID 'A' in column 22 of the HETATM 1767 record,
I got the result I expected.
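
For anyone who only needs the raw records and does not want to edit the
file, here is a minimal sketch in plain Python (not Bio.PDB) of what I
was after: keep the ATOM/HETATM lines of one chain and stop at that
chain's TER record. The function name records_before_ter is made up for
this example, and it assumes the lines keep the fixed PDB columns, with
the chain ID in column 22.

```python
def records_before_ter(lines, chain_id):
    """Return ATOM/HETATM lines of chain_id found before that chain's TER."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        # Column 22 (string index 21) holds the chain ID in PDB records.
        chain = line[21:22]
        if record == "TER" and chain == chain_id:
            break  # stop at the chain terminator
        if record in ("ATOM", "HETATM") and chain == chain_id:
            kept.append(line)
    return kept


# Sample lines taken from pdb1j1t.ent, with fixed-column spacing.
sample = [
    "ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N",
    "ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O",
    "TER    1766      ASN A 233",
    "HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA",
]
kept = records_before_ter(sample, "A")
serials = [int(line[6:11]) for line in kept]  # atom serial is in columns 7-11
print(serials)
```

With these four lines, HETATM 1767 is dropped even though it carries
chain ID 'A', because it comes after the TER record.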

So I think PDBParser does not use the 'TER' record at all and instead
assigns residues to chains by the chain ID column alone, perhaps because
PDB files so often deviate from the file format.
I will contact Thomas Hamelryck. Thanks again.
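
If that is how it works, another way to get only SER 6 to ASN 233 is to
skip hetero residues while iterating, instead of relying on TER. Below
is a rough sketch under that assumption. It filters on the hetero flag,
the first field of residue.id (blank for standard residues, 'H_ CA' for
the calcium in the output I posted). The helper standard_residues and
the FakeResidue stand-in are made up for this example so that it runs
without the PDB file.

```python
from collections import namedtuple


def standard_residues(chain):
    """Yield only residues whose hetero flag (residue.id[0]) is blank."""
    for residue in chain:
        hetero_flag = residue.id[0]
        if hetero_flag == " ":
            yield residue


# Stand-in for Bio.PDB residues: id is (hetero flag, resseq, icode).
FakeResidue = namedtuple("FakeResidue", ["resname", "id"])
chain = [
    FakeResidue("THR", (" ", 232, " ")),
    FakeResidue("ASN", (" ", 233, " ")),
    FakeResidue("CA", ("H_ CA", 301, " ")),
]
names = [r.resname for r in standard_residues(chain)]
print(names)
```

With a parsed structure s, calling standard_residues(s[0]['A']) should
behave the same way, since iterating over a Bio.PDB chain yields
residues with the same kind of id tuple.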

Regards,

Byung-chul

Peter (BioPython) wrote:

> Lee, Byung-chul wrote:
>
>> Hi biopython users,
>>
>> While parsing PDB files, I got a strange parsing result.
>>
>> My program looks like this:
>>
>> from Bio.PDB import PDBParser
>>
>> p = PDBParser()
>> s = p.get_structure('test', 'pdb1j1t.ent')
>> m0 = s[0]  # first model
>> for r in m0['A']:  # iterate over the residues of chain A
>>     print(r)
>>
>> The output was:
>> ...
>> <Residue GLU het= resseq=231 icode= >
>> <Residue THR het= resseq=232 icode= >
>> <Residue ASN het= resseq=233 icode= >
>> <Residue CA het=H_ CA resseq=301 icode= >
>>
>> But in the PDB file, the TER record comes before residue CA, as shown
>> below:
>> ...
>> ATOM   1763  OD1 ASN A 233      -3.371  16.572  33.547  1.00 51.28           O
>> ATOM   1764  ND2 ASN A 233      -3.068  17.873  31.741  1.00 49.95           N
>> ATOM   1765  OXT ASN A 233      -6.247  19.343  31.607  1.00 51.21           O
>> TER    1766      ASN A 233
>> HETATM 1767 CA    CA A 301      31.453  10.121  13.116  1.00 15.05          CA
>> HETATM 1768  S   SO4   302      21.891  21.921  14.715  1.00 50.50           S
>>
>> So I think BioPython's chain iterator should stop before <Residue CA
>> het=H_ CA resseq=301 icode= >. I want to know why this happens and
>> how I can solve it.
>
>
> I inferred from the filename that you are talking about the full
> PDB file for record 1j1t - certainly this seems to match the sample
> lines you quoted.
>
> Are HETATM 1767 and 1768 really part of the chain?  What about the
> rest of the SO4 HETATM lines (1768 to 1772) and the following waters
> (HETATM 1773 to 2092)?
>
> I would guess that BioPython treats a termination record (TER) as the 
> end of the chain, and on the face of it that is the correct action.
>
> If you really want to treat these atoms as part of the chain you might 
> try editing the PDB file to move the TER line a little lower down.
>
> I'm sure Thomas Hamelryck (author of the BioPython PDB parser) will be
> along in a while with a definitive answer...
>
> Peter
>
>

-- 
--------------------------------------------------------
The important thing is not to stop questioning.
                               : Albert Einstein

Byung chul Lee 
                at Dept. BioSystems KAIST, Korea
                                  Ph.D candidate
                                  82-42-869-4357
--------------------------------------------------------