[Biopython] Problem with pdb-file parsing

Christian Schäfer schafer at rostlab.org
Tue Sep 8 13:45:53 EDT 2009


I don't know whether this is a bug or whether I did something wrong. I am
parsing the PDB structure 1a2d with the following code to get the
one-letter polypeptide sequence for chain A:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder

parser = PDBParser()
ppb = PPBuilder()
structure = parser.get_structure('tmp', '1a2d.pdb')
# Build the polypeptides for chain A of the first model
polypeptide = ppb.build_peptides(structure[0]['A'])
# One-letter sequence of the first polypeptide only
sequence = str(polypeptide[0].get_sequence())

print(sequence)

This, however, gives me a sequence that is one amino acid shorter than
expected. The structure contains one HETATM block within the ATOM block
of chain A (position 117), which gets translated into an 'X' in the
sequence. The following amino acid at position 118 (VAL) seems to be missing.

So the resulting sequence around the X is:
To my understanding this should be:

Is this behaviour intended, or is it a bug? The Biopython version is 1.49
(Ubuntu Jaunty).
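
To check whether VAL 118 is really present in the file, the residues of
chain A can be listed straight from the ATOM/HETATM records, without going
through Bio.PDB. This is only a minimal sketch: list_chain_residues and
fake_atom_line are names made up for illustration, the sample records are
invented (XXX is a placeholder residue name and GLY 116 is not taken from
1a2d), and it assumes the standard fixed-column PDB layout.

```python
def list_chain_residues(pdb_lines, chain_id):
    """Return (record, resname, resseq, icode) per residue of one chain, in file order."""
    residues = []
    seen = set()
    for raw in pdb_lines:
        line = raw.rstrip("\n").ljust(27)  # pad so the column slices below are safe
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # only look at the first model
        if record not in ("ATOM", "HETATM"):
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        resname = line[17:20].strip()  # columns 18-20: residue name
        resseq = int(line[22:26])      # columns 23-26: residue sequence number
        icode = line[26].strip()       # column 27: insertion code
        key = (resname, resseq, icode)
        if key not in seen:            # many atoms per residue, report each residue once
            seen.add(key)
            residues.append((record, resname, resseq, icode))
    return residues


def fake_atom_line(record, serial, resname, chain, resseq):
    # Hypothetical helper: builds only the leading fixed-width columns of a record.
    return "{:<6}{:>5} {:<4}{}{:>3} {}{:>4}{}".format(
        record, serial, " CA ", " ", resname, chain, resseq, " ")


# Invented sample data, NOT the real content of 1a2d.pdb.
SAMPLE_LINES = [
    fake_atom_line("ATOM", 1, "GLY", "A", 116),
    fake_atom_line("ATOM", 2, "GLY", "A", 116),
    fake_atom_line("HETATM", 3, "XXX", "A", 117),
    fake_atom_line("ATOM", 4, "VAL", "A", 118),
    fake_atom_line("ATOM", 5, "SER", "B", 1),
]

for residue in list_chain_residues(SAMPLE_LINES, "A"):
    print(residue)
```

For the real file, the same function can be fed the open file handle, e.g.
list_chain_residues(open('1a2d.pdb'), 'A'). If VAL 118 shows up in that
listing but not in the sequence, the residue is lost while the polypeptide
is built rather than while the file is read.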

