[Biopython] Problem with pdb-file parsing

Peter biopython at maubp.freeserve.co.uk
Wed Sep 9 09:25:11 UTC 2009


On Tue, Sep 8, 2009 at 6:45 PM, Christian Schäfer<schafer at rostlab.org> wrote:
> Hi,
>
> I don't know whether this is either a bug or I did something wrong.
> I am parsing the pdb structure 1a2d with the following code to get
> the one-letter polypeptide sequence for chain A:
>
> ------------------CODE----------------
> from Bio.PDB.PDBParser import PDBParser
> from Bio.PDB.Polypeptide import *
>
> parser = PDBParser()
> ppb = PPBuilder()
> structure = parser.get_structure('tmp', '1a2d.pdb')
> polypeptide = ppb.build_peptides(structure[0]['A'])
> sequence = str(polypeptide[0].get_sequence())
>
> print sequence
> ------------------CODE----------------
>
> This however gives me a sequence that is one aminoacid shorter than
> expected. The structure contains one HETATM block within the ATOM
> block of chain A (pos 117), which gets translated into a 'X' in the
> sequence. The following aminoacid at position 118 (VAL) seems to be
> missing.
>
> So the resulting sequence around the X is:
> ...VEXMK...
> To my understanding this should be:
> ...VEXVMK...
>
> Is this behaviour intended? Is it a bug? The biopython version is 1.49
> (Ubuntu jaunty).

I agree that does not seem to be sensible. I get the same behaviour with
the latest code in the repository (so updating to Biopython 1.51 won't help
here). It looks like a bug in the builder code, since the parser seems fine,
and you can get the sequence in other ways, e.g.

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import *

parser = PDBParser()
ppb = PPBuilder()
structure = parser.get_structure('tmp', '1a2d.pdb')
for model in structure :
   for chain in model :
       #Try adjusting depending on if you expect just the 20
       #standard amino acids etc.
       #aminos = [to_one_letter_code.get(res.resname,"X") \
       #          for res in chain if res.resname != "HOH"]
       aminos = [to_one_letter_code.get(res.resname,"X") \
                 for res in chain if "CA" in res.child_dict]
       sequence = "".join(aminos)
       print sequence

Could you file this as a bug on Bugzilla please?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Thanks,

Peter




More information about the Biopython mailing list