[Biopython-dev] [Bug 2910] Bio.PDB build_peptides sometimes gives shorter peptide sequences than expected

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Sep 10 08:55:03 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2910


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |normal
            Summary|Parsing some pdb files      |Bio.PDB build_peptides
                   |results in shorter peptide  |sometimes gives shorter
                   |sequences than expected     |peptide sequences than
                   |                            |expected




------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-09-10 08:55 EST -------
Retitled as this appears to be a bug in the PPBuilder build_peptides method,
not the PDB parser, see:
http://lists.open-bio.org/pipermail/biopython/2009-September/005532.html

Test script:

# Python 2 script: compare each chain's raw residue sequence with the
# sequence returned by PPBuilder.build_peptides.
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, to_one_letter_code
parser = PDBParser()
ppb = PPBuilder()
#structure = parser.get_structure('tmp', '1A2D.pdb')
structure = parser.get_structure('tmp', '13GS.pdb')
for model in structure :
    polypeptides = ppb.build_peptides(model)
    # Assumes exactly one peptide is built per chain
    assert len(model) == len(polypeptides)
    for chain, pep in zip(model, polypeptides) :
        print
        print "Chain", chain.id
        print "Raw chain:"
        # Every residue with a C-alpha atom; unknown residue names become "X"
        print "".join(to_one_letter_code.get(res.resname,"X") \
                      for res in chain if "CA" in res.child_dict)
        print "From peptide builder:"
        print pep.get_sequence()

Output for 1A2D:

PDBConstructionWarning: WARNING: Chain A is discontinuous at line 2426.
PDBConstructionWarning: WARNING: Chain B is discontinuous at line 2427.
PDBConstructionWarning: WARNING: Chain A is discontinuous at line 2428.
PDBConstructionWarning: WARNING: Chain B is discontinuous at line 2448.

Chain A
Raw chain:
CDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDLVTIRSESTFKNTEISFKLGVEFDEITADDRKVKSIITLDGGALVQVQKWDGKSTTIKRKRDGDKLVVEXVMKGVTSTRVYERA
From peptide builder:
CDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDLVTIRSESTFKNTEISFKLGVEFDEITADDRKVKSIITLDGGALVQVQKWDGKSTTIKRKRDGDKLVVEXMKGVTSTRVYERA

Chain B
Raw chain:
CDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDLVTIRSESTFKNTEISFKLGVEFDEITADDRKVKSIITLDGGALVQVQKWDGKSTTIKRKRDGDKLVVEXVMKGVTSTRVYERA
From peptide builder:
CDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDLVTIRSESTFKNTEISFKLGVEFDEITADDRKVKSIITLDGGALVQVQKWDGKSTTIKRKRDGDKLVVEXMKGVTSTRVYERA

Notice there are discontinuities in both chains A and B, and each of their
peptides is missing one residue (the V immediately after the X).
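
To locate the missing residue without comparing the long sequences by eye,
something like the following could be used. This is only a sketch using the
standard library difflib module; the helper name missing_from_peptide is made
up for illustration and is not part of Biopython. Reported positions are
0-based indexes into the raw string passed in.

```python
from difflib import SequenceMatcher


def missing_from_peptide(raw, built):
    """Return (index_in_raw, letters) for stretches of raw absent from built."""
    matcher = SequenceMatcher(None, raw, built, autojunk=False)
    gaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "delete": letters only in raw; "replace": raw differs from built here.
        if tag in ("delete", "replace"):
            gaps.append((i1, raw[i1:i2]))
    return gaps


# Fragment around the difference in the 1A2D chain A output above.
print(missing_from_peptide("DKLVVEXVMKGVT", "DKLVVEXMKGVT"))
# prints [(7, 'V')]
```

The same call should work on the full raw and peptide builder strings; only
the short fragment is shown here to keep the example readable.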

And the output from 13GS:

PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3760.
PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3812.
PDBConstructionWarning: WARNING: Chain A is discontinuous at line 3852.
PDBConstructionWarning: WARNING: Chain B is discontinuous at line 3948.
PDBConstructionWarning: WARNING: Chain C is discontinuous at line 4033.

Chain A
Raw chain:
MPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
From peptide builder:
MPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ

Chain B
Raw chain:
PYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
From peptide builder:
PYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ

Chain C
Raw chain:
ECG
From peptide builder:
CG

Chain D
Raw chain:
ECG
From peptide builder:
CG

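Notice there are discontinuity warnings for chains A, B and C, but missing
residues only in the peptides for chains C and D (each loses its leading E).
Chain D loses a residue without any warning, while chains A and B have
warnings but come through intact, so this suggests a discontinuity warning is
neither required nor sufficient to trigger the problem. Also there are no
HETATM residues for chains C and D.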
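
The test script pairs chains with peptides by position, which only works if
build_peptides returns exactly one peptide per chain. Since the builder can
split a chain into several peptides where the backbone is broken, a more
robust comparison might group the peptides by their parent chain first. A
rough sketch for a current Python follows; the helper name sequences_by_chain
is hypothetical, and it assumes each peptide is a list of residues whose
get_parent() returns the chain, as Bio.PDB Polypeptide objects do.

```python
from collections import defaultdict


def sequences_by_chain(polypeptides):
    """Map chain id -> list of peptide sequences (as str), in builder order."""
    grouped = defaultdict(list)
    for pp in polypeptides:
        if len(pp) == 0:
            continue  # nothing to attribute to a chain
        # Assumption: the first residue's parent is the chain it came from.
        chain_id = pp[0].get_parent().id
        grouped[chain_id].append(str(pp.get_sequence()))
    return dict(grouped)


# Possible use with the objects from the test script (needs Biopython):
#     by_chain = sequences_by_chain(ppb.build_peptides(model))
#     for chain in model:
#         print(chain.id, by_chain.get(chain.id, []))
```

Joining the fragments for one chain and comparing the result with the raw
chain string would then show whether residues were lost or merely split
across peptides.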



