[Biopython-dev] [Bug 3096] New: PPBuilder build_peptides bugs
bugzilla-daemon at portal.open-bio.org
Tue Jun 8 18:52:28 EDT 2010
http://bugzilla.open-bio.org/show_bug.cgi?id=3096
Summary: PPBuilder build_peptides bugs
Product: Biopython
Version: Not Applicable
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: skong at zymeworks.com
Given a chain of backbone-connected residues 'IXRGXTGL' that contains two
non-standard amino acids 'X', building peptides with a builder that accepts
only standard amino acids should return two peptides, 'RG' and 'TGL'. 'I'
should not be returned as a peptide since it is a single residue. Currently
Biopython returns 'IXGXGL', which reflects two bugs:
1. A standard amino acid (R, then T) is skipped after each X, while the X
itself is kept. X should be skipped instead, not R or T. Related to
http://bugzilla.open-bio.org/show_bug.cgi?id=2910 and
http://lists.open-bio.org/pipermail/biopython/2009-September/005532.html
2. A single peptide is returned even though, after filtering, the two X
residues that connect 'I', 'RG' and 'TGL' are no longer present. The fragment
'IRGTGL' cannot be considered a valid peptide without the two Xs connecting
its parts.
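The expected result described above can be sketched in a few lines. This is
only an illustration of the intended behaviour, not Biopython code: the helper
name expected_peptides is hypothetical, every neighbouring pair of residues is
assumed to be backbone connected, and a fragment needs at least two residues
to count as a peptide.

```python
def expected_peptides(one_letter_seq, nonstandard="X"):
    """Split a fully connected chain at non-standard residues.

    Fragments shorter than two residues are dropped, because a peptide
    needs at least one pair of connected residues.
    """
    fragments = one_letter_seq.split(nonstandard)
    return [frag for frag in fragments if len(frag) >= 2]


print(expected_peptides("IXRGXTGL"))  # ['RG', 'TGL']
print(expected_peptides("IHRGSTGL"))  # ['IHRGSTGL'] (unmutated chain)
```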
The above sequence 'IXRGXTGL' is taken from 1bfe and mutated. The 'mutation'
referred to here is simply renaming a residue to a non-standard name, which is
then represented as 'X'.
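The renaming described above can be sketched as follows. This is a minimal
illustration, not the reporter's actual procedure: rename_residues and
atom_line are hypothetical helpers, and the code assumes standard fixed-column
PDB ATOM records (residue name in columns 18-20, chain ID in column 22,
residue number in columns 23-26). The mapping 317 -> HIE and 320 -> XQQ
mirrors the two renamed residues in the fragments attached to this report.

```python
def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: build a fixed-column PDB ATOM record."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


def rename_residues(pdb_lines, new_names, chain_id="A"):
    """Return a copy of pdb_lines with selected residues renamed.

    new_names maps a residue number (int) to a three-letter residue name.
    Coordinates and every other column are left untouched.
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM  ", "HETATM")) and len(line) >= 26
        if is_atom and line[21] == chain_id:
            resseq = int(line[22:26])
            if resseq in new_names:
                # Columns 18-20 (0-based slice 17:20) hold the residue name.
                line = line[:17] + new_names[resseq].rjust(3) + line[20:]
        out.append(line)
    return out


original = [
    atom_line(93, " N", "HIS", "A", 317, 38.425, 72.679, 27.969),
    atom_line(103, " N", "ARG", "A", 318, 37.527, 72.109, 24.905),
    atom_line(118, " N", "SER", "A", 320, 39.734, 74.334, 19.823),
]
mutated = rename_residues(original, {317: "HIE", 320: "XQQ"})
for line in mutated:
    print(line)
```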
Each solution proposed below fixes the corresponding bug above:
1. Replace the condition after the aa_only check with
(not accept(prev) or not accept(next)) at line 299 of Bio/PDB/Polypeptide.py
2. Insert pp=None when either of the residues compared is filtered, at line 300
of Bio/PDB/Polypeptide.py
Amino acid filtering bug in method build_peptides() of class _PPBuilder in
Bio/PDB/Polypeptide.py:
Original:
    for chain in chain_list:
        chain_it=iter(chain)
        prev=chain_it.next()
        pp=None
        for next in chain_it:
            if aa_only and not accept(prev):
                prev=next
                continue
            if is_connected(prev, next):
                if pp is None:
                    pp=Polypeptide()
                    pp.append(prev)
                    pp_list.append(pp)
                pp.append(next)
            else:
                pp=None
            prev=next
    return pp_list
Fixed:
    for chain in chain_list:
        chain_it=iter(chain)
        prev=chain_it.next()
        pp=None
        for next in chain_it:
            if aa_only and (not accept(prev) or not accept(next)):
                prev=next; pp=None
                continue
            if is_connected(prev, next):
                if pp is None:
                    pp=Polypeptide()
                    pp.append(prev)
                    pp_list.append(pp)
                pp.append(next)
            else:
                pp=None
            prev=next
    return pp_list
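The difference between the two loops can be reproduced without Biopython. The
sketch below is a simplified stand-in, not the library code: residues are
one-letter codes, is_standard plays the role of accept(), every neighbouring
pair is assumed to be connected (so the is_connected branch is always taken),
and the function and flag names are hypothetical.

```python
def is_standard(residue):
    """Stand-in for accept(): 'X' marks a non-standard residue."""
    return residue != "X"


def build_peptides(chain, fixed):
    """Mimic the original (fixed=False) or proposed (fixed=True) loop."""
    peptides = []
    pp = None
    residues = iter(chain)
    prev = next(residues)
    for nxt in residues:
        if fixed:
            skip = not is_standard(prev) or not is_standard(nxt)
        else:
            skip = not is_standard(prev)
        if skip:
            prev = nxt
            if fixed:
                pp = None  # a filtered residue breaks the peptide
            continue
        # All residues are assumed connected in this sketch.
        if pp is None:
            pp = [prev]
            peptides.append(pp)
        pp.append(nxt)
        prev = nxt
    return ["".join(p) for p in peptides]


print(build_peptides("IXRGXTGL", fixed=False))  # ['IXGXGL']
print(build_peptides("IXRGXTGL", fixed=True))   # ['RG', 'TGL']
```

With fixed=False the residue after each X is dropped and the X is kept; with
fixed=True both X residues are excluded and the peptide is split around them.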
Attached here is the code used to test the above case, with and without
mutations, and with and without standard amino acid filtering. The case without
mutation is just to show that the backbone atoms of the mutated version are
connected:
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, is_aa

class StandardAABuilder(PPBuilder):
    """Polypeptide builder which accepts only standard amino acids."""
    def _accept(self, residue):
        return is_aa(residue, standard=True)

def extract_peptides(model):
    """Extracts the peptides from a model using the default PPBuilder.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

def extract_peptides_saa(model):
    """Extracts the peptides from a model, standard amino acids only.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in StandardAABuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':
    # Unmutated fragment: all residues are standard amino acids
    oripdb = open('chopped_pdb1bfe.ent')
    sto = PDBParser().get_structure('', oripdb)
    seqao = extract_peptides(sto)
    seqbo = extract_peptides_saa(sto)
    print 'ori seq all '
    print seqao
    print 'ori seq standard only'
    print seqbo
    # Mutated fragment: two residues renamed to non-standard names
    pdb = open('chopped_mutated_pdb1bfe.ent')
    st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    seqb = extract_peptides_saa(st)
    print 'mut seq all'
    print seqa
    print 'mut seq standard only '
    print seqb
Attached below are the two PDB file fragments, before and after mutation.
PDB FILE: chopped_pdb1bfe.ent
ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N
ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C
ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C
ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O
ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C
ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C
ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C
ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C
ATOM 93 N HIS A 317 38.425 72.679 27.969 1.00 35.61 N
ATOM 94 CA HIS A 317 37.880 73.473 26.881 1.00 37.92 C
ATOM 95 C HIS A 317 38.360 72.928 25.540 1.00 37.79 C
ATOM 96 O HIS A 317 39.463 73.240 25.094 1.00 37.44 O
ATOM 97 CB HIS A 317 38.303 74.930 27.052 1.00 35.19 C
ATOM 98 CG HIS A 317 37.888 75.519 28.363 1.00 35.76 C
ATOM 99 ND1 HIS A 317 36.611 75.981 28.602 1.00 37.74 N
ATOM 100 CD2 HIS A 317 38.575 75.701 29.516 1.00 37.59 C
ATOM 101 CE1 HIS A 317 36.529 76.420 29.844 1.00 38.74 C
ATOM 102 NE2 HIS A 317 37.706 76.262 30.421 1.00 36.76 N
ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N
ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C
ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C
ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O
ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C
ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C
ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C
ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N
ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C
ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N
ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N
ATOM 114 N GLY A 319 37.690 73.604 22.461 1.00 49.96 N
ATOM 115 CA GLY A 319 38.138 74.668 21.592 1.00 55.53 C
ATOM 116 C GLY A 319 38.459 74.219 20.180 1.00 58.85 C
ATOM 117 O GLY A 319 37.583 73.766 19.440 1.00 58.98 O
ATOM 118 N SER A 320 39.734 74.334 19.823 1.00 61.64 N
ATOM 119 CA SER A 320 40.219 73.992 18.493 1.00 63.16 C
ATOM 120 C SER A 320 40.212 72.517 18.110 1.00 65.27 C
ATOM 121 O SER A 320 39.558 72.127 17.145 1.00 65.12 O
ATOM 122 CB SER A 320 41.634 74.542 18.316 1.00 65.36 C
ATOM 123 OG SER A 320 42.124 74.255 17.019 1.00 72.05 O
ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N
ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C
ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C
ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O
ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C
ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O
ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C
ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N
ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C
ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C
ATOM 134 O GLY A 322 40.138 68.892 22.372 1.00 50.41 O
ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N
ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C
ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C
ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O
ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C
ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C
ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C
ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C
PDB FILE: mutated_chopped_pdb1bfe.ent
ATOM 85 N ILE A 316 37.386 71.217 31.070 1.00 36.97 N
ATOM 86 CA ILE A 316 38.311 71.290 29.949 1.00 33.71 C
ATOM 87 C ILE A 316 37.634 72.103 28.862 1.00 33.93 C
ATOM 88 O ILE A 316 36.415 72.216 28.839 1.00 36.46 O
ATOM 89 CB ILE A 316 38.651 69.876 29.404 1.00 35.79 C
ATOM 90 CG1 ILE A 316 39.331 69.049 30.501 1.00 36.78 C
ATOM 91 CG2 ILE A 316 39.572 69.979 28.187 1.00 37.71 C
ATOM 92 CD1 ILE A 316 39.881 67.724 30.023 1.00 39.20 C
ATOM 93 N HIE A 317 38.425 72.679 27.969 1.00 35.61 N
ATOM 94 CA HIE A 317 37.880 73.473 26.881 1.00 37.92 C
ATOM 95 C HIE A 317 38.360 72.928 25.540 1.00 37.79 C
ATOM 96 O HIE A 317 39.463 73.240 25.094 1.00 37.44 O
ATOM 97 CB HIE A 317 38.303 74.930 27.052 1.00 35.19 C
ATOM 98 CG HIE A 317 37.888 75.519 28.363 1.00 35.76 C
ATOM 99 ND1 HIE A 317 36.611 75.981 28.602 1.00 37.74 N
ATOM 100 CD2 HIE A 317 38.575 75.701 29.516 1.00 37.59 C
ATOM 101 CE1 HIE A 317 36.529 76.420 29.844 1.00 38.74 C
ATOM 102 NE2 HIE A 317 37.706 76.262 30.421 1.00 36.76 N
ATOM 103 N ARG A 318 37.527 72.109 24.905 1.00 38.78 N
ATOM 104 CA ARG A 318 37.884 71.512 23.627 1.00 42.04 C
ATOM 105 C ARG A 318 38.469 72.559 22.699 1.00 45.14 C
ATOM 106 O ARG A 318 39.592 72.425 22.205 1.00 42.05 O
ATOM 107 CB ARG A 318 36.657 70.880 22.967 1.00 42.93 C
ATOM 108 CG ARG A 318 36.934 70.321 21.576 1.00 38.60 C
ATOM 109 CD ARG A 318 35.654 70.038 20.821 1.00 35.39 C
ATOM 110 NE ARG A 318 34.624 69.538 21.724 1.00 34.96 N
ATOM 111 CZ ARG A 318 34.539 68.278 22.141 1.00 31.51 C
ATOM 112 NH1 ARG A 318 35.419 67.373 21.736 1.00 25.19 N
ATOM 113 NH2 ARG A 318 33.579 67.929 22.983 1.00 29.10 N
ATOM 114 N GLY A 319 37.690 73.604 22.461 1.00 49.96 N
ATOM 115 CA GLY A 319 38.138 74.668 21.592 1.00 55.53 C
ATOM 116 C GLY A 319 38.459 74.219 20.180 1.00 58.85 C
ATOM 117 O GLY A 319 37.583 73.766 19.440 1.00 58.98 O
ATOM 118 N XQQ A 320 39.734 74.334 19.823 1.00 61.64 N
ATOM 119 CA XQQ A 320 40.219 73.992 18.493 1.00 63.16 C
ATOM 120 C XQQ A 320 40.212 72.517 18.110 1.00 65.27 C
ATOM 121 O XQQ A 320 39.558 72.127 17.145 1.00 65.12 O
ATOM 122 CB XQQ A 320 41.634 74.542 18.316 1.00 65.36 C
ATOM 123 OG XQQ A 320 42.124 74.255 17.019 1.00 72.05 O
ATOM 124 N THR A 321 40.955 71.702 18.853 1.00 67.43 N
ATOM 125 CA THR A 321 41.049 70.274 18.562 1.00 67.73 C
ATOM 126 C THR A 321 40.220 69.430 19.529 1.00 66.41 C
ATOM 127 O THR A 321 39.244 69.917 20.095 1.00 70.21 O
ATOM 128 CB THR A 321 42.517 69.810 18.620 1.00 70.22 C
ATOM 129 OG1 THR A 321 42.613 68.453 18.169 1.00 77.03 O
ATOM 130 CG2 THR A 321 43.049 69.915 20.045 1.00 72.07 C
ATOM 131 N GLY A 322 40.608 68.168 19.707 1.00 61.22 N
ATOM 132 CA GLY A 322 39.892 67.286 20.614 1.00 53.23 C
ATOM 133 C GLY A 322 40.037 67.705 22.065 1.00 48.00 C
ATOM 134 O GLY A 322 40.138 68.892 22.372 1.00 50.41 O
ATOM 135 N LEU A 323 40.044 66.734 22.968 1.00 41.92 N
ATOM 136 CA LEU A 323 40.190 67.033 24.385 1.00 35.58 C
ATOM 137 C LEU A 323 41.613 66.738 24.874 1.00 31.41 C
ATOM 138 O LEU A 323 41.932 66.921 26.046 1.00 30.47 O
ATOM 139 CB LEU A 323 39.160 66.240 25.191 1.00 35.76 C
ATOM 140 CG LEU A 323 37.716 66.576 24.802 1.00 39.50 C
ATOM 141 CD1 LEU A 323 36.733 65.796 25.670 1.00 38.15 C
ATOM 142 CD2 LEU A 323 37.493 68.074 24.955 1.00 38.58 C