[Biopython] (Bio.PDB) problems with NeighborSearch: error at levels above "A", residue index discrepancy with unfold_entities
James Jensen
jdjensen at eng.ucsd.edu
Thu Aug 29 23:04:41 UTC 2013
Hello!
I am writing a function that, given two chains in a PDB file, should
return 1) the positions and identities of all residues that are in
contact with (distance < 5 angstroms) a residue on the other chain, and
2) the amino acid sequences of the chains. I've been doing this with
NeighborSearch.search_all(radius=5, level='A') and then for each atom
pair, seeing what its parent residue is and whether the parent residues
of the two atoms belong to different chains. This may seem like a
roundabout way of doing it, but if I call search_all(radius=5,
level='R'), or use any level other than 'A', I get the error
TypeError: unorderable types: Residue() < Residue()
So my first question is why search_all fails at levels above 'A'.
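For reference, the atom-level workaround described above can be sketched
roughly as follows. This is a minimal sketch rather than my exact code:
the function name interchain_residue_contacts is made up for
illustration, and atom_pairs is assumed to be the list of (atom, atom)
tuples returned by NeighborSearch.search_all(5.0, level='A'). It relies
only on the get_parent(), get_id() and get_resname() methods of Bio.PDB
entities.

```python
def interchain_residue_contacts(atom_pairs):
    """Collapse atom-level contacts into residue-level contacts.

    atom_pairs is assumed to be an iterable of (atom, atom) tuples, e.g.
    the result of NeighborSearch.search_all(5.0, level='A').
    Returns a set of (key, key) tuples, where each key is
    (chain id, residue id, residue name).
    """
    contacts = set()
    for atom_a, atom_b in atom_pairs:
        res_a, res_b = atom_a.get_parent(), atom_b.get_parent()
        chain_a, chain_b = res_a.get_parent(), res_b.get_parent()
        # Keep only contacts between residues on different chains.
        if chain_a.get_id() == chain_b.get_id():
            continue
        # Plain tuples are used as keys, so Residue objects themselves
        # never have to be compared or sorted.
        key_a = (chain_a.get_id(), res_a.get_id(), res_a.get_resname())
        key_b = (chain_b.get_id(), res_b.get_id(), res_b.get_resname())
        # Sort the two keys so that (A, C) and (C, A) count once.
        contacts.add(tuple(sorted((key_a, key_b))))
    return contacts
```

The atom list passed to NeighborSearch would come from something like
Selection.unfold_entities(structure[0], 'A'), i.e. the atoms of the
first model only.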
For the adjacent residue pairs I identify using NeighborSearch, I get
each residue's position in its chain from residue.get_id()[1]. However,
if I get the chain's residues with
seq = Selection.unfold_entities(chain, 'R') and index into that list
(i.e. seq[index]) using the positions from the NeighborSearch step, the
residues I get are not the ones that residue.get_resname() reports for
the same contacts during the NeighborSearch step.
I've tried it with several proteins, and the problem is the same. Chains
A and C of 2h62 are an example.
I then noticed that the lowest residue ID number of the residues yielded
from Selection.unfold_entities(chain, 'R') is not 1. For chain A, it's
11, and for chain C, it's 34. Not knowing why this was, I thought I'd
try subtracting the lowest ID number from the indices returned by the
NeighborSearch step (i.e. in chain A, 11 -> 0 so seq[0] would be the
first residue, the one with ID 11). This appeared to work for chain A.
However, it gives me negative indices for some of the contacts in
chain C, which suggests that NeighborSearch can return residues that
are not returned by unfold_entities(). The lowest residue ID returned by
NeighborSearch for chain C was 24, whereas for unfold_entities() it was 34.
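To make the indexing problem concrete, here is a minimal sketch of the
offset approach next to a lookup keyed on the full residue ID. The
function names are hypothetical, and residues is assumed to be the list
returned by Selection.unfold_entities(chain, 'R'). In Bio.PDB,
residue.get_id() is a tuple of (hetero flag, sequence number, insertion
code), which is why get_id()[1] gives the number.

```python
def position_by_offset(resseq, residues):
    """Offset approach: subtract the lowest residue number in the list.

    Only lands on the right list position if the numbering has no gaps
    and resseq actually occurs in residues.
    """
    lowest = min(res.get_id()[1] for res in residues)
    return resseq - lowest


def position_by_id(residues):
    """Map each full residue ID to its position in the list."""
    return {res.get_id(): i for i, res in enumerate(residues)}
```

With the second function, a residue ID that is missing from the list
shows up as a missing key (or None with dict.get()) rather than as a
negative or shifted index.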
For both chains A and C, I was given the warning
PDBConstructionWarning: WARNING: Chain [letter] is discontinuous at line [line number].
In fact, I seem to get this warning for just about every chain of every
structure I load. Is this the reason that the first residues in the two
chains are at 11 and 34, rather than 1? If so, could it be that
NeighborSearch is able to work around the discontinuity while
unfold_entities is not?
Any suggestions?
Thanks for your time and help,
James Jensen