[Biopython] (Bio.PDB) problems with NeighborSearch: error at levels above "A", residue index discrepancy with unfold_entities

James Jensen jdjensen at eng.ucsd.edu
Thu Aug 29 23:04:41 UTC 2013


Hello!

I am writing a function that, given two chains in a PDB file, should 
return 1) the positions and identities of all residues that are in 
contact with (distance < 5 angstroms) a residue on the other chain, and 
2) the amino acid sequences of the chains. I've been doing this with 
NeighborSearch.search_all(radius=5, level='A') and then for each atom 
pair, seeing what its parent residue is and whether the parent residues 
of the two atoms belong to different chains. This may seem like a 
roundabout way of doing it, but if I call search_all(radius=5, 
level='R'), or indeed use any level other than 'A', I get the error

         TypeError: unorderable types: Residue() < Residue()

So my first question is why it might be that search_all isn't working at 
higher levels.
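
For reference, the atom-level workaround described above can be sketched 
roughly as follows. This is a minimal sketch, not the exact code in 
question: the function name interchain_residue_contacts and the shape of 
the returned dictionary are made up for illustration. It assumes the 
input is the list of atom pairs returned by 
NeighborSearch(atoms).search_all(5, level='A'), and it relies only on 
get_parent(), get_id() and get_resname().

```python
def interchain_residue_contacts(atom_pairs):
    """Group contacting residues by chain, given atom-level neighbour pairs.

    atom_pairs: iterable of (atom, atom) tuples, for example the result of
    NeighborSearch(atoms).search_all(5, level='A') (assumed input).
    Returns a dict mapping chain ID -> set of (residue number, residue name).
    The name and return shape are hypothetical, chosen for illustration.
    """
    contacts = {}
    for atom1, atom2 in atom_pairs:
        # Walk up the hierarchy: atom -> residue -> chain.
        res1, res2 = atom1.get_parent(), atom2.get_parent()
        chain1, chain2 = res1.get_parent(), res2.get_parent()

        # Keep only pairs whose atoms sit on different chains.
        if chain1.get_id() == chain2.get_id():
            continue

        for chain, res in ((chain1, res1), (chain2, res2)):
            # A residue ID is a tuple (hetero flag, number, insertion code);
            # element [1] is the residue number.
            entry = (res.get_id()[1], res.get_resname())
            contacts.setdefault(chain.get_id(), set()).add(entry)
    return contacts
```

Collecting into sets means a residue is reported once even when several 
of its atoms lie within the cutoff.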

For the adjacent residue pairs I identify using NeighborSearch, I get 
each residue's position in its respective chain by residue.get_id()[1].

I've noticed, however, that if I get the residues of the chain using seq 
= Selection.unfold_entities(chain, 'R') and then look up the amino acids 
by position (i.e. seq[index]), using the residue numbers returned by the 
NeighborSearch step as indices, I do not get the same residues as when I 
report residue.get_resname() for each contacting residue during the 
NeighborSearch step itself.

I've tried it with several proteins, and the problem is the same. Chains 
A and C of 2h62 are an example.

I then noticed that the lowest residue ID number of the residues yielded 
from Selection.unfold_entities(chain, 'R') is not 1. For chain A, it's 
11, and for chain C, it's 34. Not knowing why this was, I thought I'd 
try subtracting the lowest ID number from the indices returned by the 
NeighborSearch step (i.e. in chain A, 11 -> 0 so seq[0] would be the 
first residue, the one with ID 11). This appeared to work for chain A. 
However, it gives me negative indices for some of the contacts in chain 
C. This suggests that NeighborSearch can return residues that are not 
returned by unfold_entities(). The lowest residue ID returned by 
NeighborSearch for chain C was 24, whereas for unfold_entities() it was 34.
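
To illustrate why subtracting the lowest ID is fragile, here is a small 
sketch that looks residues up by ID instead of by offset. The numbering 
below is invented for illustration and is not taken from 2h62; the 
helper names build_id_lookup and position_of are hypothetical. With 
Bio.PDB, the list of IDs could presumably be built as 
[r.get_id()[1] for r in Selection.unfold_entities(chain, 'R')].

```python
def build_id_lookup(residue_ids):
    """Map each residue ID to its position in the list it came from."""
    return {res_id: pos for pos, res_id in enumerate(residue_ids)}


def position_of(res_id, lookup):
    """Return the list position for res_id, or None if the ID is absent."""
    return lookup.get(res_id)


# Invented numbering: starts at 11 and has a gap between 13 and 20.
residue_ids = [11, 12, 13, 20, 21]
lookup = build_id_lookup(residue_ids)

# Offset arithmetic assumes contiguous numbering and breaks at the gap.
offset_guess = 20 - residue_ids[0]

# Dictionary lookup does not depend on where numbering starts or on gaps.
true_position = position_of(20, lookup)

# An ID below the first listed residue gives None here, whereas the
# subtraction would give a negative index, which Python silently treats
# as counting from the end of the list.
missing = position_of(5, lookup)
```

Returning None for an unknown ID makes the mismatch visible, rather 
than silently indexing the wrong residue.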

For both chains A and C, I was given the warning

         PDBConstructionWarning: WARNING: Chain [letter] is discontinuous at line [line number].

In fact, I seem to get this warning for just about every chain of every 
structure I load. Is this the reason that the first residues in the two 
chains are at 11 and 34, rather than 1? If so, could it be that 
NeighborSearch is able to work around the discontinuity while 
unfold_entities is not?

Any suggestions?

Thanks for your time and help,

James Jensen
