[Biopython] (Bio.PDB) problems with NeighborSearch: error at levels above "A", residue index discrepancy with unfold_entities

Tue Oct 8 21:22:42 UTC 2013

Dear James,

Regarding problem 1. What you describe runs fine on my machine, using
Python 2.7.5 and an up-to-date Biopython git version. You logic seems fine,
maybe it's the version of Python you are using?

Regarding your second problem, that of the mismatched indexes. The
Selection method returns a *list* of residues while when you iterate over
the neighbors and ask for their id it gives back the id of the residue.
This id will only correspond to the Selection list index if your residues
are numbered from 1 to N without gaps. If your protein starts at residue 3,
then the first item given back by Selection has index 0 while in fact the
id is 3. Does this make sense?

The warning occurs if you have chain breaks. There should be some gaps in
your structure, starting at a number other than 1 does not raise this
warning normally.

Cheers and sorry for the late reply,

João

2013/8/30 James Jensen <jdjensen at eng.ucsd.edu>

> Hello!
>
> I am writing a function that, given two chains in a PDB file, should
> return 1) the positions and identities of all residues that are in contact
> with (distance < 5 angstroms) a residue on the other chain, and 2) the
> amino acid sequences of the chains. I've been doing this with
> NeighborSearch.search_all(**radius=5, level='A') and then for each atom
> pair, seeing what its parent residue is and whether the parent residues of
> the two atoms belong to different chains. This may seem like a roundabout
> way of doing it, but if I call search_all(radius=5, level='R'), or indeed
> with level=any level other than 'A', I get the error
>
>         TypeError: unorderable types: Residue() < Residue()
>
> So my first question is why it might be that search_all isn't working at
> higher levels.
>
> For the adjacent residue pairs I identify using NeighborSearch, I get each
> residue's position in its respective chain by residue.get_id()[1].
>
> I've noticed, however, that if I get the sequence of the chain using seq =
> Selection.unfold_entities(**chain, 'R') and then reference (i.e.
> seq[index]) the amino acids using the indices returned by the
> NeighborSearch step, they are not the same residues that I get if during
> the NeighborSearch step I report residue.get_resname() for each adjacent
> residue.
>
> I've tried it with several proteins, and the problem is the same. Chains A
> and C of 2h62 are an example.
>
> I then noticed that the lowest residue ID number of the residues yielded
> from Selection.unfold_entities(**chain, 'R') is not 1. For chain A, it's
> 11, and for chain C, it's 34. Not knowing why this was, I thought I'd try
> subtracting the lowest ID number from the indices returned by the
> NeighborSearch step (i.e. in chain A, 11 -> 0 so seq[0] would be the first
> residue, the one with ID 11). This happened to seem to work for chain A.
> However, it gives me negative indices for some of the contacts in chain C.
> This means that NeighborSearch can return residues that are not returned by
> unfold_entities(). The lowest residue ID returned by NeighborSearch for
> chain C was 24, whereas for unfold_entities() it was 34.
>
> For both chains A and C, I was given the warning
>
>         PDBConstructionWarning: WARNING: Chain [letter] is discontinuous
> at line [line number].
>
> In fact, I seem to get this warning for just about every chain of every
> structure I load. Is this the reason that the first residues in the two
> chains are at 11 and 34, rather than 1? If so, could it be that
> NeighborSearch is able to work around the discontinuity while
> unfold_entities is not?
>
> Any suggestions?
>
> Thanks for your time and help,
>
> James Jensen
> ______________________________**_________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/**mailman/listinfo/biopython<http://lists.open-bio.org/mailman/listinfo/biopython>
>