[Biopython] Superimposer troubles

Tue Apr 2 09:38:24 UTC 2013

On Tue, Apr 2, 2013 at 5:40 AM, Willis, Jordan R
<jordan.r.willis at vanderbilt.edu> wrote:
>
> Hello List,
>
>
> I'm having trouble working through some issues with the Superimposer for all-atom
> superpositions. We often work on protein design, and our final PDB files
> differ in atom count and sometimes composition from our input. I'm a big fan
> of the Superimposer, so we have implemented it like this:
>
> from Bio.PDB import PDBParser
>
> p = PDBParser()
> native_pdb = p.get_structure("input","input.pdb")
> designed_pdb = p.get_structure("output","output.pdb")
>
>
> native_ca_atoms = []
> native_all_atoms = []
> designed_ca_atoms = []
> designed_all_atoms = []
> for (native_residue, designed_residue) in zip(native_pdb.get_residues(), designed_pdb.get_residues()):
>         native_ca_atoms.append(native_residue['CA'])
>         designed_ca_atoms.append(designed_residue['CA'])
>         ...
>
> For the CA atoms it's not really a big deal, since everything we design
> has a CA atom. However, when we go to all atoms, it turns out that the
> designed residue and the native residue can be different, leading to a
> different number of atoms. I didn't realize that the zip function truncates
> to the length of the shorter list and does not necessarily match up
> the atoms. It just chops off the end of the longer list! As a result,
> the Superimposer never failed, because it always received the same
> number of atoms in both lists.

How about using izip_longest (from itertools; zip_longest in Python 3)
rather than zip? It pads the shorter sequence with None, so looking up
an atom on the missing residue fails with an error when the residue
counts are different.
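
A minimal sketch of that idea, written with the Python 3 name
zip_longest. The helper zip_strict and the _MISSING sentinel are
hypothetical names, not part of itertools or Biopython, and plain
strings stand in for the Bio.PDB residue objects:

```python
from itertools import zip_longest  # called izip_longest on Python 2

_MISSING = object()  # hypothetical sentinel used to pad the shorter input


def zip_strict(native_residues, designed_residues):
    """Yield residue pairs; raise ValueError if one input runs out first."""
    pairs = zip_longest(native_residues, designed_residues, fillvalue=_MISSING)
    for index, (native, designed) in enumerate(pairs):
        if native is _MISSING or designed is _MISSING:
            raise ValueError("residue counts differ at position %d" % index)
        yield native, designed


# Stand-in data: residue names instead of Bio.PDB Residue objects.
print(list(zip_strict(["ALA", "GLY"], ["ALA", "SER"])))
try:
    list(zip_strict(["ALA", "GLY", "TRP"], ["ALA", "SER"]))
except ValueError as err:
    print("mismatch:", err)
```

Using an explicit sentinel rather than the default None makes the
failure happen at the pairing step, with a message that names the
position, instead of later when an atom is looked up.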

In general, however, dealing with similar but different chains will
require some sort of pairwise alignment and/or restricting to just
backbone atoms (such as CA, the C-alpha atom).
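
As a rough sketch of the backbone-only option: the function below
keeps an atom only when both residues of a pair contain it. The names
paired_backbone_atoms and BACKBONE are hypothetical, the residues are
assumed to be already paired one-to-one (for example after an
alignment), and dicts stand in for Bio.PDB residues, which support the
same "name in residue" and residue[name] lookups:

```python
BACKBONE = ("N", "CA", "C", "O")  # standard protein backbone atom names


def paired_backbone_atoms(native_residues, designed_residues):
    """Collect backbone atoms that are present in both residues of each pair.

    Assumes the two residue sequences are already matched one-to-one.
    """
    native_atoms, designed_atoms = [], []
    for native, designed in zip(native_residues, designed_residues):
        for name in BACKBONE:
            # Skip any atom that is missing on either side.
            if name in native and name in designed:
                native_atoms.append(native[name])
                designed_atoms.append(designed[name])
    return native_atoms, designed_atoms


# Stand-in residues: dicts mapping atom name -> placeholder atom.
native = [{"N": "n1", "CA": "ca1", "C": "c1", "O": "o1", "CB": "cb1"}]
designed = [{"N": "N1", "CA": "CA1", "C": "C1"}]  # no O, no side chain
fixed, moving = paired_backbone_atoms(native, designed)
print(fixed)
print(moving)
```

With real Bio.PDB residues, the two lists could then be passed to
Superimposer().set_atoms(fixed, moving), which expects atom lists of
equal length.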

Peter