[BioPython] Reassigning parent ids in Bio.PDB-structures?

Mon Oct 8 15:45:16 UTC 2007

Christian Meesters wrote:
> Hi,
> 
> I'm trying to 'split' a structure in several pieces, e.g. a former chain
> 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on.
> Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...
> 
> Perhaps some code explains better what I'm trying to achieve:
> 
> breakpoints = [1254, 5444,
>                 6690, 10888,
>                 10889, 16332,
>                 16333, 21776,
>                 21776, 27220,
>                 27221, 32665]

I'm assuming this is "breaks" later on.

> def split_chain(structure, breakpoints, outname = 'split.pdb'):
>     chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
>                     'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
>                     'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>                     'X', 'Y', 'Z']
> 
>     chain = chains.pop(0)
>     for atom in structure.get_atoms():
>         number = atom.get_serial_number()
>         if breaks and number == breaks[0]:
>             breaks.pop(0)
>             chain = chains.pop(0)
>         atom.parent.parent.id = chain # assign new chain
> 
>     iostream = PDBIO()
>     try:
>         outfile = open(outname, 'w')
>         iostream.set_structure(structure.structure)
>         iostream.save(outfile)
>     except IOError, msg:
>         raise IOError(msg)
> 
> So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to
> 5444. Instead the written pdb-file contains all atoms, but with the
> wrong chain ids (see above). (Please don't tell my how unpythonic the
> code reads, point is that I've tried so many different things that I
> first need to understand my logic mistake.)
> 
> Any ideas, where my mistake is?

As the reason, I think this is what is happening: Given an atom, then 
atom.parent will be a residue object, and atom.parent.parent will be a 
chain object.  Note all the atoms in a single amino acid residue will 
share share the same .parent, and all the atoms in a single chain will 
share the same .parent.parent

i.e. You have renamed Chain "A" to "A", and then later renamed this 
chain to "B", and then again to "C".  You didn't ever split up the chain 
into sub chains.

I think you need to create a new chain objects instead... but I'm not 
sure off hand how best to do this with Bio.PDB

To be honest, I would be tempted to write a quick and dirty script which 
parsed the raw PDB file, and rewrote the chain field based on the atom 
sequence number - without the overhead of the PDB parser.

Peter