[BioPython] Reassigning parent ids in Bio.PDB-structures?
Peter
biopython at maubp.freeserve.co.uk
Mon Oct 8 15:45:16 UTC 2007
Christian Meesters wrote:
> Hi,
>
> I'm trying to 'split' a structure in several pieces, e.g. a former chain
> 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on.
> Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...
>
> Perhaps some code explains better what I'm trying to achieve:
>
> breakpoints = [1254, 5444,
> 6690, 10888,
> 10889, 16332,
> 16333, 21776,
> 21776, 27220,
> 27221, 32665]
I'm assuming this is "breaks" later on.
> def split_chain(structure, breakpoints, outname = 'split.pdb'):
> chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
> 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
> 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
> 'X', 'Y', 'Z']
>
> chain = chains.pop(0)
> for atom in structure.get_atoms():
> number = atom.get_serial_number()
> if breaks and number == breaks[0]:
> breaks.pop(0)
> chain = chains.pop(0)
> atom.parent.parent.id = chain # assign new chain
>
> iostream = PDBIO()
> try:
> outfile = open(outname, 'w')
> iostream.set_structure(structure.structure)
> iostream.save(outfile)
> except IOError, msg:
> raise IOError(msg)
>
> So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to
> 5444. Instead the written pdb-file contains all atoms, but with the
> wrong chain ids (see above). (Please don't tell my how unpythonic the
> code reads, point is that I've tried so many different things that I
> first need to understand my logic mistake.)
>
> Any ideas, where my mistake is?
As the reason, I think this is what is happening: Given an atom, then
atom.parent will be a residue object, and atom.parent.parent will be a
chain object. Note all the atoms in a single amino acid residue will
share share the same .parent, and all the atoms in a single chain will
share the same .parent.parent
i.e. You have renamed Chain "A" to "A", and then later renamed this
chain to "B", and then again to "C". You didn't ever split up the chain
into sub chains.
I think you need to create a new chain objects instead... but I'm not
sure off hand how best to do this with Bio.PDB
To be honest, I would be tempted to write a quick and dirty script which
parsed the raw PDB file, and rewrote the chain field based on the atom
sequence number - without the overhead of the PDB parser.
Peter
More information about the Biopython
mailing list