[Biopython] writing a PDB----PLEASEEEE HELP

Nathan Edwards nje5 at georgetown.edu
Wed Oct 27 14:27:56 UTC 2010


I needed (well decided that I wanted) to figure this out for the Bio.PDB 
lecture in my Python/Bioinformatics course a couple of weeks ago.

import Bio.PDB

parser = Bio.PDB.PDBParser()
structure1 = parser.get_structure("2WFI","2WFI.pdb")
structure2 = parser.get_structure("2GW2","2GW2.pdb")

# Quickly/easily extract amino-acid residue chains
ppb=Bio.PDB.PPBuilder()

# Figure out how the query and subject peptides correspond...
# Done manually based on sequence alignment here, but non-trivial in
# general.

# query has an extra residue at the front
# subject has two extra residues at the back
# Assume first peptide is the right one in each case,
# (presumes) single chain models.

query = ppb.build_peptides(structure1)[0][1:]
subject = ppb.build_peptides(structure2)[0][:-2]

# Get C alpha atoms for each
qatoms = [ r['CA'] for r in query ]
satoms = [ r['CA'] for r in subject ]

# Superimpose...
superimposer = Bio.PDB.Superimposer()
superimposer.set_atoms(qatoms, satoms)

print "Query and subject superimposed, RMS:", superimposer.rms

# Apply transformation to structure2
superimposer.apply(structure2.get_atoms())

# Magic - write two structures to one file as two models
# This PDBIO usage is not documented, but can be found
# embedded in Bio.PDB.PDBIO.py
#
outfile=open("out.pdb", "w")
io=Bio.PDB.PDBIO(1)
io.set_structure(structure1)
io.save(outfile)
io.set_structure(structure2)
io.save(outfile, write_end=1)
outfile.close()

I wanted to do this in order to visualize multiple proteins's alignments 
in PyMol, which I do not know well. Perhaps there is a PyMol specific 
way to show two PDB files (pre-aligned or aligned by PyMol) to visualize 
residues where two chains' structures do not line up.

It is possible to use the above approach, but it is necessary to set a 
special PyMol attribute to see both models.

Load in the out.pdb file in PyMol, select menu item: Setting -> "Edit 
All..." and set "all_states" to "on" (second from the top).

Hope this helps,

- n

-- 
Dr. Nathan Edwards                      nje5 at georgetown.edu
Department of Biochemistry and Molecular & Cellular Biology
            Georgetown University Medical Center
Room 1215, Harris Building          Room 347, Basic Science
3300 Whitehaven St, NW              3900 Reservoir Road, NW
Washington DC 20007                     Washington DC 20007
Phone: 202-687-7042                     Phone: 202-687-1618
Fax: 202-687-0057                         Fax: 202-687-7186



More information about the Biopython mailing list