[Biopython] Module Polypeptide

Tue Jun 9 11:28:10 UTC 2009

I intended to CC this back to the mailing list...

---------- Forwarded message ----------
From: Peter
Date: Tue, Jun 9, 2009 at 12:27 PM
Subject: Re: [Biopython] Module Polypeptide
To: stanam bharat

On Tue, Jun 9, 2009 at 12:14 AM, stanam bharat wrote:
> Yes, exactly; you even mention this in biopdb_faq.pdf. I tried
> this earlier, but my problem is the output. Although the result meets all the
> criteria, I want the output in single-letter code as a plain sequence (only
> residues in rows, not as a column along with extra information), which I got
> using PPBuilder. So can't I modify the output?

Rereading your code, do you just want to extract the amino acid
sequence of the chain?

Perhaps sticking with your original polypeptide approach might be
best. Note you can change the distance threshold for detecting chain
discontinuities (i.e. set the radius to something large):

from Bio.PDB.Polypeptide import PPBuilder
ppb = PPBuilder(radius=1000.0)
i = 0
for pp in ppb.build_peptides(s):
    ...

However, the code still detects discontinuities. You could cheat and
glue them back together maybe... but I would first try and work out
why the builder thinks the chain is discontinuous. This could be
important for the biological question you have in mind.
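
One way to start working out where the breaks are is to look at the
residue numbers at the ends of each fragment. The following is only a
rough sketch: fragment_ranges and numbering_gaps are names made up for
this example, and DemoResidue is a hypothetical stand-in so the snippet
runs without a PDB file. It assumes each fragment behaves like a list of
residues whose get_id() returns (hetero flag, residue number, insertion
code), as Bio.PDB residues do.

```python
def fragment_ranges(fragments):
    """Return (first_resseq, last_resseq, length) for each fragment.

    Each fragment is a sequence of residue-like objects whose get_id()
    returns a (hetero_flag, resseq, insertion_code) tuple.
    """
    ranges = []
    for frag in fragments:
        first = frag[0].get_id()[1]
        last = frag[-1].get_id()[1]
        ranges.append((first, last, len(frag)))
    return ranges


def numbering_gaps(ranges):
    """Return (last_of_previous, first_of_next) where the numbering jumps."""
    gaps = []
    for (_, prev_last, _), (next_first, _, _) in zip(ranges, ranges[1:]):
        if next_first != prev_last + 1:
            gaps.append((prev_last, next_first))
    return gaps


class DemoResidue:
    """Hypothetical stand-in for a Bio.PDB residue (demo data only)."""

    def __init__(self, resseq):
        self._id = (" ", resseq, " ")

    def get_id(self):
        return self._id


# Two made-up fragments: residues 1-5 and 9-11, so 6-8 are missing.
demo = [[DemoResidue(n) for n in range(1, 6)],
        [DemoResidue(n) for n in range(9, 12)]]
demo_ranges = fragment_ranges(demo)
demo_gaps = numbering_gaps(demo_ranges)
print(demo_ranges)
print(demo_gaps)
```

With a real structure you would pass list(ppb.build_peptides(s)) in
place of the demo data. A jump in the numbering suggests residues
missing from the file; a break with no jump would point to something
else, which is worth checking by hand.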

For the alternative approach, the chain object doesn't have a
get_sequence() method like the polypeptide object, but you can do
something like this:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import to_one_letter_code

p = PDBParser(PERMISSIVE=1)
structure_id = "3FCS"
filename = "pdb3fcs.ent"
s = p.get_structure(structure_id, filename)

with open("final2.txt", "w") as f:
    for model in s:
        for chain in model:
            # Try adjusting this depending on whether you expect just
            # the 20 standard amino acids etc.
            # aminos = [to_one_letter_code.get(res.resname, "X")
            #           for res in chain if res.resname != "HOH"]
            aminos = [to_one_letter_code.get(res.resname, "X")
                      for res in chain if "CA" in res.child_dict]
            sequence = "".join(aminos)
            f.write("%s:%s:%s\n" % (structure_id, chain.id, sequence))

You should check the end of the chain carefully - in addition to lots
of water molecules (which I guess may be associated with the peptide
in some way) there may be other non-standard amino acid residues.
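
A quick way to see what is in a chain is to tally the residue names by
the hetero flag in each residue id. This is just a sketch: tally_by_kind
is a name invented here, and the demo chain is made-up data built from
SimpleNamespace objects rather than real Bio.PDB residues. It assumes
the Bio.PDB convention that the first element of a residue id is " " for
ordinary residues, "W" for water, and "H_" plus the residue name for
other hetero residues.

```python
from collections import Counter
from types import SimpleNamespace


def tally_by_kind(residues):
    """Count residue names, split into normal, water and other hetero."""
    tallies = {"normal": Counter(), "water": Counter(), "hetero": Counter()}
    for res in residues:
        flag = res.id[0]
        if flag == " ":
            kind = "normal"
        elif flag == "W":
            kind = "water"
        else:
            kind = "hetero"
        tallies[kind][res.resname] += 1
    return tallies


# Made-up chain for illustration only (not taken from 3FCS).
demo_chain = [
    SimpleNamespace(id=(" ", 1, " "), resname="ALA"),
    SimpleNamespace(id=(" ", 2, " "), resname="GLY"),
    SimpleNamespace(id=("H_MSE", 3, " "), resname="MSE"),
    SimpleNamespace(id=("W", 201, " "), resname="HOH"),
    SimpleNamespace(id=("W", 202, " "), resname="HOH"),
]
demo_tally = tally_by_kind(demo_chain)
for kind, counts in demo_tally.items():
    print(kind, dict(counts))
```

On a real structure you would call tally_by_kind(chain) for each chain;
anything listed under "hetero" is worth looking at before deciding how
to treat it in the sequence.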

Peter