[BioPython] Interface to sequence information in PDB Files?

Thu Jan 18 00:17:06 UTC 2007

Andrew D. Fant wrote:
> I'm working on a project that involves the sequences of entries in the PDB.  I
> can do a brute force extraction of the sequences and conversion to FASTA (for
> example) format, but I'd like to use a clean interface for this if I can.  Is
> there a good way to create sequence objects from PDB data in biopython, and if
> there is, could someone point me to some sample code demonstrating it?

I have been thinking about doing exactly this with Bio.PDB, as part of
the new Bio.SeqIO code that I've been working on:

http://www.biopython.org/wiki/SeqIO

I haven't written anything yet specifically for PDB files, but my idea 
was to produce a SeqRecord for each peptide chain in the PDB file - 
based on the residues in the 3D structure, not the stated sequence in 
the header of the PDB file.
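
To make the idea concrete, here is a rough standard-library-only sketch
(not Bio.PDB or Bio.SeqIO code) of "one sequence per chain, taken from
the ATOM records rather than the header".  The function names and the
demo helper are made up for illustration, it assumes the usual
fixed-column PDB layout, and it only looks at the first model.  With
Bio.PDB the same loop would walk the chains of the parsed structure,
and each string would then be wrapped in a SeqRecord.

```python
# Standard amino acid three-letter to one-letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(pdb_lines):
    """Return {chain_id: [(residue_number, residue_name), ...]}.

    Only ATOM records of the first model are used; each residue is
    counted once, keyed on chain, residue number and insertion code.
    """
    chains = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # ignore any further models
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        resname = line[17:20].strip()   # columns 18-20
        chain_id = line[21]             # column 22
        key = (chain_id, line[22:27])   # columns 23-27: number + insertion code
        if key in seen:
            continue
        seen.add(key)
        chains.setdefault(chain_id, []).append((int(line[22:26]), resname))
    return chains


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence}; unknown residues become 'X'."""
    return {
        chain_id: "".join(THREE_TO_ONE.get(name, "X") for _, name in residues)
        for chain_id, residues in chain_residues(pdb_lines).items()
    }


def demo_atom_line(serial, resname, chain_id, resseq):
    """Hypothetical helper: build a minimal, column-correct ATOM line."""
    return "ATOM  %5d  CA  %3s %s%4d" % (serial, resname, chain_id, resseq)


demo = [
    demo_atom_line(1, "MET", "A", 1),
    demo_atom_line(2, "MET", "A", 1),   # second atom of the same residue
    demo_atom_line(3, "LYS", "A", 2),
    demo_atom_line(4, "GLY", "B", 1),
    demo_atom_line(5, "UNK", "B", 2),   # not in the table, so 'X'
]
print(chain_sequences(demo))  # {'A': 'MK', 'B': 'GX'}
```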

Does this sound close to what you had in mind?

One big question is how best to handle chains with breaks in them
(e.g. residues missing from the PDB file because they were not
solved).  Simply skipping the missing residues and returning a single
continuous amino acid sequence would be misleading, so perhaps
inserting a single gap character at each break would suffice?
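
As a sketch of what I mean (plain Python, nothing from Biopython; the
function name and the "fill" option are just for illustration), a break
could be detected from a jump in the residue numbering.  That is an
assumption in itself, since a jump in numbering does not always mean
residues are missing.

```python
def sequence_with_gaps(residues, gap="-", fill=False):
    """Join one-letter codes, marking jumps in the residue numbering.

    residues: list of (residue_number, one_letter_code) in chain order.
    fill=False puts a single gap character at each break;
    fill=True puts one gap character per missing residue number.
    """
    parts = []
    previous = None
    for number, letter in residues:
        if previous is not None and number > previous + 1:
            missing = number - previous - 1
            parts.append(gap * missing if fill else gap)
        parts.append(letter)
        previous = number
    return "".join(parts)


# Residues 4 to 6 are absent from this made-up chain.
chain = [(1, "M"), (2, "K"), (3, "V"), (7, "L"), (8, "A")]
print(sequence_with_gaps(chain))             # MKV-LA
print(sequence_with_gaps(chain, fill=True))  # MKV---LA
```

The single-gap form only says "something is missing here", while the
filled form keeps the sequence positions in step with the residue
numbers, which may matter for alignment against the header sequence.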

Peter