[BioPython] Interface to sequence information in PDB Files?
Peter
biopython at maubp.freeserve.co.uk
Thu Jan 18 00:17:06 UTC 2007
Andrew D. Fant wrote:
> I'm working on a project that involves the sequences of entries in the PDB. I
> can do a brute force extraction of the sequences and conversion to FASTA (for
> example) format, but I'd like to use a clean interface for this if I can. Is
> there a good way to create sequence objects from PDB data in biopython, and if
> there is, could someone point me to some sample code demonstrating it?
This is something I have been thinking about doing with Bio.PDB for the
new Bio.SeqIO code that I've been working on:
http://www.biopython.org/wiki/SeqIO
I haven't written anything yet specifically for PDB files, but my idea
was to produce a SeqRecord for each peptide chain in the PDB file -
based on the residues in the 3D structure, not the stated sequence in
the header of the PDB file.
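
To make the idea concrete, here is a rough, library-free sketch of "one
sequence per chain, taken from the coordinates rather than the header".
It is not Bio.PDB or Bio.SeqIO code: the function name chain_sequences
and the THREE_TO_ONE table are made up for illustration. It assumes the
standard fixed-column ATOM record layout, reads only the first model,
ignores HETATM records, and writes "X" for any residue name it does not
recognise.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Return {chain_id: sequence} built from ATOM records only.

    Hypothetical helper: the SEQRES header records are never consulted,
    so residues absent from the coordinates are absent from the result.
    """
    seen = set()   # (chain_id, residue id) pairs already counted
    chains = {}    # chain_id -> list of one-letter codes, in file order
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model is of interest
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        res_name = line[17:20].strip()   # columns 18-20
        chain_id = line[21]              # column 22
        res_id = line[22:27]             # columns 23-27: number + insertion code
        key = (chain_id, res_id)
        if key in seen:
            continue  # another atom of a residue we already have
        seen.add(key)
        chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain_id: "".join(codes) for chain_id, codes in chains.items()}
```

With a file on disk this could be called as
chain_sequences(open("example.pdb")), where example.pdb is a placeholder
name, and each resulting string could then be wrapped in a SeqRecord
with the chain identifier as part of its id.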
Does this sound close to what you had in mind?
One big question is how best to handle chains with breaks in them (e.g.
residues missing from the PDB file because they were not resolved in the
structure). Simply skipping them and returning a single continuous amino
acid sequence would be misleading, so perhaps inserting a single gap
character at each break would suffice?
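
As a sketch of the single-gap-character option: the function below is
hypothetical (sequence_with_gaps is not an existing Biopython call) and
takes a list of (residue number, one-letter code) pairs for one chain.
It assumes that a jump of more than one in the residue numbering marks a
break. That is only a heuristic, since PDB numbering is not guaranteed
to be consecutive even when no residues are missing.

```python
def sequence_with_gaps(residues, gap="-"):
    """Join one-letter codes, inserting one gap marker at each numbering jump.

    residues: iterable of (residue_number, one_letter_code) in chain order.
    A single marker is written per break, however many residues are missing.
    """
    parts = []
    previous = None
    for number, letter in residues:
        # Assumption: numbering that skips ahead means unresolved residues.
        if previous is not None and number > previous + 1:
            parts.append(gap)
        parts.append(letter)
        previous = number
    return "".join(parts)
```

For example, residues numbered 1, 2, 5, 6 with codes A, G, K, L would
come out as "AG-KL". Writing one gap character per missing residue
instead would preserve the numbering, at the cost of trusting that the
numbering really is sequential.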
Peter