[BioPython] Interface to sequence information in PDB Files?

Thu Jan 18 22:17:06 UTC 2007

Andrew D. Fant wrote:
>> This was something I was thinking about doing using Bio.PDB for the new
>> Bio.SeqIO code that I've been working on ...
> 
> Yes, that's more or less the functionality that I was hoping to find.  I would
> have been happy to have the SEQRES records show up as a sequence object, but
> actually reading the structure is probably the right approach.  I think that
> putting a single gap character is the right thing to do for unsolved residues by
> default

OK, I've stuck a file called PdbIO.py on Bug 2059, comment 13
http://bugzilla.open-bio.org/show_bug.cgi?id=2059#c13

Direct link to the attachment:
http://bugzilla.open-bio.org/attachment.cgi?id=548&action=view

You should be able to save this anywhere and run it.  I hope to include 
something like this in Bio.SeqIO but would like some feedback first.
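
To make the single-gap convention discussed above concrete, here is a
minimal sketch. It is not the code in PdbIO.py. The function name
sequence_with_gaps and its input format (pairs of residue number and
one-letter code) are assumptions made purely for illustration. With
Bio.PDB the residue number would be residue.id[1].

```python
def sequence_with_gaps(residues, gap="-"):
    """Join one-letter codes, inserting a single gap per numbering break.

    residues: iterable of (residue_number, one_letter_code) in file order.
    This is a hypothetical helper, not part of Bio.PDB or Bio.SeqIO.
    """
    parts = []
    previous = None
    for number, letter in residues:
        # A jump in the numbering suggests one or more unsolved residues;
        # mark it with a single gap character regardless of its length.
        if previous is not None and number > previous + 1:
            parts.append(gap)
        parts.append(letter)
        previous = number
    return "".join(parts)


# Residues 3 and 4 are missing from this made-up chain.
example = [(1, "M"), (2, "K"), (5, "T"), (6, "A")]
print(sequence_with_gaps(example))
```

Note that this treats any jump in numbering as a gap, which may not be
right for every file (insertion codes and odd numbering schemes are
ignored here).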

> It might not be bad to provide an option to either only parse the SEQRES records
> in the file,

Right now Bio.PDB seems to ignore the SEQRES lines (as well as other 
interesting data like the HELIX lines), so pulling out the SEQRES 
information as SeqRecord objects would take a little longer to add, 
although in many ways it is the much easier job.
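
As a rough illustration of why the SEQRES route is simple, the sketch
below reads the SEQRES lines directly as text, without Bio.PDB. It
assumes the standard fixed-column layout (chain identifier in column 12,
residue names from column 20 onwards). The function name
seqres_sequences, the use of "X" for non-standard residues, and the
plain-string return value (rather than SeqRecord objects) are all
assumptions for this example only.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(lines):
    """Return {chain_id: sequence} built from the SEQRES lines only.

    Hypothetical helper for illustration; unknown residue names
    (modified residues, nucleotides, etc.) are written as "X".
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]            # column 12 of the record
        names = line[19:].split()      # residue names start at column 20
        letters = chains.setdefault(chain_id, [])
        letters.extend(THREE_TO_ONE.get(name, "X") for name in names)
    return {chain: "".join(letters) for chain, letters in chains.items()}


# Made-up records, not taken from a real PDB entry.
example = [
    "SEQRES   1 A    5  MET LYS THR ALA TYR",
    "SEQRES   1 B    2  GLY MSE",
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00",
]
print(seqres_sequences(example))
```

With a real file you would pass an open file handle instead of the list,
since the function only needs to iterate over lines.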

Do you think these SEQRES sequences are actually more or less useful 
than those from the 3D structure?

> or possibly use the data there to fill in if the depositor included
> the sequence data for disordered residues. I am not enough of a standards
> lawyer to know how common that is in PDB entries, or even if it's allowed,
> required, or forbidden, but if it is something that happens, being able to take
> advantage of the situation would be nice.

I have seen the FTNOTE lines used to comment about disordered side 
chains, and free text comments about missing residues and poorly ordered 
loops in generic REMARK lines.  These look impossible to process 
automatically.  Sadly.

Anyway, please have a play with that code and let me know how you get on 
- and if you think it would be useful even as is for BioPython.

Peter