[BioPython] Interface to sequence information in PDB Files?
Peter
biopython at maubp.freeserve.co.uk
Thu Jan 18 22:17:06 UTC 2007
Andrew D. Fant wrote:
>> This was something I was thinking about doing using Bio.PDB for the new
>> Bio.SeqIO code that I've been working on ...
>
> Yes, that's more or less the functionality that I was hoping to find. I would
> have been happy to have the SEQRES records show up as a sequence object, but
> actually reading the structure is probably the right approach. I think that
> putting a single gap character is the right thing to do for unsolved residues by
> default
OK, I've attached a file called PdbIO.py to Bug 2059, comment 13:
http://bugzilla.open-bio.org/show_bug.cgi?id=2059#c13
Direct link to the attachment:
http://bugzilla.open-bio.org/attachment.cgi?id=548&action=view
You should be able to save this anywhere and run it. I hope to include
something like this in Bio.SeqIO but would like some feedback first.
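For illustration only, here is a minimal sketch of the "single gap character for unsolved residues" idea discussed above. It is not the attached PdbIO.py and does not use Bio.PDB. The function name sequence_with_gaps, the (residue number, residue name) input format, and the use of "X" for unrecognised residues are all assumptions made for this example. It also ignores insertion codes, so it treats any jump in residue numbering as one unsolved stretch.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_with_gaps(residues, gap="-"):
    """Build a one-letter sequence from observed residues of one chain.

    residues is an iterable of (residue_number, three_letter_name)
    pairs in chain order (a hypothetical input format for this sketch).
    A single gap character is inserted wherever the numbering jumps,
    regardless of how many residues are missing.
    """
    letters = []
    previous = None
    for number, name in residues:
        if previous is not None and number > previous + 1:
            letters.append(gap)  # one gap per run of unsolved residues
        letters.append(THREE_TO_ONE.get(name, "X"))  # "X" for unknowns
        previous = number
    return "".join(letters)


if __name__ == "__main__":
    # Residues 3 and 4 were not observed in this made-up chain.
    observed = [(1, "MET"), (2, "ALA"), (5, "GLY"), (6, "LYS")]
    print(sequence_with_gaps(observed))
```

With Bio.PDB the (number, name) pairs would come from the residues of a parsed chain; that step is left out here so the sketch runs without a PDB file.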
> It might not be bad to provide an option to either only parse the SEQRES records
> in the file,
Right now Bio.PDB seems to ignore the SEQRES lines (as well as other
interesting data like the HELIX lines), so pulling out the SEQRES
information as SeqRecord objects would take a little longer - but in
many ways is much easier.
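As a rough illustration of why the SEQRES route is simple, the sketch below reads SEQRES lines directly from the text of a PDB file, without Bio.PDB. It assumes the fixed-column layout of the PDB format (chain identifier in column 12, residue names from column 20 onwards). The function name seqres_sequences and the choice of "X" for non-standard residues are assumptions for this example, and it returns plain strings rather than SeqRecord objects.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(lines):
    """Return {chain_id: one_letter_sequence} from SEQRES lines.

    lines is any iterable of PDB file lines, for example an open
    file handle. Non-SEQRES lines are skipped.
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]               # column 12: chain identifier
        names = line[19:70].split()       # columns 20-70: residue names
        chains.setdefault(chain_id, []).extend(names)
    return {
        chain_id: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain_id, names in chains.items()
    }


if __name__ == "__main__":
    # Two made-up SEQRES lines, built with the fixed column widths.
    example = [
        "SEQRES %3d %s %4d  %s" % (1, "A", 5, "GLY ILE VAL GLU GLN"),
        "SEQRES %3d %s %4d  %s" % (1, "B", 3, "PHE VAL ASN"),
    ]
    print(seqres_sequences(example))
```

Each resulting string could then be wrapped in a Seq and SeqRecord, with the chain identifier as part of the record id.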
Do you think these SEQRES sequences are actually more or less useful
than those from the 3D structure?
> or possibly use the data there to fill in if the depositor included
> the sequence data for disordered residues. I am not enough of a standards
> lawyer to know how common that is in PDB entries, or even if it's allowed,
> required, or forbidden, but if it is something that happens, being able to take
> advantage of the situation would be nice.
I have seen the FTNOTE lines used to comment about disordered side
chains, and free text comments about missing residues and poorly ordered
loops in generic REMARK lines. These look impossible to process
automatically. Sadly.
Anyway, please have a play with that code and let me know how you get on
- and if you think it would be useful even as is for BioPython.
Peter