[Biopython] Reading large files, Biopython cookbook example

Andrew Dalke dalke at dalkescientific.com
Tue Aug 6 18:49:35 UTC 2013


On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
> In the long run this problem should go away as the PDB moves
> to using the PDBx/mmCIF format:
> http://www.wwpdb.org/news/news_2013.html#22-May-2013

Either you are optimistic or an ultramarathon runner! The
move over to mmCIF started, of course, 20 years ago, and the
link you gave says the change applies only to very large
structures:

    Structures that do not exceed the limitations of the PDB
    format will continue to be provided as PDB files in the
    archive for the foreseeable future.

Even for large files, which previously would split the structure
over multiple records, there will be a "best-effort" PDB format,
available as a web service.


40 years of the PDB format => well-entrenched => not going to
get rid of it any time soon. 



For another historical side-note, the PDB format started in
the early 1970s, but contains a kernel which is even older!
Quoting from

  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :

  In order to establish the PDB, acceptance by the crystallographic
  community was necessary, requiring a pilgrimage in 1970 to the Medical
  Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
  Cambridge. One result of this exchange was a concession that coordinates
  of protein structures would be stored in the same format as the small
  molecule CDC database (with a redundant ATOM label at the beginning of
  each card), retaining the now-arcane counting number at the end. But the
  idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
  and colleagues in Cambridge.

The "now-arcane" counting number has long since disappeared from
the spec. It was there, I believe, so that if the punch cards were
dropped they could be re-sorted based on the last few columns.
(I imagine you could also write a program to strip out the
C-alpha cards, work with them, then merge the C-alphas back into
the card deck correctly.)
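
As a rough illustration of that idea, here is a minimal Python sketch.
It assumes a made-up layout: 80-column cards with the counting number
right-justified in the last 4 columns, and the atom name in columns
13-16 as in the modern ATOM record. I have not checked the column
positions against the original card format, and the function names
are hypothetical, so treat this as a sketch rather than a
reconstruction.

```python
import heapq

# ASSUMPTION: 80-column cards, counting number in the last 4 columns.
# This layout is illustrative only, not taken from the old spec.
CARD_WIDTH = 80
SEQ_COLUMNS = slice(76, 80)
# ASSUMPTION: atom name in columns 13-16, as in the modern ATOM record.
ATOM_NAME_COLUMNS = slice(12, 16)


def make_card(text, number):
    """Pad or truncate text to 76 columns and append the counting number."""
    return f"{text:<76.76}{number:>4d}"


def card_number(card):
    """Read the counting number from the last few columns of a card."""
    return int(card[SEQ_COLUMNS])


def resort_deck(cards):
    """Put a dropped (shuffled) deck back in order by counting number."""
    return sorted(cards, key=card_number)


def split_calpha(cards):
    """Separate the C-alpha ATOM cards from the rest, keeping deck order."""
    calpha, rest = [], []
    for card in cards:
        is_calpha = (card.startswith("ATOM")
                     and card[ATOM_NAME_COLUMNS] == " CA ")
        (calpha if is_calpha else rest).append(card)
    return calpha, rest


def merge_decks(deck_a, deck_b):
    """Merge two decks, each already in order, by counting number."""
    return list(heapq.merge(deck_a, deck_b, key=card_number))
```

Because every card carries its own position, merge_decks() can put
the C-alpha cards back exactly where they came from, as long as the
work done on them leaves the last few columns untouched.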

				Andrew
				dalke at dalkescientific.com




