[Biopython-dev] Features of the GSOC branch ready to be merged

Fri Jan 21 15:13:48 UTC 2011

Hello all,

I've been working on the functions for renumbering residues, removing
disordered atoms, and representing the biological unit.

I've made quite a few changes, especially to the renumbering algorithm.
An explanation follows:

Before, I simply calculated how much to subtract from each residue number
based on the first one. That worked perfectly if all residue numbers were in
increasing order, which was not the case for some structures. Also, HETATMs
weren't separated from the main ATOM lines, and in many PDB files you see
numbering starting from 1000, for example.
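
To make the limitation concrete, here is a minimal sketch of the offset
idea on plain lists of residue numbers. The function name and the input
format are made up for illustration; this is not the code from the branch.

```python
def renumber_by_offset(residue_numbers, start=1):
    """Shift every residue number so that the first one becomes `start`.

    Hypothetical helper for illustration only.
    """
    if not residue_numbers:
        return []
    # The offset is derived from the first residue only.
    offset = residue_numbers[0] - start
    return [n - offset for n in residue_numbers]


# Increasing numbering: the result is the expected 1..N.
print(renumber_by_offset([25, 26, 27, 28]))    # [1, 2, 3, 4]

# A residue numbered 1001 in the middle keeps its jump after shifting.
print(renumber_by_offset([25, 26, 1001, 27]))  # [1, 2, 977, 3]
```

The second call shows why a single offset is not enough once the numbering
stops increasing or HETATMs are mixed in.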

What I coded allows for a certain degree of discrimination between HETATMs
and ATOMs based on the SEQRES records in the PDB file header (I added parsing
for these to parse_pdb_header). This ensures HETATMs are numbered from 1000.
I've also incorporated a way of filtering modified amino acids (which show up
as HETATM records in between ATOM lines) so that they are treated as ATOMs
when there is no SEQRES header in the PDB file, by looking for a CA atom.
When this "magic" feature turns on, a warning is issued announcing that the
results may be a bit unreliable.
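
A rough sketch of this logic, using plain tuples rather than Bio.PDB
objects, might look like the following. Everything here is an assumption
made for illustration: the function name, the (resname, is_hetatm,
atom_names) input format, matching residues against SEQRES by residue name,
and numbering the chain residues sequentially from 1. The actual code is in
the branch.

```python
import warnings


def renumber(residues, seqres_names=None, polymer_start=1, hetero_start=1000):
    """Return new residue numbers for a list of residues.

    `residues` is a list of (resname, is_hetatm, atom_names) tuples and
    `seqres_names` is the set of residue names listed in SEQRES, or None
    if the file has no SEQRES records. All names are hypothetical.
    """
    if seqres_names is None:
        warnings.warn(
            "No SEQRES records found; guessing modified residues from "
            "CA atoms, results may be unreliable."
        )
    next_polymer, next_hetero = polymer_start, hetero_start
    new_numbers = []
    for resname, is_hetatm, atom_names in residues:
        if seqres_names is not None:
            # Assumption: a residue belongs to the chain if SEQRES lists it.
            in_polymer = resname in seqres_names
        else:
            # Fallback heuristic: a HETATM residue with a CA atom is taken
            # to be a modified amino acid. Calcium ions also carry the atom
            # name CA, which is one way this guess can go wrong.
            in_polymer = (not is_hetatm) or ("CA" in atom_names)
        if in_polymer:
            new_numbers.append(next_polymer)
            next_polymer += 1
        else:
            new_numbers.append(next_hetero)
            next_hetero += 1
    return new_numbers


backbone = {"N", "CA", "C", "O"}
example = [
    ("ALA", False, backbone),
    ("MSE", True, backbone | {"SE"}),  # modified residue stored as HETATM
    ("GLY", False, backbone),
    ("HOH", True, {"O"}),              # water
]

# With SEQRES information available:
print(renumber(example, seqres_names={"ALA", "MSE", "GLY"}))  # [1, 2, 3, 1000]

# Without SEQRES: the CA heuristic is used and a warning is issued.
print(renumber(example))  # [1, 2, 3, 1000]
```

In both cases the selenomethionine stays in the chain numbering and the
water is pushed to 1000, but only the first case relies on information
actually present in the header.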

I've shown the code and the idea to the people in my lab and got generally
good responses, but of course they are all biased :) Have a look for
yourselves; I created a branch for these changes:

https://github.com/JoaoRodrigues/biopython/tree/pdb_enhancements

Thanks!