[Bioperl-l] PDB ATOM records: name, segid, etc.

Joe Krahn krahn@niehs.nih.gov
Fri, 12 Jul 2002 17:55:04 -0400


When I first looked at BioPerl, it used PDB mainly for sequences,
but it's looking really good for structural biology purposes now.
I can write some code, but I'd like to find out what other BioPerl
users think.

Although SEGID is deprecated by the official PDB standard,
it is useful to me because I work with CNS files. Would people
be opposed to supporting it in BioPerl? (Note: most crystallographers
want to keep the SEGID. It is a useful thing, especially now
that PDB disallows CHAINID for ligands.)

Another useful but non-standard optional feature is a 4th residue
character. It can be useful for designating variants of a residue,
like HISD for HIS protonated at ND.
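
As a rough illustration of the two optional fields above, here is a
minimal sketch in Python (not BioPerl code). It assumes the segID sits
in columns 73-76, as in the PDB 2.2 ATOM record, and that the 4th
residue character occupies column 21, the blank between resName and
chainID. The function name parse_atom_extras and the example line are
made up for this sketch.

```python
def parse_atom_extras(line):
    """Pull the residue name (up to 4 chars), chain ID and segID from an ATOM line."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so the slices are safe
    return {
        "resname": line[17:21].strip(),  # columns 18-21: resName plus optional 4th character
        "chain": line[21].strip(),       # column 22: chainID
        "segid": line[72:76].strip(),    # columns 73-76: segID
    }


# Hypothetical ATOM line, assembled field by field so the columns are explicit.
example = (
    "ATOM  " + "    1" + " " + " ND1" + " " + "HISD" + "A" + "  12" + " " + "   "
    + "  11.000  12.000  13.000" + "  1.00" + " 20.00" + "      " + "PROA" + " N" + "  "
)
print(parse_atom_extras(example))
```

A reader that only wants the standard 3-letter residue name could
slice columns 18-20 instead and keep the 4th character as a separate
variant flag.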

The last point is about atom name alignment. This is a standard
PDB item, and it should be fixed, but it is a complex issue.

BioPerl does atom name alignment incorrectly, like most other PDB
programs. The proper alignment is documented in Appendix 3:
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/part_76.html

The first two characters of the name field are always the element,
right-justified. The alignment seems strange until you realize this.
Think of carbon being represented by " C". A leading non-letter
character is allowed for atom names that are too long, mostly
hydrogens. The current pdb.pm shifts <number>H correctly (a good
guess) but will get all 2-letter elements wrong. "CA  " for calcium
will become " CA ", a carbon atom.
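
The alignment rule can be sketched as follows. This is a minimal
Python illustration, not the pdb.pm code: the function name
format_atom_name is hypothetical, and it assumes the caller already
knows the element symbol.

```python
def format_atom_name(name, element):
    """Return the 4-character atom name field (columns 13-16) for a PDB ATOM line."""
    name = name.strip()
    element = element.strip()
    if len(name) > 4:
        raise ValueError("atom name longer than 4 characters: %r" % name)
    # Start in column 13 when the name fills the field, when the element
    # has two letters, or when the name begins with a non-letter (e.g. 1HB).
    if len(name) == 4 or len(element) == 2 or not name[:1].isalpha():
        return name.ljust(4)
    # One-letter element: leave column 13 blank so the symbol lands in column 14.
    return (" " + name).ljust(4)


print(repr(format_atom_name("CA", "C")))   # alpha carbon
print(repr(format_atom_name("CA", "CA")))  # calcium
```

With the element supplied, alpha carbon and calcium no longer collide:
the first call gives " CA " and the second gives "CA  ".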

Writing the name should be based on the element entry.
However, many PDB files have the atom name correct, but no element
entry. So, if pdb.pm is going to remove the leading space on
atom names (technically wrong, but probably desirable for many people)
then reading an ATOM needs to generate the element entry when an ATOM
doesn't include it. This can also be a problem - a PDB file with
no element entries and improper atom alignment will generate bad
element entries, but at least it works for all single-letter elements.
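
Generating a missing element entry from a correctly aligned name could
look roughly like this. It is a Python sketch under the assumption
that the raw 4-character field (columns 13-16) is still available,
leading space included; guess_element is a hypothetical name, not an
existing BioPerl method.

```python
def guess_element(name_field):
    """Guess the element symbol from the raw atom name field (columns 13-16)."""
    head = name_field.ljust(4)[:2]  # the element lives in columns 13-14
    # Drop the blank or digit that pads a one-letter element.
    return "".join(ch for ch in head if ch.isalpha()).upper()


print(guess_element(" CA "))  # aligned alpha carbon
print(guess_element("CA  "))  # aligned calcium
print(guess_element("1HB "))  # hydrogen with a leading digit
```

The guess is only as good as the alignment: if the leading space has
already been stripped, an alpha carbon passed in as "CA" is
indistinguishable from calcium.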

Joe Krahn