[Bioperl-l] PDB ATOM records: name, segid, etc.

Andrew Dalke dalke@dalkescientific.com
Fri, 12 Jul 2002 16:33:55 -0600


Joe Krahn:
> Although SEGID is deprecated by the official PDB standard,
> it is useful to me because I work with CNS files. Would people
> be opposed to supporting it in BioPerl? (Note- most crystallographers
> want to keep the SEGID. It is a useful thing, especially now
> that PDB disallows CHAINID for ligands.)

I say keep it.  You'll need it for old PDB files as well.  'Course,
you'll also need the logic to distinguish between the different
versions.

> Another useful but non-standard optional feature is a 4th residue
> character. It can be useful for designating variants of a residue,
> like HISD for HIS protonated at ND.

Also very important.  Great for things like "TIP3" waters.  (BTW,
I have an XPLOR background.  :)

> The first two letters are always the element. The alignment seems
> strange until you realize this. Think of carbon being represented
> by " C". A leading non-letter character is allowed for atom names
> that are too long, mostly hydrogens. The current pdb.pm shifts
> <number>H correctly (a good guess) but will get all 2-letter elements
> wrong. "CA  " for calcium will become " CA ", a carbon atom.

Um.  The letters of the first two characters are always the element.
Unless they aren't.  I've seen "U" used for "Unknown".  And then there's
dealing with all the programs which have a different interpretation of that
field.  Roger Sayle gave a talk about this early last year.

http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html
] 2c. All atom records containing " Q" as the first two characters of the
] atom name were ignored. The element " Q" is commonly used in NMR processing
] to represent pseudo atoms used in refinement
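
To make the column convention concrete, here is a rough sketch in Python
(not bioperl code). The function name and the two small element sets are
made up for illustration and are far from complete. It reads only the first
two characters of the 4-character atom name field (columns 13-16) and gives
up, returning None, whenever the convention does not clearly apply, as with
" Q" pseudo atoms or "U" for unknown.

```python
# Hypothetical helper for illustration; not part of bioperl or any PDB library.
# The element sets are deliberately tiny and incomplete.
TWO_LETTER = {"CA", "FE", "ZN", "MG", "NA", "CL", "MN", "CU", "SE", "BR"}
ONE_LETTER = {"H", "C", "N", "O", "P", "S", "F", "I", "K"}


def guess_element(name_field):
    """Guess the element from the raw 4-character atom name field.

    Returns an upper-case element symbol, or None when the first two
    characters do not match the convention (pseudo atoms, unknowns,
    4-character hydrogen names such as "HD11", and so on).
    """
    if len(name_field) != 4:
        raise ValueError("expected the raw 4-character field: %r" % (name_field,))
    first, second = name_field[0].upper(), name_field[1].upper()
    if first.isalpha() and second.isalpha():
        # Name starts in column 13: read both letters as a 2-letter element.
        symbol = first + second
        return symbol if symbol in TWO_LETTER else None
    if not first.isalpha() and second in ONE_LETTER:
        # Leading space or digit: the element is the single letter in column 14.
        return second
    return None
```

With this, " CA " comes back as "C" and "CA  " as "CA", while " QB " and
"HD11" come back as None rather than as a wrong guess. Whether None is the
right answer for those cases is exactly the interpretation problem above.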

> So, if pdb.pm is going to remove the leading space on
> atom names (technically wrong, but probably desirable for many people)

That isn't technically wrong.  It depends on the context.  " C" is the
representation of "carbon" in the PDB file.  Internally, bioperl could store
it as "carbon" or "12" or "mixelja" so long as it is consistent and
captures the data model correctly.
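
As a sketch of what "consistent" could mean in practice (Python, with a
made-up function name, and assuming the internal model keeps the stripped
name together with an element symbol): the leading space can be dropped on
input as long as the writer can put it back when producing a PDB file.

```python
# Hypothetical writer-side helper; assumes the internal model stores a
# stripped atom name and, where known, an element symbol.
def format_atom_name(name, element=None):
    """Rebuild the 4-character atom name field (columns 13-16)."""
    if not name or len(name) > 4:
        raise ValueError("atom name must be 1 to 4 characters: %r" % (name,))
    two_letter_element = element is not None and len(element) == 2
    if len(name) == 4 or name[0].isdigit() or two_letter_element:
        # Full-width names, digit-prefixed hydrogens and 2-letter elements
        # start in column 13.
        return name.ljust(4)
    # Everything else is treated as a 1-letter element starting in column 14.
    return " " + name.ljust(3)
```

Here format_atom_name("CA", "C") gives " CA " and format_atom_name("CA", "CA")
gives "CA  ", so the carbon/calcium distinction survives the round trip only
because the element is stored alongside the name.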

> then reading an ATOM needs to generate the element entry when an ATOM
> doesn't include it. This can also be a problem - a PDB file with
> no element entries and improper atom alignment will generate bad
> element entries, but at least it works for all single-letter elements.

That's a hard task.  See section 3 in Roger's paper.

A question I have is:  Do you want a faithful representation of the data in the
PDB (in which case a missing element field is left missing) or do you want
a translation to another model of chemistry (as Roger does for SMILES)?
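
For the "faithful" option, a minimal sketch in Python (the function name and
dictionary keys are invented for illustration). It slices the fixed columns
of an ATOM line and reports an absent SEGID (columns 73-76) or element
(columns 77-78) as None instead of guessing. Reading the residue name from
columns 18-21, one column wider than the standard field, is an assumption
made here to admit 4-character names like HISD and TIP3.

```python
# Hypothetical reader for illustration; not the bioperl pdb.pm logic.
def parse_atom_fields(line):
    """Extract a few ATOM fields, leaving missing optional fields as None."""
    padded = line.rstrip("\n").ljust(80)  # short lines are common in old files

    def optional(text):
        text = text.strip()
        return text or None

    return {
        "name_field": padded[12:16],        # kept verbatim, spacing included
        "resname": padded[17:21].strip(),   # columns 18-21, allows "HISD"
        "chain": optional(padded[21]),      # column 22
        "segid": optional(padded[72:76]),   # columns 73-76
        "element": optional(padded[76:78]), # columns 77-78
    }
```

A translation layer that fills in missing elements could then sit on top of
this and be switched off by anyone who wants the file contents as written.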


					Andrew
					dalke@dalkescientific.com