[Biopython-dev] [Bug 2495] New: parse element symbols for ATOM/HETATM records (Bio.PDB.PDBParser)

Mon Apr 28 12:42:05 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2495

           Summary: parse element symbols for ATOM/HETATM records
                    (Bio.PDB.PDBParser)
           Product: Biopython
           Version: 1.45
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: macrozhu+biopy at gmail.com

Hi,

the current Bio.PDB.PDBParser does not parse column 77-78 from ATOM records
in PDB files, where element symbols are (usually) stored for ATOM. We
suggest BioPython to parse this information in the next version. The reasons
are given as follows:

1. The current remediated PDB format requires these symbols to be always
present ( http://www.wwpdb.org/documentation/format3.1-20080211.pdf ),
though in old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to recognize
hydrogen atoms by their identifiers in the remediated PDB files. e.g. in
1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00
H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00
H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00
H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00
H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00
H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00
H

In this PDB entry, chemical symbols (H) are not right justified in column
13-14 for hydrogen identifiers like for other elements. A bit extra work is
required to figure it out.

What's more, sometimes it's even impossible to distinguish hydrogen from
mercury without columns 77-78. From the PDB entry format description version
2.1:
"Hydrogen naming sometimes conflicts with IUPAC conventions. For example, a
hydrogen named HG11 in columns 13 - 16 is differentiated from a mercury atom
by the element symbol in columns 77 - 78. Columns 13 - 16 present a unique
name for each atom."

Therefore we strongly suggest PDBParser to cover column 77-78 for
ATOM/HETATM records. We have looked at relevant code and it seems three
files (Atom.py, PDBParser.py, StructureBuilder.py) needed to be revised
marginally for integrating this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
    def __init__(self, name, coord, bfactor, occupancy, altloc, fullname,
serial_number, element):

line 61: add line
    self.element = element

add a set method:
    def set_element(self, element):
        self.element = element

add a public method:
    def get_element(self):
        return self.element

2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function
PDBParser::_parse_coordinates(self, coords_trailer)
    element=line[76:78].strip()

line 182: add one more parameter to init_atom():
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc,
fullname, serial_number, element)

3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
    StructureBuilder::init_atom(self, name, coord, b_factor, occupancy,
altloc, fullname, serial_number=None, element='')

line 190: add "element" to the initialization of Atom instance.
    atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc,
fullname, serial_number, element)

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.