[BioPython] parse element symbols for ATOM/HETATM records

Michiel de Hoon mjldehoon at yahoo.com
Sat Apr 26 00:53:17 UTC 2008


Dear Hongbo Zhu,

Could you open a bug report on BugZilla and append your patch there?
Patches sent to the mailing list tend to get lost.

--Michiel.

Macro Zhu <macrozhu at gmail.com> wrote:  Hi,

the current Bio.PDB.PDBParser does not parse column 77-78 from ATOM records
in PDB files, where element symbols are (usually) stored for ATOM. We
suggest BioPython to parse this information in the next version. The reasons
are given as follows:

1. The current remediated PDB format requires these symbols to be always
present ( http://www.wwpdb.org/documentation/format3.1-20080211.pdf ),
though in old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to recognize
hydrogen atoms by their identifiers in the remediated PDB files. e.g. in
1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00
H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00
H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00
H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00
H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00
H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00
H

In this PDB entry, chemical symbols (H) are not right justified in column
13-14 for hydrogen identifiers like for other elements. A bit extra work is
required to figure it out.

What's more, sometimes it's even impossible to distinguish hydrogen from
mercury without columns 77-78. From the PDB entry format description version
2.1:
"Hydrogen naming sometimes conflicts with IUPAC conventions. For example, a
hydrogen named HG11 in columns 13 - 16 is differentiated from a mercury atom
by the element symbol in columns 77 - 78. Columns 13 - 16 present a unique
name for each atom."

Therefore we strongly suggest PDBParser to cover column 77-78 for
ATOM/HETATM records. We have looked at relevant code and it seems three
files (Atom.py, PDBParser.py, StructureBuilder.py) needed to be revised
marginally for integrating this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
    def __init__(self, name, coord, bfactor, occupancy, altloc, fullname,
serial_number, element):

line 61: add line
    self.element = element

add a set method:
    def set_element(self, element):
        self.element = element

add a public method:
    def get_element(self):
        return self.element


2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function
PDBParser::_parse_coordinates(self, coords_trailer)
    element=line[76:78].strip()

line 182: add one more parameter to init_atom():
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc,
fullname, serial_number, element)


3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
    StructureBuilder::init_atom(self, name, coord, b_factor, occupancy,
altloc, fullname, serial_number=None, element='')

line 190: add "element" to the initialization of Atom instance.
    atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc,
fullname, serial_number, element)



regards,
-- 
Hongbo Zhu
_______________________________________________
BioPython mailing list  -  BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython


       
---------------------------------
Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it now.



More information about the Biopython mailing list