[Biopython-dev] [Bug 2495] New: parse element symbols for ATOM/HETATM records (Bio.PDB.PDBParser)
bugzilla-daemon at portal.open-bio.org
Mon Apr 28 08:42:05 EDT 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2495
Summary: parse element symbols for ATOM/HETATM records
(Bio.PDB.PDBParser)
Product: Biopython
Version: 1.45
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: macrozhu+biopy at gmail.com
Hi,
the current Bio.PDB.PDBParser does not parse columns 77-78 of ATOM records
in PDB files, where the element symbol is (usually) stored. We suggest that
Biopython parse this information in the next version, for the following
reasons:
1. The current remediated PDB format requires these symbols to be always
present ( http://www.wwpdb.org/documentation/format3.1-20080211.pdf ),
though in old PDB files (v2.3), these symbols are sometimes missing.
2. In some cases it is difficult, if not impossible, to recognize hydrogen
atoms from their identifiers in the remediated PDB files, e.g. in
1AWW:
ATOM 378 HD11 LEU A 25 46.755 -3.858 0.453 1.00 0.00 H
ATOM 379 HD12 LEU A 25 47.178 -2.160 0.234 1.00 0.00 H
ATOM 380 HD13 LEU A 25 47.054 -3.226 -1.165 1.00 0.00 H
ATOM 381 HD21 LEU A 25 49.453 -1.483 0.307 1.00 0.00 H
ATOM 382 HD22 LEU A 25 50.714 -2.537 -0.327 1.00 0.00 H
ATOM 383 HD23 LEU A 25 49.413 -1.984 -1.381 1.00 0.00 H
In this PDB entry, the element symbol (H) in these hydrogen identifiers is
not right-justified in columns 13-14, as it is for other elements, so some
extra work is required to work out the element from the atom name.
What's more, sometimes it's even impossible to distinguish hydrogen from
mercury without columns 77-78. From the PDB entry format description version
2.1:
"Hydrogen naming sometimes conflicts with IUPAC conventions. For example, a
hydrogen named HG11 in columns 13 - 16 is differentiated from a mercury atom
by the element symbol in columns 77 - 78. Columns 13 - 16 present a unique
name for each atom."
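To illustrate the ambiguity described above, here is a minimal sketch in
plain Python. It is not Bio.PDB code: make_atom_line, element_from_columns
and element_guess_from_name are hypothetical helpers written for this
example, and the VAL and mercury records are made-up illustrations rather
than lines taken from a real entry. The sketch assumes the standard
fixed-width ATOM/HETATM layout (atom name in columns 13-16, element symbol
in columns 77-78).

```python
def make_atom_line(record, serial, name, resname, chain, resseq,
                   x, y, z, element, occupancy=1.0, bfactor=0.0):
    """Build a fixed-width ATOM/HETATM record (hypothetical helper).

    `name` is the 4-character atom name field, already padded the way it
    should appear in columns 13-16 (e.g. " CA " or "HD11").
    """
    return (
        "{:<6}{:>5} {:<4} {:>3} {:1}{:>4} ".format(
            record, serial, name, resname, chain, resseq)
        + " " * 3
        + "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
            x, y, z, occupancy, bfactor)
        + " " * 10
        + "{:>2}".format(element)
    )


def element_from_columns(line):
    """Return the element symbol from columns 77-78, or '' if absent."""
    return line[76:78].strip().upper()


def element_guess_from_name(line):
    """Naive guess: take columns 13-14 of the atom name field."""
    return line[12:14].strip().upper()


# A hydrogen from the 1AWW excerpt above, plus two made-up records.
leu_h = make_atom_line("ATOM", 378, "HD11", "LEU", "A", 25,
                       46.755, -3.858, 0.453, "H")
val_h = make_atom_line("ATOM", 1, "HG11", "VAL", "A", 1,
                       0.0, 0.0, 0.0, "H")
mercury = make_atom_line("HETATM", 2, "HG  ", "HG", "A", 2,
                         0.0, 0.0, 0.0, "HG")

for line in (leu_h, val_h, mercury):
    print(repr(element_guess_from_name(line)),
          repr(element_from_columns(line)))
```

The name-based guess gives the same answer for the hydrogen HG11 and for the
mercury atom, and for HD11 it gives something that is not an element at all;
only columns 77-78 separate the cases.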
Therefore we strongly suggest that PDBParser read columns 77-78 of
ATOM/HETATM records. We have looked at the relevant code, and it seems that
only minor changes to three files (Atom.py, PDBParser.py,
StructureBuilder.py) are needed to integrate this update:
1). in Atom.py CVS Revision 1.18
line 17: add one parameter "element" to the function Atom::__init__(...)
    def __init__(self, name, coord, bfactor, occupancy, altloc, fullname,
                 serial_number, element):
line 61: add line
        self.element = element
add a set method:
    def set_element(self, element):
        self.element = element
add a public method:
    def get_element(self):
        return self.element
2). in PDBParser.py CVS Revision 1.20
line 161: add one line to parse element symbol in function
PDBParser::_parse_coordinates(self, coords_trailer)
    element=line[76:78].strip()
line 182: add one more parameter to init_atom():
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc,
                                fullname, serial_number, element)
3). in StructureBuilder.py CVS Revision 1.16
line 158: add one parameter "element" to the function
    StructureBuilder::init_atom(self, name, coord, b_factor, occupancy,
                                altloc, fullname, serial_number=None, element='')
line 190: add "element" to the initialization of Atom instance.
    atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc,
                          fullname, serial_number, element)
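As a rough picture of how the three changes fit together, here is a
self-contained sketch. The Atom and StructureBuilder classes below are
simplified stand-ins written for this example, not the real Bio.PDB
classes, and parse_atom_line is a hypothetical function standing in for the
relevant part of PDBParser::_parse_coordinates. Only the signatures and the
line[76:78] slice follow the proposal above; everything else is assumed.

```python
class Atom:
    """Simplified stand-in for Bio.PDB.Atom with the proposed attribute."""

    def __init__(self, name, coord, bfactor, occupancy, altloc, fullname,
                 serial_number, element):
        self.name = name
        self.coord = coord
        self.bfactor = bfactor
        self.occupancy = occupancy
        self.altloc = altloc
        self.fullname = fullname
        self.serial_number = serial_number
        self.element = element  # proposed new attribute

    def set_element(self, element):
        self.element = element

    def get_element(self):
        return self.element


class StructureBuilder:
    """Simplified stand-in that only collects atoms in a flat list."""

    def __init__(self):
        self.atoms = []

    def init_atom(self, name, coord, b_factor, occupancy, altloc, fullname,
                  serial_number=None, element=''):
        atom = Atom(name, coord, b_factor, occupancy, altloc, fullname,
                    serial_number, element)
        self.atoms.append(atom)
        return atom


def parse_atom_line(line, structure_builder):
    """Parse one fixed-width ATOM/HETATM line (hypothetical helper)."""
    fullname = line[12:16]
    name = fullname.strip()
    altloc = line[16]
    serial_number = int(line[6:11])
    coord = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    occupancy = float(line[54:60])
    bfactor = float(line[60:66])
    # Proposed addition: columns 77-78; empty string if the columns are
    # missing, as in some old (v2.3) files.
    element = line[76:78].strip()
    return structure_builder.init_atom(name, coord, bfactor, occupancy,
                                       altloc, fullname, serial_number,
                                       element)
```

Because slicing past the end of a short line yields an empty string, records
without columns 77-78 would simply get element == '' in this sketch, which
matches the default proposed for init_atom().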