[BioPython] Biopython to parse not only .pdb-files but also NACCESS .asa files
O.Doehring at cs.ucl.ac.uk
Mon May 21 19:45:36 UTC 2007
Dear community,
I am using 'Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations'
to calculate two features that are not contained in standard .pdb files:
atomic accessibility and van der Waals radius. The output is described in
the readme file at
http://wolf.bi.umist.ac.uk/naccess/nac_readme.html under 'example output
files', and the standard ATOM record is described on the PDB format site at
http://www.wwpdb.org/documentation/format23/sect9.html under 'Atom'.
According to the readme, NACCESS does the following: 'The output format is
PDB, with B-factors and occupancies removed, then atomic accessiblity in
square Angstroms, followed by the assigned van der Waal radius.' In other
words, occupancy is replaced by atomic accessibility and the B-factor by
the van der Waals radius. This 'new' .pdb file has the extension .asa.
I chose a fairly straightforward approach: I wanted to use Biopython as
before, e.g. calling the B-factor method, but getting the atomic
accessibility back instead. However, Biopython seems to type-check the
.asa file and complains that the B-factor is not of type float.
Is there a way to access the data of .asa files programmatically via the
Biopython library? The only alternative seems to be to write a parser for
.asa files, to work out which atom in the .pdb file corresponds to which
atom in the .asa file, and finally to retrieve the wanted values for atomic
accessibility and van der Waals radius.
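A minimal sketch of such a stand-alone parser might look like the
following. It rests on assumptions that should be checked against a real
.asa file: that the columns up to the coordinates follow the standard PDB
ATOM layout, that the accessibility occupies columns 55-62 and the van der
Waals radius columns 63-68, and that alternate locations can be ignored. The
function name parse_asa_lines and the sample values are made up for
illustration.

```python
def parse_asa_lines(lines):
    """Map (chain, resseq, icode, atom_name) -> (accessibility, vdw_radius).

    Assumed layout (not verified against the NACCESS source): standard PDB
    ATOM columns up to the coordinates, then accessibility in line[54:62]
    and van der Waals radius in line[62:68].
    """
    table = {}
    for line in lines:
        # Skip anything that is not a coordinate record.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain_id = line[21]
        resseq = int(line[22:26])
        icode = line[26]
        atom_name = line[12:16].strip()
        accessibility = float(line[54:62])
        vdw_radius = float(line[62:68])
        # Alternate locations (column 17) are ignored in this sketch, so a
        # later altloc record would overwrite an earlier one.
        table[(chain_id, resseq, icode, atom_name)] = (accessibility, vdw_radius)
    return table


if __name__ == "__main__":
    # Hypothetical record built with a format string so the columns line up;
    # the numbers are invented, not taken from 1DHR.
    sample = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%8.3f%6.2f" % (
        1, " N  ", " ", "MET", "A", 1, " ", 27.340, 24.430, 2.614, 19.431, 1.65)
    print(parse_asa_lines([sample]))
```

The key is chosen so that it can be rebuilt from a structure parsed by
Biopython from the original .pdb file: chain.get_id(), the second and third
elements of residue.get_id(), and atom.get_name().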
Here are some more technical details. As an example I chose the '1DHR'
protein:
------------------------------------------------------------------------------
from Bio.PDB.PDBParser import PDBParser

class compact:
    def __init__(self, structure_id="1DHR", indices=[0]):
        # which residues are part of the patch
        self.indices = indices
        # If PERMISSIVE is 1 (default), the exceptions are caught, but some
        # residues or atoms will be missing. These exceptions are due to
        # problems in the PDB file.
        self.p = PDBParser(PERMISSIVE=1)
        # which protein to analyse
        self.structure_id = structure_id
        # the NACCESS output file, e.g. '1DHR.asa'
        self.fileName = self.structure_id + '.asa'
        self.structure = self.p.get_structure(self.structure_id, self.fileName)
------------------------------------------------------------------------------
Error message:
Traceback (most recent call last):
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in <module>
    c = compact(indices=[0,1])
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: invalid literal for float(): 31 1.
------------------------------------------------------------------------------
I hope this question has not been discussed before. The search engine at
http://search.open-bio.org/cgi-bin/mail-search.cgi does not work, and I
could not find anything useful via a Google search restricted to the
archive using the 'site:' operator.
What do you recommend for my situation? Many thanks!
Yours,
Orlando