[BioPython] Biopython to parse not only .pdb-files but also NACCESS .asa files

O.Doehring at cs.ucl.ac.uk
Mon May 21 15:45:36 EDT 2007


Dear community,
 
I am using the tool 'Naccess V2.1.1 - Atomic Solvent Accessible Area 
Calculations' to calculate two features which are not contained in 
standard .pdb files: atomic accessibility and van der Waals radius. As can 
be read in the readme file at 
http://wolf.bi.umist.ac.uk/naccess/nac_readme.html (under 'example output 
files') and on the PDB format site at 
http://www.wwpdb.org/documentation/format23/sect9.html (under 'Atom'), 
NACCESS does the following: 'The output format is PDB, with B-factors and 
occupancies removed, then atomic accessiblity in square Angstroms, followed 
by the assigned van der Waal radius.' Note that the occupancy is thus 
replaced by the atomic accessibility and the B-factor by the van der Waals 
radius. This 'new' .pdb file has the extension .asa.
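A minimal sketch (Python 3, whereas the traceback below comes from Python 2.5) of reading the two NACCESS values from a single ATOM/HETATM line of an .asa file. It assumes, rather than takes from the NACCESS documentation, that the coordinates occupy the usual PDB columns 31-54 and that accessibility and radius follow as two whitespace-separated numbers. The function name `asa_values` and the sample line are hypothetical, made up for illustration.

```python
def asa_values(line):
    """Return (accessibility, radius) from one ATOM/HETATM line of a .asa file.

    Assumption: x, y, z sit in the standard PDB columns 31-54 (line[30:54])
    and the two NACCESS values follow them, separated by whitespace.
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record: %r" % line[:6])
    fields = line[54:].split()
    if len(fields) < 2:
        raise ValueError("expected accessibility and radius after column 54")
    accessibility = float(fields[0])  # square Angstroms
    radius = float(fields[1])         # assigned van der Waals radius
    return accessibility, radius


# Hypothetical .asa line laid out like a PDB ATOM record up to column 54.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  74.931  1.65"
print(asa_values(sample))  # (74.931, 1.65)
```

Splitting on whitespace instead of slicing fixed columns avoids depending on the exact widths NACCESS uses for the two extra fields.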
 
I chose a straightforward approach: I wanted to use Biopython as before, 
e.g. calling the occupancy and B-factor methods but getting the atomic 
accessibility and van der Waals radius instead. However, Biopython's parser 
converts the B-factor columns of the .asa file to a float and fails, 
complaining that the value found there is not a valid float.
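One possible workaround, sketched here as an assumption and not as a Biopython feature, is to rewrite each .asa line so the accessibility lands in the PDB occupancy field (columns 55-60) and the radius in the B-factor field (columns 61-66), then parse the rewritten file with `PDBParser` as usual. The helper names `asa_line_to_pdb` and `convert_asa_file` are hypothetical; the sketch again assumes the two NACCESS values follow column 54 separated by whitespace, rounds them to two decimals, and drops anything after the radius.

```python
def asa_line_to_pdb(line):
    """Rewrite one .asa ATOM/HETATM line into PDB layout.

    Accessibility goes into the occupancy field (columns 55-60) and the
    radius into the B-factor field (columns 61-66). Other records are
    returned unchanged.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line
    accessibility, radius = (float(v) for v in line[54:].split()[:2])
    # %6.2f keeps each value inside its 6-column field (values below 1000).
    return "%-54s%6.2f%6.2f\n" % (line[:54], accessibility, radius)


def convert_asa_file(asa_path, pdb_path):
    """Write a PDB-conformant copy of the .asa file at asa_path."""
    with open(asa_path) as src, open(pdb_path, "w") as dst:
        for line in src:
            dst.write(asa_line_to_pdb(line))
```

After something like `convert_asa_file("1DHR.asa", "1DHR_asa.pdb")`, parsing the new file with `PDBParser(PERMISSIVE=1).get_structure("1DHR", "1DHR_asa.pdb")` should make `atom.get_occupancy()` return the accessibility and `atom.get_bfactor()` the radius for each atom.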

Is there a way to access the data in .asa files programmatically via the 
Biopython library? Otherwise, the only option seems to be to write a parser 
for .asa files, figure out which atom in the .pdb file corresponds to which 
entry in the .asa file, and finally retrieve the wanted values for atomic 
accessibility and van der Waals radius.
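The "separate parser" route could look roughly like the sketch below: read the .asa file into a dictionary keyed by chain, residue number, insertion code and atom name (standard PDB ATOM columns), and build the same key for a Bio.PDB `Atom` from its `get_full_id()` tuple. The function names `read_asa` and `asa_key_for_atom` are hypothetical; the position of the two NACCESS values after column 54 is an assumption, and alternate locations are not distinguished.

```python
def read_asa(lines):
    """Map (chain_id, resseq, icode, atom_name) -> (accessibility, radius).

    Chain (column 22), residue number (23-26), insertion code (27) and atom
    name (13-16) follow the PDB ATOM record. The two NACCESS values are
    assumed to be whitespace-separated after the z coordinate (column 54).
    A later line with the same key (e.g. an alternate location) overwrites
    an earlier one.
    """
    table = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[26], line[12:16].strip())
        accessibility, radius = (float(v) for v in line[54:].split()[:2])
        table[key] = (accessibility, radius)
    return table


def asa_key_for_atom(atom):
    """Build the read_asa() key for a Bio.PDB Atom."""
    # get_full_id(): (structure_id, model_id, chain_id,
    #                 (hetfield, resseq, icode), (atom_name, altloc))
    _sid, _mid, chain_id, res_id, atom_id = atom.get_full_id()
    _hetfield, resseq, icode = res_id
    return (chain_id, resseq, icode, atom_id[0])
```

With a structure parsed from the original 1DHR .pdb file, `table[asa_key_for_atom(atom)]` for each atom from `structure.get_atoms()` would give its accessibility and radius; a `KeyError` flags atoms missing from the .asa file.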
 

Here are some more technical details. As an example I chose the '1DHR' 
protein:
 
------------------------------------------------------------------------------

from Bio.PDB import PDBParser


class compact:
    def __init__(self, structure_id="1DHR", indices=None):
        # which residues are part of the patch (default: [0])
        self.indices = indices if indices is not None else [0]

        # If PERMISSIVE is 1 (DEFAULT), the exceptions are caught, but some
        # residues or atoms will be missing.
        # THESE EXCEPTIONS ARE DUE TO PROBLEMS IN THE PDB FILE!
        self.p = PDBParser(PERMISSIVE=1)

        # which protein to analyse
        self.structure_id = structure_id

        # NACCESS output file, e.g. '1DHR.asa'
        self.fileName = self.structure_id + '.asa'
        self.structure = self.p.get_structure(self.structure_id, self.fileName)

------------------------------------------------------------------------------
 
Error message:
 
Traceback (most recent call last):
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in <module>
    c = compact(indices=[0,1])
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: invalid literal for float(): 31 1.

------------------------------------------------------------------------------
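The last frame shows the parser reading the B-factor from the fixed slice `line[60:66]`. The sketch below reproduces that failure on a made-up line. It assumes, as a guess consistent with the `31 1.` in the message rather than a documented fact, that NACCESS writes the accessibility in the 8 columns after z and the radius in the next 6, so the slice straddles the end of one number and the start of the other.

```python
# Hypothetical .asa line: accessibility in columns 55-62, radius in 63-68.
line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  74.931  1.65"

fixed_slice = line[60:66]  # the columns PDBParser treats as the B-factor
print(repr(fixed_slice))   # '31  1.'

try:
    float(fixed_slice)
except ValueError as err:
    print("ValueError:", err)
```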

I hope this question has not been discussed before; the search engine at 
http://search.open-bio.org/cgi-bin/mail-search.cgi does not work, and I 
could not find anything useful with a Google search restricted to the 
archive using the 'site:' operator.
 
What would you recommend in my situation? Many thanks!
 
 
Yours,
 
Orlando



