[Biopython-dev] Bio.PDB - Missing values (was Moratorium on commits?)
Peter Cock
p.j.a.cock at googlemail.com
Fri Aug 23 05:05:02 EDT 2013
On Tue, Aug 20, 2013 at 11:16 PM, Lenna Peterson <arklenna at gmail.com> wrote:
>
> On Thu, Aug 15, 2013 at 9:23 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>>
>> I didn't mean to suggest writing the string "None" in the field, and
>> I'm not sure if João did - it would certainly be an invalid PDB file.
>>
>> I agree that where the data structure has None (e.g. from our parser)
>> then the writer could use a blank string (of the appropriate width).
>> For mandatory fields like occupancy, this should give a warning.
>>
>
> As I suspected, the writer currently fails on None (it's expecting a float).
> Test-driven development!
>
> However, I don't see a simple or elegant way to force writing of a blank
> occupancy. ATOM lines are currently written using C-style string formatting,
> and the occupancy field is `%6.2f`.
>
> Off the top of my head, I'd:
>
> 1. Store the original format string
> 2. Modify the format string to have "%6s" at the appropriate position
> 3. Modify the occupancy to be an empty string or a space
> 4. Set the return value to the formatted string
> 5. Restore the original format string
> 6. Return the return value
>
> However, this seems...ugly at best. I don't know that switching formatting
> styles (e.g. to string.format() or others) will help. And in most
> circumstances, the type checking of the format string is useful.
>
> Any thoughts?
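On the question of switching formatting styles: a quick check, given here as a minimal sketch, suggests that both C-style formatting and str.format() reject None for a float field, so changing style alone would not help. The helper name fails_with_type_error is hypothetical and used only for illustration.

```python
def fails_with_type_error(func):
    """Return True if calling func() raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


# Both styles refuse to format None with a float specification.
percent_style_fails = fails_with_type_error(lambda: "%6.2f" % None)
format_style_fails = fails_with_type_error(lambda: "{:6.2f}".format(None))

# A string field of the same width accepts an empty string and pads it.
blank_field = "%6s" % ""

print(percent_style_fails, format_style_fails, repr(blank_field))
```

This is why formatting the occupancy separately, before building the ATOM line, looks simpler than swapping the format string back and forth.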
I would suggest something like this (untested):
$ git diff
diff --git a/Bio/PDB/PDBIO.py b/Bio/PDB/PDBIO.py
index 2f64571..11a52ca 100644
--- a/Bio/PDB/PDBIO.py
+++ b/Bio/PDB/PDBIO.py
@@ -8,7 +8,7 @@
 from Bio.PDB.StructureBuilder import StructureBuilder # To allow saving of chains, residues, etc..
 from Bio.Data.IUPACData import atom_weights # Allowed Elements
-_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n"
+_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%s%6.2f      %4s%2s%2s\n"
 class Select(object):
@@ -85,8 +85,21 @@ class PDBIO(object):
         x, y, z=atom.get_coord()
         bfactor=atom.get_bfactor()
         occupancy=atom.get_occupancy()
+        # Handle a missing occupancy (None) with a blank entry:
+        try:
+            occupancy_str = "%6.2f" % occupancy
+        except TypeError:
+            if occupancy is None:
+                occupancy_str = " " * 6
+                import warnings
+                from Bio import BiopythonWarning
+                # TODO - Introduce exception BiopythonWriterWarning?
+                warnings.warn("Missing occupancy will be recorded as blank",
+                              BiopythonWarning)
+            else:
+                raise TypeError("Invalid occupancy %r in atom %r" % (occupancy, atom))
         args=(record_type, atom_number, name, altloc, resname, chain_id,
-            resseq, icode, x, y, z, occupancy, bfactor, segid,
+            resseq, icode, x, y, z, occupancy_str, bfactor, segid,
             element, charge)
         return _ATOM_FORMAT_STRING % args
The error message could be improved (e.g. a more helpful identification
of the ATOM at fault)?
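
As a rough illustration of the same idea outside Biopython, here is a minimal standalone sketch. The function name format_occupancy and its atom_label parameter are hypothetical (not part of Bio.PDB), and it uses the built-in UserWarning in place of BiopythonWarning so that it runs without Biopython installed. The label is one possible way to identify the atom at fault in the message.

```python
import warnings


def format_occupancy(occupancy, atom_label="<unknown atom>"):
    """Return the 6-character occupancy field for a PDB ATOM line.

    atom_label is a hypothetical free-text description of the atom,
    used only to make warnings and errors easier to trace.
    """
    try:
        return "%6.2f" % occupancy
    except TypeError:
        if occupancy is None:
            # Missing value: write a blank field of the right width.
            warnings.warn(
                "Missing occupancy for %s will be recorded as blank"
                % atom_label,
                UserWarning,  # stand-in for BiopythonWarning
            )
            return " " * 6
        # Anything else that is not a number is a genuine error.
        raise TypeError(
            "Invalid occupancy %r for %s" % (occupancy, atom_label)
        )


print(repr(format_occupancy(1.0)))
print(repr(format_occupancy(None, "CA in residue ALA 12, chain A")))
```

Inside PDBIO the label could presumably be built from the values already passed to the method (atom name, resname, resseq, chain_id), but that is a suggestion rather than something tested.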
Peter