[Biopython-dev] Fwd: Bug in DSSP.py

Peter Cock p.j.a.cock at googlemail.com
Wed Nov 9 19:53:52 UTC 2011


FYI, hopefully someone uses DSSP.

---------- Forwarded message ----------
From: Austin Meyer
Date: Tuesday, November 8, 2011
Subject: Bug in DSSP.py
To: biopython-owner at lists.open-bio.org


Ahoy,

I have no idea how to contribute code so I thought I would pass this along.

The newest DSSP adds a citation section as the first two lines of its
output file, followed by a blank third line.  The parser reads the file one
line at a time, splits each line, then looks at the second element of the
resulting list.  Because the blank line splits into fewer than two elements,
that lookup fails with an index out of range error.  This error does not
happen with the older DSSP version.  A quick fix checks the length of the
list before looking at its elements.  Thus at line 121 in the DSSP.py file,
just after the sl = l.split(), this will fix the problem:

if len(sl) < 2:
    continue
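
The failure and the guard described above can be reproduced in isolation.
This is only a minimal sketch, not the Biopython code: the helper name
find_residue_header and the sample lines are made up for illustration, and
the sample is a simplified stand-in rather than real DSSP output.

```python
def find_residue_header(lines, guard=True):
    """Return the index of the first line whose second token is "RESIDUE".

    Hypothetical helper for illustration only. With guard=False it mimics
    the unguarded lookup, which fails on lines with fewer than two tokens.
    """
    for index, line in enumerate(lines):
        tokens = line.split()
        if guard and len(tokens) < 2:
            # Blank or one-word lines cannot be the column header.
            continue
        if tokens[1] == "RESIDUE":
            return index
    return None


# Simplified stand-in for the new layout: two citation lines, a blank
# third line, then the column header (placeholder text, not real output).
sample_lines = [
    "PLACEHOLDER CITATION LINE ONE\n",
    "PLACEHOLDER CITATION LINE TWO\n",
    "\n",
    "  #  RESIDUE AA STRUCTURE\n",
]

# With the length check, the blank line is skipped and the header is found.
header_index = find_residue_header(sample_lines)

# Without the check, the blank line triggers the IndexError.
try:
    find_residue_header(sample_lines, guard=False)
    unguarded_failed = False
except IndexError:
    unguarded_failed = True
```

Skipping short lines, rather than catching the IndexError, keeps the loop
body unchanged for every line that does have at least two elements.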

The whole function will look like so:

def make_dssp_dict(filename):
    """
    Return a DSSP dictionary that maps (chainid, resid) to
    aa, ss and accessibility, from a DSSP file.

    @param filename: the DSSP output file
    @type filename: string
    """
    dssp = {}
    handle = open(filename, "r")
    try:
        start = 0
        keys = []
        for l in handle.readlines():
            sl = l.split()
            # Skip blank or one-word lines (e.g. the new blank third line)
            if len(sl) < 2:
                continue
            if sl[1] == "RESIDUE":
                # Start parsing from here
                start = 1
                continue
            if not start:
                continue
            if l[9] == " ":
                # Skip -- missing residue
                continue
            resseq = int(l[5:10])
            icode = l[10]
            chainid = l[11]
            aa = l[13]
            ss = l[16]
            if ss == " ":
                ss = "-"
            try:
                acc = int(l[34:38])
                phi = float(l[103:109])
                psi = float(l[109:115])
            except ValueError as exc:
                # DSSP output breaks its own format when there are >9999
                # residues, since only 4 digits are allocated to the seq
                # num field.  See 3kic chain T res 321, 1vsy chain T res
                # 6077.  Here, look for whitespace to figure out the number
                # of extra digits, and shift parsing the rest of the line
                # by that amount.
                if l[34] != ' ':
                    shift = l[34:].find(' ')
                    acc = int((l[34+shift:38+shift]))
                    phi = float(l[103+shift:109+shift])
                    psi = float(l[109+shift:115+shift])
                else:
                    raise ValueError(exc)
            res_id = (" ", resseq, icode)
            dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi)
            keys.append((chainid, res_id))
    finally:
        handle.close()
    return dssp, keys



Thanks,

--
Austin Meyer


