[Biopython-dev] Re: [BioPython] GRAVY index program anyone?

Fri Aug 1 03:40:05 EDT 2003

Yair:
> Here is one of my functions. I have a collection of many protein analysis
> functions, maybe it's time to put together a module.

It would be.

BTW, here's a way to make things go faster - make the dict include
the lowercase characters.  This means you don't need to scan/convert
the sequence before acting on it.

# Kyte & Doolittle hydrophobicity index
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
        'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
        'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
        'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# add in the lowercase characters
_full_kd = kd.copy()
_full_kd.update(dict([ (k.lower(), v) for k, v in kd.items()]))

# calculate the GRAVY according to Kyte and Doolittle.
def Gravy(ProteinSequence):
     _kd = _full_kd  # slightly faster performance with a local name lookup
     ProtGravy=0.0
     for i in ProteinSequence:
         ProtGravy += _kd[i]

     return ProtGravy/len(ProteinSequence)

I don't think there's a faster way.  Other tricks, like
   sum([kd[c] for c in s])
and
   sum(map(kd.__getitem__, s))
for the main loop are both slower because they build up the
intermediate list.  I even played around with

def iter_lookup(d, s):
   for c in s: yield d[c]

   sum(iter_lookup(_kd, ProteinSequence))

but at least for a short sequence it's also slower - perhaps because
of the '.next()' method call overhead?
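
For anyone who wants to check these timings on their own machine, here
is a minimal sketch using the standard timeit module.  The function
names (gravy_loop, gravy_listcomp, gravy_map, gravy_genexp) and the
time_variants helper are made up for this illustration, and the test
sequence is arbitrary.  It is written for a current Python 3, where
map() is lazy and generator expressions exist, so the ranking it
reports may well differ from what I describe above.

```python
import timeit

# Kyte & Doolittle hydrophobicity index (same table as above)
kd = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}

# add in the lowercase characters
_full_kd = dict(kd)
_full_kd.update({k.lower(): v for k, v in kd.items()})


def gravy_loop(seq, _kd=_full_kd):
    # explicit loop with a local name for the table
    total = 0.0
    for c in seq:
        total += _kd[c]
    return total / len(seq)


def gravy_listcomp(seq, _kd=_full_kd):
    # builds the intermediate list before summing
    return sum([_kd[c] for c in seq]) / len(seq)


def gravy_map(seq, _kd=_full_kd):
    # map() over the bound __getitem__ method
    return sum(map(_kd.__getitem__, seq)) / len(seq)


def gravy_genexp(seq, _kd=_full_kd):
    # generator expression, the modern form of iter_lookup
    return sum(_kd[c] for c in seq) / len(seq)


def time_variants(seq, number=10000):
    # hypothetical helper: seconds taken for `number` calls of each variant
    variants = (gravy_loop, gravy_listcomp, gravy_map, gravy_genexp)
    return {f.__name__: timeit.timeit(lambda: f(seq), number=number)
            for f in variants}


if __name__ == "__main__":
    for name, secs in time_variants("MKTAYIAKQRQISFVKSHFSRQ").items():
        print("%-15s %.4f s" % (name, secs))
```

Try it with both a short and a long sequence; per-call overhead matters
most for the short one, so the ordering can change with length.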

					Andrew
					dalke at dalkescientific.com