[Biopython-dev] Re: [BioPython] GRAVY index program anyone?
Andrew Dalke
dalke at dalkescientific.com
Fri Aug 1 03:40:05 EDT 2003
Yair:
> Here is one of my functions. I have a collection of many protein analysis
> functions, maybe it's time to put together a module.
It would be.
BTW, here's a way to make things go faster - make the dict include
the lowercase characters. This means you don't need to scan/convert
the sequence before acting on it.
# Kyte & Doolittle hydrophobicity index
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }
# add in the lowercase characters
_full_kd = kd.copy()
_full_kd.update(dict([ (k.lower(), v) for k, v in kd.items()]))
# calculate the GRAVY according to Kyte and Doolittle.
def Gravy(ProteinSequence):
    _kd = _full_kd  # slightly faster performance with a local name lookup
    ProtGravy = 0.0
    for i in ProteinSequence:
        ProtGravy += _kd[i]
    return ProtGravy/len(ProteinSequence)
I don't think there's a faster way. Other tricks, like
sum([kd[c] for c in s])
and
sum(map(kd.__getitem__, s))
for the main loop are both slower because they build up the
intermediate list. I even played around with
def iter_lookup(d, s):
    for c in s: yield d[c]
sum(iter_lookup(_kd, ProteinSequence))
but at least for a short sequence it's also slower - perhaps because
of the '.next()' method call overhead?
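
For anyone who wants to repeat the comparison, here is a rough harness. It is only a sketch: it assumes a modern Python 3 with the standard timeit module, the function names (gravy_loop, gravy_listcomp, gravy_map, gravy_gen, compare) and the test sequence are made up for illustration, and the relative timings will depend on the interpreter version. Note that in Python 3 map() returns an iterator, so that variant no longer builds an intermediate list.

```python
import timeit

# Kyte & Doolittle values, same table as above.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}

# Include lowercase keys so the sequence needs no case conversion.
FULL_KD = dict(KD)
FULL_KD.update({k.lower(): v for k, v in KD.items()})


def gravy_loop(seq, _kd=FULL_KD):
    # Explicit loop with the table bound to a local name.
    total = 0.0
    for c in seq:
        total += _kd[c]
    return total / len(seq)


def gravy_listcomp(seq):
    # Builds an intermediate list before summing.
    return sum([FULL_KD[c] for c in seq]) / len(seq)


def gravy_map(seq):
    # map() over the dict's lookup method.
    return sum(map(FULL_KD.__getitem__, seq)) / len(seq)


def gravy_gen(seq):
    # Generator expression, the modern spelling of iter_lookup.
    return sum(FULL_KD[c] for c in seq) / len(seq)


VARIANTS = (gravy_loop, gravy_listcomp, gravy_map, gravy_gen)


def compare(seq, number=1000):
    """Return {function name: seconds} for `number` calls of each variant."""
    return {f.__name__: timeit.timeit(lambda: f(seq), number=number)
            for f in VARIANTS}


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ" * 10  # arbitrary made-up sequence
    for name, secs in sorted(compare(seq).items(), key=lambda kv: kv[1]):
        print("%-16s %.4f s" % (name, secs))
```

All four variants should agree on the result up to floating-point rounding, so only the timings are of interest. Try both short and long sequences, since the per-call overhead matters more for short ones.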
Andrew
dalke at dalkescientific.com