[Biopython-dev] Re: [BioPython] Bug in Bio.SeqUtils ?

Yair Benita yair.benita at gmail.com
Fri Feb 3 03:39:38 EST 2006


Hi,
Sorry I failed to follow up on that bug.
I need to revise the isoelectric point anyway since in some rare cases it
gets stuck in an endless while loop. I will also look into adding code to
handle the X in the amino acid sequence. For now I think it's OK to produce a
warning instead of an exception.
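
One way to guarantee that the pI search terminates is to bisect the pH range with a fixed cap on the number of steps. The following is only a rough sketch under stated assumptions, not the code in Bio.SeqUtils: the pK values are illustrative placeholders, and the names net_charge and bounded_isoelectric_point are hypothetical.

```python
# Illustrative pK values, assumed for this sketch only (not Biopython's tables).
POSITIVE_PKS = {"K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PKS = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
N_TERM_PK = 9.0
C_TERM_PK = 2.0


def net_charge(sequence, ph):
    """Approximate net charge at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PK))
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PK - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKS:
            positive += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKS[residue]))
        elif residue in NEGATIVE_PKS:
            negative += 1.0 / (1.0 + 10 ** (NEGATIVE_PKS[residue] - ph))
        # Any other letter (including 'X') is treated as uncharged.
    return positive - negative


def bounded_isoelectric_point(sequence, tolerance=1e-4, max_steps=100):
    """Bisect pH 0..14 for zero net charge, never looping more than max_steps."""
    low, high = 0.0, 14.0
    mid = 7.0
    for _ in range(max_steps):
        mid = (low + high) / 2.0
        charge = net_charge(sequence, mid)
        if abs(charge) < tolerance:
            break
        if charge > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid  # negatively charged: the pI lies at lower pH
    return mid
```

Because the net charge only decreases as the pH rises, halving the interval always closes in on the zero crossing, and the step cap means even a sequence made almost entirely of acidic residues cannot keep the loop running forever.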

Yair


on 2/2/06 10:14 PM, Iddo Friedberg at idoerg at burnham.org wrote:

> Oh, sorry.
> 
> Your second problem was with protein_scale, which does indeed break on
> any letter not of the 20 regular amino acids.
> 
> I inserted this into a try/except clause which produces a warning to
> stderr, instead of raising an exception. It is now in CVS.
> 
> Yair, is that OK, or would we rather leave the exception raising bit
> there? There are arguments either way...
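
For reference, the warn-instead-of-raise approach described above might look roughly like the sketch below. This is a simplified, unweighted sliding-window average and not the actual protein_scale code in CVS; TOY_SCALE and window_scores are hypothetical names, and scoring an unknown letter as 0 is an assumption made only for this example.

```python
import sys

# Hypothetical toy scale for illustration only; a real scale such as
# ProtParamData.hw covers all 20 standard amino acids.
TOY_SCALE = {"A": -0.5, "S": 0.3, "D": 3.0, "K": 3.0, "G": 0.0}


def window_scores(sequence, scale, window):
    """Average a per-residue scale over each window of the sequence.

    Letters missing from the scale produce a warning on stderr and
    contribute 0 to the window, instead of raising KeyError.
    """
    seq = sequence.upper()
    scores = []
    for start in range(len(seq) - window + 1):
        total = 0.0
        for residue in seq[start:start + window]:
            try:
                total += scale[residue]
            except KeyError:
                sys.stderr.write(
                    "warning: %r is not in the scale; counted as 0\n" % residue
                )
        scores.append(total / window)
    return scores


# A leading 'X' no longer stops the calculation.
print(window_scores("XASDK", TOY_SCALE, 3))
```

The trade-off is the one raised above: a warning keeps batch runs going, but a silently zeroed residue can distort the window averages, which is the argument for keeping the exception.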
> 
> 
> ./I
> 
> 
> Iddo Friedberg wrote:
> 
>> Which version are you using? I tried the 1a8y sequence which you gave,
>> and also a sequence with an 'X', and they worked fine for me. CVS
>> version.
>> 
>> # seq is a Record object. seq.sequence is a string with the protein
>> # sequence
>> 
>>>>> from Bio.SeqUtils import ProtParam
>>>>> ps = ProtParam.ProteinAnalysis(seq.sequence)
>>>>> ps.isoelectric_point()
>> 3.9298931884765151
>> 
>> 
>> # and for a sequence with an 'x'
>>>>> ps2 = ProtParam.ProteinAnalysis('xsdfgvcrtyip')
>>>>> ps2.isoelectric_point()
>> 5.8285980224609375
>> 
>> Bin Hu wrote:
>> 
>>> Hi,
>>> 
>>> When using Bio.SeqUtils to estimate the isoelectric point for PDB entry
>>> 1a8y, the function isoelectric_point() never seems to finish, although
>>> it worked well for all the other entries that I've tested. Could this
>>> be a bug in Bio.SeqUtils?
>>> 
>>> If anyone wants to test it, below is the sequence of 1a8y:
>>> 
>>> eegldfpeydgvdrvinvnaknyknvfkkyevlallyheppeddkasqrqfemeelilel
>>> aaqvledkgvgfglvdsekdaavakklglteedsiyvfkedevieydgefsadtlvefll
>>> dvledpveliegerelqafeniedeikligyfknkdsehykafkeaaeefhpyipffatf
>>> dskvakkltlklneidfyeafmeepvtipdkpnseeeivnfveehrrstlrklkpesmye
>>> tweddmdgihivafaeeadpdgyefleilksvaqdntdnpdlsiiwidpddfpllvpywe
>>> ktfdidlsapqigvvnvtdadsvwmemddeedlpsaeeledwledvlegeintedddded
>>> ddddddd
>>> 
>>> For PDB entry 1rb9, the hydrophilicity of the protein cannot be
>>> estimated because its sequence starts with "X", which is not in the
>>> key list used by SeqUtils. It produces the following error message:
>>> 
>>> Traceback (most recent call last):
>>>  File "./dataGen.py", line 62, in ?
>>>    aHydrophilicityList = aSeqObj.protein_scale(ProtParamData.hw, 5)
>>>  File "/usr/lib/python2.4/site-packages/Bio/SeqUtils/ProtParam.py", line 206, in protein_scale
>>>    score += weight[j] * ParamDict[subsequence[j]] + weight[j] * ParamDict[subsequence[Window-j-1]]
>>> KeyError: 'X'
>>> 
>>> Although I can delete the "X" from this protein, could the author
>>> emit a warning message instead, so that this error no longer stops
>>> the calculation? Thank you.
>>> 
>>> Bin
>>> 



