[BioPython] Protparam using BioPython
Peter
biopython at maubp.freeserve.co.uk
Fri Apr 27 09:55:42 UTC 2007
Shameer Khadar wrote:
> Dear Peter,
>
> Thanks for your reply.
Sorry for the delay - I was away on a course this week.
> I was looking for a script based on Bio.SeqUtils.
> I got the following script from a website, its working perfect for me. But
> the problem is i have around 1000 sequence (in raw format without headers)
> and i thought to process it using a foreach equivalent in python(I am a
> python newbie). But its only a couple of minutes back i came to know that
> there is no foreach in python, but some better alternative is available
> !!!.
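There is a "for each" equivalent in Python: the plain "for" loop, which
iterates directly over the items of a list, the lines of a file, and so on.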
http://docs.python.org/tut/node6.html
If you don't have a good introductory python book, that online tutorial
is an excellent starting point.
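As a minimal sketch of what that looks like (the list below is just example data, using a few of the peptides from your file, and the variable names are made up for illustration; the print calls are written so that they run under both Python 2 and Python 3):

```python
# Example data only: three of the peptides from the quoted file
sequences = ["AEGEFAHLYGTFRED", "AEGEFAHLZGTFRED", "AEGEFGATYGVYTSD"]

lengths = []
for seq in sequences:
    # The loop body runs once per item, with seq bound to that item
    lengths.append(len(seq))
    print("%s has %d residues" % (seq, len(seq)))
```

The same pattern works for a file, where each pass through the loop gets one line.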
> It will be great if you can help to process my file using this
> program.
>
> program :
> from Bio.SeqUtils import ProtParam, ProtParamData
> def PrintDictionary(MyDict):
>     for i in MyDict.keys():
>         print "%s\t%.2f" %(i, MyDict[i])
> print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
> X = ProtParam.ProteinAnalysis("")
> print "Instability index of test protein: %.2f" % X.instability_index()
It seems like you have only given bits of a program, so I have tried to
guess what you meant.
> first few lines of my file :
> AEGEFAHLYGTFRED
> AEGEFAHLZGTFRED
> AEGEFGATYGVYTSD
> AEGEFGATZGVYTSD
> AEGEFGATYGVZTSD
> AEGEFGATZGVZTSD
> AEGEFLYGEIQGTQD
In the following example, I am assuming your sequences are in a plain
text file, called protparam.txt, which contains each sequence on a
single line.
Try something like this first of all, and make sure that it prints out
your sequences correctly:

for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string

Then try doing the ProtParam.ProteinAnalysis of each sequence string:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    print "Instability index: %.2f" % X.instability_index()

You'll find it doesn't like the "Z" (presumably this is Glx - glutamic
acid or glutamine? i.e. E or Q) present in many of your sequences, so
this next version uses error handling to note this and then carry on to
the next sequence:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print #blank line
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    try :
        print "Instability index: %.2f" % X.instability_index()
    except KeyError, e :
        print "Problem with the letter %s in the sequence?" % str(e)

The output is:

Sequence <AEGEFAHLYGTFRED>
Instability index: 8.39

Sequence <AEGEFAHLZGTFRED>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVYTSD>
Instability index: -17.70

Sequence <AEGEFGATZGVYTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATZGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFLYGEIQGTQD>
Instability index: 8.61

You'll have to check yourself to see if these numbers are sensible. I
don't know what to suggest for your "Z" entries - the stability will be
different if you try using E or Q instead.
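If you want to see how much difference the choice makes, one possible approach (only a sketch, not something ProtParam does for you) is to generate every E/Q variant of a sequence and analyse each one. The helper name expand_glx below is made up for this example, and it relies on itertools.product, which needs Python 2.6 or later:

```python
from itertools import product

def expand_glx(seq_string):
    """Return every sequence obtained by replacing each Z with E or Q."""
    # One tuple of choices per position: an ambiguous Z gives two options,
    # any other letter gives just itself
    choices = [("E", "Q") if letter == "Z" else (letter,)
               for letter in seq_string]
    return ["".join(combo) for combo in product(*choices)]

variants = expand_glx("AEGEFGATZGVZTSD")
for variant in variants:
    print(variant)
```

Each variant could then be passed to ProtParam.ProteinAnalysis in the same way as in the loop above, which would give you a range of instability index values rather than a single number. Note that the number of variants doubles with every Z in the sequence.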
Peter