[Biopython] Extract the similarity after using Water in Emboss

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 22 12:03:33 UTC 2017


Does this do what you need?

https://github.com/biopython/biopython/pull/692

If so, perhaps your example would make a nice test case :)

Peter


On Wed, Feb 22, 2017 at 1:16 AM, Islam Amin <eng.islamamin at gmail.com> wrote:
> Many thanks, Brian, for your comments. I'm new to Biopython. Following your
> guidance, I wrote a script to get the similarity value; its output is
> "92.3":
>
> from Bio.Emboss.Applications import WaterCommandline
>
> # stdout=True sends the alignment report to stdout so it can be captured.
> water_cmd = WaterCommandline(gapopen=10, gapextend=0.5, stdout=True,
>                              auto=True)
> #water_cmd = WaterCommandline(gapopen=10, gapextend=0.5, aformat=True,
> #                             stdout=True, auto=True)
> water_cmd.asequence = "asis:ACCCGGGCGCGGT"
> water_cmd.bsequence = "asis:ACCCGAGCGCGGT"
> # Calling the command object runs water and returns (stdout, stderr).
> stdout, stderr = water_cmd()
> for line in stdout.splitlines():
>     # The report line looks like "# Similarity:    12/13 (92.3%)".
>     if "Similarity:" in line.split():
>         similarity = float(line[line.index("(") + 1:line.rindex("%)")])
>         print(similarity)
>
> On Wed, Feb 22, 2017 at 5:26 AM, Brian Osborne <bosborne11 at verizon.net>
> wrote:
>>
>> Islam,
>>
>> I’m looking at the Bio.Emboss code and it seems that, generally speaking,
>> this module runs EMBOSS applications but does not parse their output, which
>> is what would be required to get the “similarity” value.
>>
>> Also note that in your code example you haven’t yet run water, which you
>> would do with “water_cmd()”. But you can always parse out this similarity
>> value yourself, first by capturing the output, something like this:
>>
>> >>> water_cmd = WaterCommandline(gapopen=10, gapextend=0.5, stdout=True,
>> ...                              auto=True)
>> >>> water_cmd.asequence = "asis:ACCCGGGCGCGGT"
>> >>> water_cmd.bsequence = "asis:ACCCGAGCGCGGT"
>> >>> output = water_cmd()
>> >>> output
>> ('########################################\n# Program: water\n# Rundate:
>> Tue 21 Feb 2017 13:23:38\n# Commandline: water\n#    -auto\n#    -stdout\n#
>> -asequence asis:ACCCGGGCGCGGT\n#    -bsequence asis:ACCCGAGCGCGGT\n#
>> -gapopen 10\n#    -gapextend 0.5\n# Align_format: srspair\n# Report_file:
>> stdout\n########################################\n\n#=======================================\n#\n#
>> Aligned_sequences: 2\n# 1: asis\n# 2: asis\n# Matrix: EDNAFULL\n#
>> Gap_penalty: 10.0\n# Extend_penalty: 0.5\n#\n# Length: 13\n# Identity:
>> 12/13 (92.3%)\n# Similarity:    12/13 (92.3%)\n# Gaps:           0/13 (
>> 0.0%)\n# Score: 56.0\n#
>> \n#\n#=======================================\n\nasis               1
>> ACCCGGGCGCGGT     13\n                     |||||.|||||||\nasis
>> 1 ACCCGAGCGCGGT
>> 13\n\n\n#---------------------------------------\n#---------------------------------------\n',
>> '')
>>
>> Now just use your regex.
>>
>> Brian O.
>>
>>
>>
>>
>> On Feb 19, 2017, at 7:28 PM, Islam Amin <eng.islamamin at gmail.com> wrote:
>>
>> Dear All,
>> I would like to get the similarity between two sequences. I have found
>> that there is an attribute called "similarity", but I think it is a boolean
>> value. Is there any way to get the similarity value between two sequences
>> without writing a text file?
>>
>> from Bio.Emboss.Applications import WaterCommandline
>> water_cmd = WaterCommandline(gapopen=10, gapextend=0.5)
>> water_cmd.asequence = "asis:ACCCGGGCGCGGT"
>> water_cmd.bsequence = "asis:ACCCGAGCGCGGT"
>> print(water_cmd.similarity)
>> > None
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>>
>
>
>
> --
> Best Regards,
> Islam Amin.
>
> www.egyptscience.net
> Scientific Research Group in EGYPT
>
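
As a rough sketch of the "now just use your regex" step from the quoted thread: the snippet below pulls the Identity, Similarity and Gaps lines out of a water report in srspair format. The helper name parse_water_stats and the SAMPLE string are illustrative assumptions, not part of Biopython. SAMPLE is trimmed from the report quoted above, with whitespace approximated. Only the standard re module is used, so it can be tried without EMBOSS installed.

```python
import re

# Trimmed excerpt of the srspair report quoted in the thread
# (whitespace is approximate; the regex below tolerates variations).
SAMPLE = (
    "# Length: 13\n"
    "# Identity:      12/13 (92.3%)\n"
    "# Similarity:    12/13 (92.3%)\n"
    "# Gaps:           0/13 ( 0.0%)\n"
    "# Score: 56.0\n"
)

# Matches lines such as "# Similarity:    12/13 (92.3%)".
_STAT_LINE = re.compile(
    r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)\s*\(\s*([\d.]+)%\)",
    re.MULTILINE,
)


def parse_water_stats(report):
    """Return {name: (count, length, percent)} for the summary lines.

    Hypothetical helper for illustration. If the report holds several
    alignments, later values overwrite earlier ones.
    """
    stats = {}
    for name, count, length, percent in _STAT_LINE.findall(report):
        stats[name.lower()] = (int(count), int(length), float(percent))
    return stats


stats = parse_water_stats(SAMPLE)
print(stats["similarity"])  # (12, 13, 92.3)
```

With a real run, the text to pass in would be the first element of the tuple returned by water_cmd(), as in the examples quoted above.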


