[EMBOSS] problems with allversusall tool
Daniel Barker
db60 at st-andrews.ac.uk
Wed Jul 9 11:45:34 UTC 2008
Dear Laura,
Which EMBOSS program are you using? I don't find this effect with EMBOSS
needle:
$ cat seq_a.fa
>seq_a
MGQMQIV
$ cat seq_b.fa
>seq_b
IV
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_b.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed 9 Jul 2008 12:34:46
# Commandline: needle
# -asequence seq_a.fa
# -bsequence seq_b.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_b
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 7
# Identity: 2/7 (28.6%)
# Similarity: 2/7 (28.6%)
# Gaps: 5/7 (71.4%)
# Score: 8.0
#
#
#=======================================
seq_a              1 MGQMQIV      7
                          ||
seq_b              1 -----IV      2
#---------------------------------------
#---------------------------------------
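To show where the header figures come from, here is a minimal sketch that recomputes Length, Identity and Gaps from the two gapped rows of an alignment. The function name `alignment_stats` and its dictionary keys are hypothetical. It also reports identity divided by the length of the shorter ungapped sequence. That is an assumed alternative normalisation for comparison only; nothing in this thread says which denominator allversusall uses.

```python
def alignment_stats(row_a, row_b):
    """Summarise a pairwise alignment given its two gapped rows ('-' = gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    length = len(row_a)
    # identical residues in the same column (gap against gap is not counted)
    identical = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    # columns where either sequence has a gap
    gaps = sum(1 for x, y in zip(row_a, row_b) if x == "-" or y == "-")
    # length of the shorter sequence once gaps are removed
    shorter = min(len(row_a.replace("-", "")), len(row_b.replace("-", "")))
    return {
        "length": length,
        "identity": identical,
        "gaps": gaps,
        "pct_identity_over_length": 100.0 * identical / length,
        "pct_identity_over_shorter": 100.0 * identical / shorter,
    }


print(alignment_stats("MGQMQIV", "-----IV"))
```

For the seq_a/seq_b alignment this gives 2/7 (28.6%) over the alignment length, as needle reports, but 2/2 = 100% over the shorter sequence. This is the kind of figure described in the original question, so the choice of denominator matters a great deal when the sequences differ in length.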
I'm not sure it's relevant to your question, but note that in EMBOSS
needle the score is unaffected by "hanging ends" (end gaps). I consider this
odd; in fact it is not really a global alignment score. For example, a protein
with domain architecture -a-b-c-d- would get approximately the same score if
aligned against a protein of domain architecture -c-d- as it would when
aligned against a protein of domain architecture -c-d-e-f-g-h-i-j-k-l-m-. In
my view this goes against the spirit of global alignment, but this approach
is briefly justified in the needle documentation, and I believe it is not
unusual for global alignment programs. Here's what I mean:
$ cat seq_c.fa
>seq_c
IVPPLKP
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_c.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed 9 Jul 2008 12:37:01
# Commandline: needle
# -asequence seq_a.fa
# -bsequence seq_c.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_c
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 12
# Identity: 2/12 (16.7%)
# Similarity: 2/12 (16.7%)
# Gaps: 10/12 (83.3%)
# Score: 8.0
#
#
#=======================================
seq_a              1 MGQMQIV-----      7
                          ||
seq_c              1 -----IVPPLKP      7
#---------------------------------------
#---------------------------------------
Note that identity, similarity and gaps have all changed, but the score
remains the same as when seq_a and seq_b were aligned, since the only
difference is a "hanging end".
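The effect above can be reproduced with a small affine-gap global alignment (Gotoh recurrences) in which leading and trailing gaps cost nothing. This is a minimal sketch, not needle's implementation. It assumes a gap of length L costs open + (L - 1) * extend, with 10.0 and 0.5 as in the runs above. It also uses a hand-transcribed excerpt of BLOSUM62 covering only the residues in these toy sequences. The names `global_score`, `BLOSUM62_EXCERPT` and the `free_end_gaps` switch are hypothetical.

```python
# Excerpt of BLOSUM62, restricted to the residues in the toy sequences above.
ALPHABET = "GIKLMPQV"
ROWS = [
    # G   I   K   L   M   P   Q   V
    [ 6, -4, -2, -4, -3, -2, -2, -3],  # G
    [-4,  4, -3,  2,  1, -3, -3,  3],  # I
    [-2, -3,  5, -2, -1, -1,  1, -2],  # K
    [-4,  2, -2,  4,  2, -3, -2,  1],  # L
    [-3,  1, -1,  2,  5, -2,  0,  1],  # M
    [-2, -3, -1, -3, -2,  7, -1, -2],  # P
    [-2, -3,  1, -2,  0, -1,  5, -2],  # Q
    [-3,  3, -2,  1,  1, -2, -2,  4],  # V
]
BLOSUM62_EXCERPT = {
    (x, y): ROWS[i][j]
    for i, x in enumerate(ALPHABET)
    for j, y in enumerate(ALPHABET)
}
NEG = float("-inf")


def global_score(a, b, gap_open=10.0, gap_extend=0.5, free_end_gaps=True):
    """Best affine-gap global alignment score of a against b.

    Assumption: a gap of length L costs gap_open + (L - 1) * gap_extend.
    With free_end_gaps=True, leading and trailing gaps cost nothing.
    """
    n, m = len(a), len(b)

    def end_gap(length):
        # cost of leaving `length` residues unaligned at the start
        return 0.0 if free_end_gaps else -(gap_open + (length - 1) * gap_extend)

    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = end_gap(i)
    for j in range(1, m + 1):
        Y[0][j] = end_gap(j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = BLOSUM62_EXCERPT[(a[i - 1], b[j - 1])]
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)

    if not free_end_gaps:
        # trailing gaps are charged like internal ones inside X and Y
        return max(M[n][m], X[n][m], Y[n][m])
    # free trailing gaps: the alignment may stop anywhere on the last row or column
    last_col = max(max(M[i][m], X[i][m], Y[i][m]) for i in range(n + 1))
    last_row = max(max(M[n][j], X[n][j], Y[n][j]) for j in range(m + 1))
    return max(last_col, last_row)


for other in ("IV", "IVPPLKP"):
    print(other, global_score("MGQMQIV", other),
          global_score("MGQMQIV", other, free_end_gaps=False))
```

With free end gaps, both comparisons score 8.0 (I/I and V/V each score 4 in BLOSUM62), matching the needle output. If the end gaps are charged like internal gaps, the seq_a/seq_b score drops to -4.0, because the five leading gap positions then cost 10.0 + 4 * 0.5 = 12.0.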
Best regards,
Daniel
laura wrote:
> Dear emboss users,
>
> I am using the allversusall tool for global sequence alignment. I am writing
> to you because I am obtaining perfect alignments between sequences of very
> different lengths. For example, if I have a 100-residue protein sequence
> and a 2-residue protein sequence, I obtain 100% identity when I perform
> the alignment, whereas I would expect a very poor sequence identity. Is
> there any way to prevent this, or is it a possible bug in the program?
>
> I would be grateful for an answer as soon as possible.
>
> Regards,
>
> Laura.
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss
--
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland: No SC013532