<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.0.6603.0">
<TITLE>Needle/water, revcomp</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2 FACE="Courier New">I have two questions.</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">The first is about how identity and similarity are counted in nucleotide alignments made with needle (water probably behaves the same way):</FONT></P>
<P><FONT SIZE=2 FACE="Courier New"> </FONT>
<BR><FONT SIZE=2 FACE="Courier New">########################################</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Program: needle</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Rundate: Thu Jun 23 13:29:58 2005</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Align_format: srspair</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Report_file: seq0.needle</FONT>
<BR><FONT SIZE=2 FACE="Courier New">########################################</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">#=======================================</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Aligned_sequences: 2</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># 1: SEQ0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># 2: SEQ1</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Matrix: EDNAFULL</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Gap_penalty: 100.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Extend_penalty: 10.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length: 70</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length of sequence 1: 70</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length of sequence 2: 70</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Identity: 46/70 (65.7%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Similarity: 47/70 (67.1%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Gaps: 0/70 ( 0.0%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Score: 162.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># </FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#=======================================</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New"> . . . . .</FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ0 1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn 50</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> |||||||||||||||||||||......|......||:....:|.. </FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ1 1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun 50</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> . . . . .</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New"> . .</FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ0 51 aaaaaaaaaaaaaaaaaaaa 70</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> ||||||||||||||||||||</FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ1 51 aaaaaaaaaaaaaaaaaaaa 70</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> . .</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Every base in the set acgtun is aligned against every base in the set, itself included. The 20 a's at each end are there only to force an ungapped alignment. Maximum gap penalties were used.</FONT></P>
<P><FONT SIZE=2 FACE="Courier New"> </FONT>
<BR><FONT SIZE=2 FACE="Courier New">I agree with the markup symbols in the alignment (|, : and .), but the 46 identities in the summary imply that the n-n pair is also counted as an identity. The two t-u pairs are counted as similar, which is fine, but the n-n pair is not counted as similar even though it is counted as identical. I think the n-n pair should not be counted in either identity or similarity.</FONT></P>
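<P><FONT SIZE=2 FACE="Courier New">To make the arithmetic explicit, here is a minimal Python sketch that reproduces the summary numbers. It is not EMBOSS code. It assumes that identity is counted by plain character equality and similarity by a positive matrix score, and the scores in score() are assumed EDNAFULL-style values (5 for a match, with u scored like t, -4 for a mismatch, -2 for n against a base, -1 for n-n) rather than values copied from the matrix file. Under these assumptions the sketch gives the same identity, similarity and score as the report.</FONT></P>

```python
def score(x: str, y: str) -> int:
    """Assumed substitution score for the bases a, c, g, t, u and n."""
    if x == "n" and y == "n":
        return -1
    if "n" in (x, y):
        return -2
    # Assumption: u is scored exactly like t.
    x, y = x.replace("u", "t"), y.replace("u", "t")
    return 5 if x == y else -4


def summarize(s0: str, s1: str) -> tuple:
    """Return (identity, similarity, score) for an ungapped alignment."""
    pairs = list(zip(s0, s1))
    identity = sum(x == y for x, y in pairs)             # same character
    similarity = sum(score(x, y) > 0 for x, y in pairs)  # positive score
    total = sum(score(x, y) for x, y in pairs)
    return identity, similarity, total


# The two sequences from the alignment above.
seq0 = "a" * 25 + "c" * 5 + "g" * 5 + "t" * 5 + "u" * 5 + "n" * 5 + "a" * 20
seq1 = "a" * 20 + "acgtun" * 5 + "a" * 20

print(summarize(seq0, seq1))  # (46, 47, 162)
```

<P><FONT SIZE=2 FACE="Courier New">The n-n pair is the only one that passes the equality test while having a non-positive score, which is why it appears in the identity count but not in the similarity count.</FONT></P>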
<P><FONT SIZE=2 FACE="Courier New"> </FONT>
<BR><FONT SIZE=2 FACE="Courier New">Now for ambiguity codes, using w (a or t) as an example:</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> </FONT>
<BR><FONT SIZE=2 FACE="Courier New">########################################</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Program: needle</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Rundate: Thu Jun 23 14:53:33 2005</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Align_format: srspair</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Report_file: seq0.needle</FONT>
<BR><FONT SIZE=2 FACE="Courier New">########################################</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">#=======================================</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Aligned_sequences: 2</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># 1: SEQ0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># 2: SEQ1</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Matrix: EDNAFULL</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Gap_penalty: 100.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Extend_penalty: 10.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length: 26</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length of sequence 1: 26</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Length of sequence 2: 26</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Identity: 21/26 (80.8%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Similarity: 23/26 (88.5%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Gaps: 0/26 ( 0.0%)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># Score: 94.0</FONT>
<BR><FONT SIZE=2 FACE="Courier New"># </FONT>
<BR><FONT SIZE=2 FACE="Courier New">#</FONT>
<BR><FONT SIZE=2 FACE="Courier New">#=======================================</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New"> . . </FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ0 1 aaaaaaaaaawwwwwwaaaaaaaaaa 26</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> ||||||||||.. .||||||||||</FONT>
<BR><FONT SIZE=2 FACE="Courier New">SEQ1 1 aaaaaaaaaaatwgcuaaaaaaaaaa 26</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> . . </FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">In the alignment I would put a dot at the w-w pair, although I could also accept the way it is handled now. Again, however, the w-w pair is counted in the summary as an identity but not as a similarity.</FONT></P>
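<P><FONT SIZE=2 FACE="Courier New">One way to treat the w-w pair consistently is to compare the sets of bases each code can stand for. The Python sketch below is only an illustration of that idea, not what needle does: the IUPAC table, the classify() helper and the decision to treat u as t are all assumptions of the sketch. A pair is "identical" only when both characters are the same unambiguous base, and "compatible" when the two sets overlap.</FONT></P>

```python
from collections import Counter

# Codes used in the example, expanded to the bases they can stand for.
# Assumption of this sketch: u is treated as t.
IUPAC = {
    "a": {"a"}, "c": {"c"}, "g": {"g"}, "t": {"t"}, "u": {"t"},
    "w": {"a", "t"},
}


def classify(x: str, y: str) -> str:
    """Classify one aligned pair by comparing the sets of possible bases."""
    sx, sy = IUPAC[x], IUPAC[y]
    if x == y and len(sx) == 1:
        return "identical"   # same unambiguous base
    if sx & sy:
        return "compatible"  # the pair could be a real match
    return "mismatch"


# The two sequences from the w alignment above.
seq0 = "a" * 10 + "w" * 6 + "a" * 10
seq1 = "a" * 10 + "atwgcu" + "a" * 10

counts = Counter(classify(x, y) for x, y in zip(seq0, seq1))
print(counts["identical"], counts["compatible"], counts["mismatch"])  # 20 4 2
```

<P><FONT SIZE=2 FACE="Courier New">Under this rule w-w falls into the same class as w-a, w-t and w-u, so it would get a dot in the alignment and would count towards similarity but not identity.</FONT></P>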
<BR>
<BR>
<P><FONT SIZE=2 FACE="Courier New">The second question is about how EMBOSS handles reverse-complemented nucleotide segments such as</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">db:seq[10:20:r]</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">The whole sequence is first reverse-complemented, and only then are residues 10 to 20 cut out.</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Biologists usually expect residues 10 to 20 to be cut out first and then reverse-complemented.</FONT>
</P>
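<P><FONT SIZE=2 FACE="Courier New">The difference between the two orders is easy to see on a toy sequence. The Python sketch below is only an illustration with hypothetical helpers (revcomp() and subseq(), using 1-based inclusive coordinates); it is not EMBOSS code and the 30-base sequence is made up.</FONT></P>

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse-complement a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def subseq(seq: str, begin: int, end: int) -> str:
    """Cut out residues begin..end (1-based, inclusive)."""
    return seq[begin - 1:end]


seq = "a" * 10 + "c" * 10 + "g" * 10  # made-up 30-base sequence

# Order described above: reverse-complement first, then cut.
revcomp_then_cut = subseq(revcomp(seq), 10, 20)
# Order biologists usually expect: cut first, then reverse-complement.
cut_then_revcomp = revcomp(subseq(seq, 10, 20))

print(revcomp_then_cut)  # cgggggggggg
print(cut_then_revcomp)  # ggggggggggt

# The expected result can still be obtained in the first order by mirroring
# the coordinates: begin..end becomes (L - end + 1)..(L - begin + 1).
L = len(seq)
mirrored = subseq(revcomp(seq), L - 20 + 1, L - 10 + 1)
print(mirrored == cut_then_revcomp)  # True
```

<P><FONT SIZE=2 FACE="Courier New">Mirroring the coordinates by hand works, but it requires knowing the sequence length in advance, which is why cutting first would be the more convenient behaviour.</FONT></P>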
<P><FONT SIZE=2 FACE="Courier New">Can this be changed? That would be very helpful.</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Best regards</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Clemens</FONT>
</P>
<BR>
<P><FONT SIZE=2 FACE="Arial">Dr. Clemens Broger</FONT>
<BR><FONT SIZE=2 FACE="Arial">Bioinformatics</FONT>
<BR><FONT SIZE=2 FACE="Arial">F. Hoffmann-La Roche Ltd.</FONT>
<BR><FONT SIZE=2 FACE="Arial">PRBI 65/303</FONT>
<BR><FONT SIZE=2 FACE="Arial">CH-4070 Basel</FONT>
<BR><FONT SIZE=2 FACE="Arial">clemens.broger@roche.com</FONT>
<BR><FONT SIZE=2 FACE="Arial">+41-61-688-4447</FONT>
</P>
</BODY>
</HTML>