<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<tt>Thanks for information, Albert.<br>

<br>

But still in two questions:</tt><br>

Albert Vilella wrote:

<blockquote

 cite="mid358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com"

 type="cite">codeml in PAML can give different results in cases where

the optimization reaches different local maxima depending on the

different starting points of each run (seed values). So, at least for

some methods and options, this instability is inherent to the

underlying algorithm.

  <br>

</blockquote>

<tt>1. How to set the initial value in order to get a reasonable

estimation? Do you have some experience for that?</tt>

<blockquote

 cite="mid358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com"

 type="cite">Even more, for some methods and options, it is even

recommended in PAML documentation to run the same data more than once,

to see if the results are the same, which would be a good indication

that the model is robust given the data.

  <br>

</blockquote>

<tt>2. Is there a recommend way to test the significance if the results

are different? For example, in my case, dS could range from 10.1852 to

14.9961 for the four runtime. If that means the model is not robust(how

to check this?), should I change to use another model?<br>

<br>

How </tt><tt>could </tt><tt>YN00 reach stable result? (Is it because

YN00 does not require initial value for optimization?) Why could YN00

produce so different result from Codeml? (for YN00, dS=</tt><tt>2.1300

with SE=1.2272; for Codeml, dS=10.1852-14.9961)</tt><br>

<blockquote

 cite="mid358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com"

 type="cite">Maybe PAML's author can give a more specific answer for

your data at:<br>

  <a href="http://www.rannala.org/gsf/viewforum.php?f=1">http://www.rannala.org/gsf/viewforum.php?f=1</a><br>

</blockquote>

<tt><br>

Actually I already post my question in the author's forum. Let's wait

and see.</tt>

<blockquote

 cite="mid358f4d650705290602u605ff04fr226e12512a19a13e@mail.gmail.com"

 type="cite"><br>

Cheers,<br>

  <br>

&nbsp;&nbsp;&nbsp; Albert.

  <br>

  <br>

  <div><span class="gmail_quote">On 5/29/07, <b

 class="gmail_sendername">Dong Xianjun</b> &lt;<a

 href="mailto:Xianjun.Dong@bccs.uib.no">Xianjun.Dong@bccs.uib.no</a>&gt;

wrote:</span>

  <blockquote class="gmail_quote"

 style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

    <div bgcolor="#ffffff" text="#000000">HI, dear all, //sorry for

duplicated msg for <i>Jason</i> and <i>Neil</i><br>

    <br>

I'm bothering by two problems when I use PAML module to calculate Ka/Ks

for my sequences. Could you help me?<br>

    <br>

1.&nbsp; Codeml could produce different Ka/Ks value if I run it twice. I

check it both in command line and in Perl wrapper of

Bio::Tools::Run::Phylo::PAML::Codeml;<br>

    <br>

The input sequences are:<br>

&gt;seq1

TCTCTCTGGCCCAAAATCCGGGTTCCATTAAAAGTTGTGAGGACTGCTGAAAACAAGTTAAGTAACCGTTTCTTCCCTTATGATGAAATCGAGACAGAAGCTGTTCTGGCCATTGATGATGATATCATTATGCTGACCTCTGACGAGCTGCAATTTGGTTATGAG<br>

&gt;seq2<br>

TCACTGTGGCCCAAAGTCGCAGTGCCTCTTAAAGTGGTCCGCACCAAAGAAAACAAGCTCAGCAATCGATTCTTTCCGTTTGATGAGATCGAGACAGAAGCTGTCCTGGCCATTGACGATGACATCATCATGTTAACCTCAGATGAGCTACAGTTTGGATATGAG<br>

    <br>

For command-line program, I used Codeml in PAML3.14, with

specifications in codeml.ctl (runmode = -2, seqtype = 1). I tried to

run the program four times.&nbsp; The output are like below (from the output

file). We could see that they are different from each other. they

should be same or slightly different. Right? But they are NOT.&nbsp; Weird!<br>

----------------------------------------------------------------------------------------------------------------------------------<br>

    <tt>t=11.5447&nbsp; S=&nbsp;&nbsp;&nbsp; 42.4&nbsp; N=&nbsp;&nbsp; 122.6&nbsp; dN/dS= 0.0035&nbsp; dN= 0.0522&nbsp;

dS=14.8339<br>

t= 9.4132&nbsp; S=&nbsp;&nbsp;&nbsp; 41.8&nbsp; N=&nbsp;&nbsp; 123.2&nbsp; dN/dS= 0.0041&nbsp; dN= 0.0507&nbsp; dS=12.2349<br>

t=11.6305&nbsp; S=&nbsp;&nbsp;&nbsp; 42.2&nbsp; N=&nbsp;&nbsp; 122.8&nbsp; dN/dS= 0.0034&nbsp; dN= 0.0510&nbsp; dS=14.9961<br>

t= 7.7879&nbsp; S=&nbsp;&nbsp;&nbsp; 41.4&nbsp; N=&nbsp;&nbsp; 123.6&nbsp; dN/dS= 0.0050&nbsp; dN= 0.0505&nbsp; dS=10.1852<br>

    </tt>----------------------------------------------------------------------------------------------------------------------------------<br>

I found the same problem when I use the Perl Wrapper of

Bio::Tools::Run::Phylo::PAML::Codeml; (I attached my Perl script here,

similar to the one in BioPerl HOWTO).<br>

    <br>

2. Another strange thing is, if I switch to use program YN00 in the

package of PAML, the output are stable. However, it's much different

from Codeml. (see below) <br>

----------------------------------------------------------------------------------------------------------------------------------<br>

    <tt>seq. seq.&nbsp;&nbsp;&nbsp;&nbsp; S&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; N&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; t&nbsp;&nbsp; kappa&nbsp;&nbsp;&nbsp; omega&nbsp; &nbsp; &nbsp; dN +-

SE&nbsp;&nbsp; &nbsp;

&nbsp; &nbsp;&nbsp; dS +- SE<br>

&nbsp;&nbsp; 2&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp; 40.4&nbsp;&nbsp; 124.6&nbsp;&nbsp; 1.7452&nbsp; 1.3163&nbsp; 0.0378 0.0804 +- 0.0265&nbsp;

2.1300 +- 1.2272</tt><br>

----------------------------------------------------------------------------------------------------------------------------------<br>

Why like this? Which one I should believe?<br>

    <br>

    <br>

Is there any guy who would kindly help me to run the perl script (twice

to check whether they are different)? or help to run the codeml in

command line?<br>

I don't know whether there is anyone noticed this before, or because of

the wrong version of PAML.<br>

    <br>

Regards,<br>

    <br>

Xianjun<br>

    <br>

    <br>

    <br>

Himanshu Ardawatia wrote:

    <blockquote

 cite="http://mid62d36e2b0705290158h1c85362cp824778ca5ecc8645@mail.gmail.com"

 type="cite">#!/usr/bin/perl<br>

      <br>

use strict;<br>

use warnings;<br>

      <br>

      <br>

use Bio::Tools::Run::Phylo::PAML::Codeml;<br>

use Bio::Tools::Run::Alignment::Clustalw;<br>

      <br>

# for projecting alignments from protein to R/DNA space<br>

use Bio::Align::Utilities qw(aa_to_dna_aln); <br>

      <br>

# for input of the sequence data<br>

use Bio::SeqIO;<br>

use Bio::AlignIO;<br>

      <br>

my $aln_factory = new Bio::Tools::Run::Alignment::Clustalw();<br>

      <br>

#my $seqdata = 'chuck.fa';<br>

my $seqdata = 'xianjun.fa

';<br>

      <br>

my $seqIO = new Bio::SeqIO(-file&nbsp;&nbsp; =&gt; $seqdata,<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -format =&gt; 'fasta');<br>

my %seqs;<br>

my @prots;<br>

      <br>

my $output;<br>

# process each sequence<br>

while( my $seq = $seqIO-&gt;next_seq ) { <br>

&nbsp;&nbsp;&nbsp; $seqs{$seq-&gt;display_id} = $seq;<br>

&nbsp;&nbsp;&nbsp; # translate them into protein<br>

&nbsp;&nbsp;&nbsp; my $protein = $seq-&gt;translate();<br>

&nbsp;&nbsp;&nbsp; my $pseq = $protein-&gt;seq();<br>

&nbsp;&nbsp;&nbsp; if( $pseq =~ /\*/ &amp;&amp; <br>

&nbsp;&nbsp;&nbsp; $pseq !~ /\*$/ ) { <br>

&nbsp;&nbsp;&nbsp; warn("provided a cDNA sequence with a stop codon, PAML will

choke!");<br>

&nbsp;&nbsp;&nbsp; exit(0);<br>

&nbsp;&nbsp;&nbsp; }<br>

&nbsp;&nbsp;&nbsp; # Tcoffee can't handle '*' even if it is trailing<br>

&nbsp;&nbsp;&nbsp; $pseq =~ s/\*//g;<br>

&nbsp;&nbsp;&nbsp; $protein-&gt;seq($pseq); <br>

&nbsp;&nbsp;&nbsp; push @prots, $protein;<br>

}<br>

      <br>

if( @prots &lt; 2 ) {<br>

&nbsp;&nbsp;&nbsp; warn("Need at least 2 cDNA sequences to proceed");<br>

&nbsp;&nbsp;&nbsp; exit(0);<br>

}<br>

      <br>

open(OUT, "&gt;align_output.txt") ||<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; die("cannot open output $output for writing"); <br>

# Align the sequences with clustalw<br>

      <br>

my $aa_aln = $aln_factory-&gt;align(\@prots);<br>

      <br>

# project the protein alignment back to cDNA coordinates<br>

my $dna_aln = &amp;aa_to_dna_aln($aa_aln, \%seqs);<br>

      <br>

my @each = $dna_aln-&gt;each_seq(); <br>

      <br>

my $kaks_factory = new Bio::Tools::Run::Phylo::PAML::Codeml<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ( -params =&gt; { 'runmode' =&gt; -2,<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 'seqtype' =&gt; 1,<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;'model' =&gt; 1, <br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; }<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; );<br>

      <br>

# set the alignment object<br>

$kaks_factory-&gt;alignment($dna_aln);<br>

      <br>

# run the KaKs analysis<br>

my ($rc,$parser) = $kaks_factory-&gt;run();<br>

my $result = $parser-&gt;next_result; <br>

my $MLmatrix = $result-&gt;get_MLmatrix();<br>

      <br>

my @otus = $result-&gt;get_seqs();<br>

# this gives us a mapping from the PAML order of sequences back to <br>

# the input order (since names get truncated)<br>

my @pos = map { <br>

&nbsp;&nbsp;&nbsp; my $c= 1;<br>

&nbsp;&nbsp;&nbsp; foreach my $s ( @each ) {<br>

&nbsp;&nbsp;&nbsp; last if( $s-&gt;display_id eq $_-&gt;display_id );<br>

&nbsp;&nbsp;&nbsp; $c++;<br>

&nbsp;&nbsp;&nbsp; }<br>

&nbsp;&nbsp;&nbsp; $c; <br>

} @otus; <br>

      <br>

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID

CDNA_PERCENTID)), "\n"; <br>

for( my $i = 0; $i &lt; (scalar @otus -1) ; $i++) {<br>

&nbsp;&nbsp;&nbsp; for( my $j = $i+1; $j &lt; (scalar @otus); $j++ ) {<br>

&nbsp;&nbsp;&nbsp; my $sub_aa_aln&nbsp; = $aa_aln-&gt;select_noncont($pos[$i],$pos[$j]);<br>

&nbsp;&nbsp;&nbsp; my $sub_dna_aln = $dna_aln-&gt;select_noncont($pos[$i],$pos[$j]); <br>

&nbsp;&nbsp;&nbsp; print OUT join("\t",&nbsp; <br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $otus[$i]-&gt;display_id,<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

$otus[$j]-&gt;display_id,$MLmatrix-&gt;[$i]-&gt;[$j]-&gt;{'dN'},<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $MLmatrix-&gt;[$i]-&gt;[$j]-&gt;{'dS'}, <br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $MLmatrix-&gt;[$i]-&gt;[$j]-&gt;{'omega'},<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sprintf("%.2f",$sub_aa_aln-&gt;percentage_identity),<br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sprintf("%.2f",$sub_dna_aln-&gt;percentage_identity), <br>

&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ), "\n";<br>

&nbsp;&nbsp;&nbsp; }<br>

}<br>

      <br>

      <br>

      <div><span class="gmail_quote">On 5/29/07, <b

 class="gmail_sendername">Himanshu Ardawatia</b> &lt;<a

 href="mailto:himanshu.ardawatia@bccs.uib.no" target="_blank"

 onclick="return top.js.OpenExtLink(window,event,this)">himanshu.ardawatia@bccs.uib.no

      </a>&gt; wrote:</span>

      <blockquote class="gmail_quote"

 style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi

Xianjun,<br>

        <br>

I recognize this script. But it was a bit cumbersom to use this as many

things are done in the script (like multiple alignment, aa to dna

alignment and ka/ks calculation) so one does not have real control on

these different aspect. <br>

I do not remeber getting different Ka/Ks in different runs though. But

I remeber that one I ran the script with different versions of clustalw

and it REALLY gave different results !! So please make sure if the

clustalw versions are the same in all your runs. Best is to use the

latest version. <br>

        <br>

Finally I wrote my simple script which would generate a codeml.ctl file

for each set of sequences and run the codeml based on that and then

more on. Disadvantage of this can be that some files keep getting

over-written (like the one which have their names hard-coded in codeml

program) and if one needs those files as well then one needs to run the

codeml cycles for each set of sequences in different directories. <br>

        <br>

One advantage of this kind of script is that you can use whichever

alignment program you want to use and so on....But then its also extra

steps of yourself doing multiple alignment and aa to dna alignment

etc.... <br>

        <br>

Does it make sense? If you still get different outputs with same

version of clustalw then I can sit with you and look at things

together. Or else try the script method which I mentioned.<br>

        <br>

Cheers&nbsp; and Fu<br>

Himanshu <br>

\\

        <div><span><br>

        <div><span class="gmail_quote">On 5/28/07, <b

 class="gmail_sendername">Dong Xianjun</b> &lt;<a

 href="mailto:Xianjun.Dong@bccs.uib.no" target="_blank"

 onclick="return top.js.OpenExtLink(window,event,this)">

Xianjun.Dong@bccs.uib.no</a>&gt; wrote:</span>

        <blockquote class="gmail_quote"

 style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">HI,

Himanshu<br>

          <br>

I am sure you did some work in Ka/Ks calculation. Here I have a question<br>

bothering me; the output for Bio::Tools::Run::Phylo::PAML::Codeml is not<br>

stable(different for each runtime), and also different from the output <br>

with modeul of Bio::Tools::Run::Phylo::PAML::Yn00.<br>

          <br>

Here I attached the script. Could you help to have a look and try to run<br>

the script? How is your way to calculate the Kaks ratio?<br>

          <br>

Thanks<br>

          <br>

--<br>

---------------------------<br>

Sterding (Xianjun) Dong<br>

PhD student, Boris Lenhard's group<br>

Bergen Center of Computational Science<br>

Bergen University, Norway<br>

Mobile: 0047-47361688<br>

Telephone: 0047-55276381 <br>

Skype: xianjun.dong<br>

          <br>

          <br>

        </blockquote>

        </div>

        <br>

        </span></div>

      </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

    <pre cols="72">-- 

---------------------------

Sterding (Xianjun) Dong

PhD student, Boris Lenhard's group

Bergen Center of Computational Science

Bergen University, Norway

Mobile: 0047-47361688

Telephone: 0047-55276381

Skype: xianjun.dong

    </pre>

    </div>

    <br>

_______________________________________________<br>

Bioperl-l mailing list<br>

    <a onclick="return top.js.OpenExtLink(window,event,this)"

 href="mailto:Bioperl-l@lists.open-bio.org">Bioperl-l@lists.open-bio.org</a><br>

    <a onclick="return top.js.OpenExtLink(window,event,this)"

 href="http://lists.open-bio.org/mailman/listinfo/bioperl-l"

 target="_blank">http://lists.open-bio.org/mailman/listinfo/bioperl-l</a><br>

    <br clear="all">

  </blockquote>

  </div>

  <br>

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

---------------------------

Sterding (Xianjun) Dong

PhD student, Boris Lenhard's group

Bergen Center of Computational Science

Bergen University, Norway

Mobile: 0047-47361688

Telephone: 0047-55276381

Skype: xianjun.dong

</pre>

</body>

</html>