[Bioperl-l] model 3 on Codeml.pm

Lorenzo Carretero Paulet locarpau at upvnet.upv.es
Wed Jun 8 20:21:08 UTC 2011


Jason,
This is the exact error message:

--------------------- WARNING ---------------------
MSG: parameter model specified value 3 is not recognized, please see the 
documentation and the code for this module or set the no_param_checks to 
a true value
---------------------------------------------------

See below a version of the script I'm trying to run. It runs well for 
model values 0,1 and 2 and, but retruns taht message with 3 (even after 
editing codeml.pm). It is within a subroutine to which I pass aa and nt 
sequences files as well as the tree file. Actually I'm only interested 
in the resulting  lnL of the data to do LRT comparisons. I ran PAML 
manually and, with the same datasets, the lnL values for model 3 are not 
the same as those returned when ran from the script.

Thanks,
Lorenzo

PS:

my %hashNTseqs = ();

my $dna_aln;

my $pep_aln;

my $lnL;

my $seq_ids;

#Create a Bio::SeqIO object with the pep sequences

my $inseq_pep = Bio::SeqIO->new(-file =>  "<$sequencesfilenameAA",

                             -format =>  $format );

# Remove the stop (*) from the end of every protein sequence

     while (my $seqin = $inseq_pep->next_seq)

         {

             my $seq = $seqin->seq();

             $seq =~ s/\*//g;

            }

#Create a Bio::SeqIO object with the nt sequences

my $inseq_nt = Bio::SeqIO->new(-file =>  "<$sequencesfilenameNT",

                             -format =>  $format );

     

#Get an alignment of protein sequences

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new ();

#my $aln_factory = Bio::Tools::Run::Alignment::Muscle->new ();

#say "Aligning peptide sequences globally (clustalw) ...";

$pep_aln = $aln_factory->align($sequencesfilenameAA);

while (my $seqin = $inseq_nt->next_seq)

         {

         my $seq = $seqin->seq();

         # Replace all Xs and missing characters (?) with Ns

         $seq =~ s/X/N/gi;

         $seq =~ s/\?/N/g;

         my $seq_id = $seqin->display_id();

         # Create a reference to a hash with keys : display_ids for the aa sequences in the alignment

         # and the values are a Bio::PrimarySeqI object for the corresponding spliced cDNA sequence.

         $hashNTseqs{$seq_id} = $seqin;

         }

#Generating codon alignment on the basis of the corresponding peptide one

#say "Aligning codon sequences according to corresponding peptide MSA ...";

$dna_aln = aa_to_dna_aln($pep_aln, \%hashNTseqs);

#Remove all positions containing a codon gap from the codon alignment???

#say "positions: ",$dna_aln->num_residues();

$dna_aln = $dna_aln->  remove_columns(['gaps']);

#say "positions: ",$dna_aln->num_residues();

#Generating tree from input string of ids

my $io = IO::String->new($tree);

my $biotreeio = Bio::TreeIO->new(-fh =>  $io,

                             -format =>  'newick');

my $biotree = $biotreeio->next_tree;

my $codeml_factory = new Bio::Tools::Run::Phylo::PAML::Codeml

                         (

                                 -alignment =>  $dna_aln,

                                 -tree =>     $biotree,

                                 -params =>       {

                                         #'verbose' =>  0,

                                         #'noisy' =>  9,

                                         'runmode' =>  0, #user tree

                                         'seqtype' =>  1,

                                         'model' =>  3,

                                         'NSsites' =>  3,

                                         'fix_omega' =>  0,

                                         'omega' =>  0,

                                         'ncatG' =>  2,

                                         #'icode' =>  0,

                                         #'fix_alpha' =>  0,

                                         #'fix_kappa' =>  0,

                                         #'RateAncestor' =>  0,

                                         'CodonFreq' =>  2,

                                         'cleandata' =>  1,

                                         'ndata' =>  1

                                         },

                         );

                         

#$codeml_factory->alignment($dna_aln);

#$codeml_factory->tree($biotree);

#$codeml_factory->set_parameter('model',2);

my ($rc,$parser) = $codeml_factory->run(); # or run($dna_aln,$biotree)

$codeml_factory->cleanup();

my $result = $parser->next_result();

#my $MLmatrix_free = $result_free->get_MLmatrix();

my $intree = $result->next_tree;

     $lnL = $intree->score;

     say "lnL=$lnL";

         for my $node ( $intree->get_nodes )

         {

         my $id;

     # first we do some work to figure out what the ID should be.

     # for a leaf or tip node this is just the taxon label

             if( $node->is_Leaf() )

             {

             $id = $node->id;

             }

             else

             {

             # for the internal nodes it is just the name of all the sub-nodes

             # put together, much like how Sanderson represents internal nodes

             # in r8s

             $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }$node->get_all_Descendents) .")";

             }

             if( ! $node->ancestor or ! $node->has_tag('t') )

             {

             # skip when no values have been associated with this node

             # (like the root node)

             next;

             }

             print join ("\t",$id,map { ($node->get_tag_values($_))[0] }qw(dN/dS)), "\n";

         }

$result->reset_seqs;


  El 08/06/11 22:04, Jason Stajich escribió:
> yes - as I said, a simple script that demonstrates the problem will make this all easier to debug. If you reported the actual error message that might be a good start.  The msg before refers to a different parameter so I'm suspicious.
>
> On Jun 8, 2011, at 2:10 PM, Lorenzo Carretero Paulet wrote:
>
>> Jason,
>> I edited the Codeml.pm and posted the version in github. The only change I made was in line 275  'model'    =>  [0..2,7] was changed to 'model'    =>  [0..3,7] , so that 3 could be passed as valid value to model. However, I still got the same warning message, and I tested with different alignments and results are different from those obtained running PAML locally. Maybe there are additional changes to be made.
>> Lorenzo
>>
>> El 08/06/11 07:15, Jason Stajich escribió:
>>> it is in github so you can fork a version, make the change, and submit a patch which we can pick up.
>>>
>>> I am concerned that this change can't be tested without example code.
>>>
>>> Did you just edit the code and make sure your changes worked, the error message seems to refer to a different parameter
>>> (ncatG) not model.
>>>
>>> Thanks,
>>>
>>>> MSG: parameter ncatG specified value  is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value.'
>>> On Jun 7, 2011, at 9:26 PM, Lorenzo Carretero wrote:
>>>
>>>> Hi,
>>>> I'm trying to run the clade model D (Model D: model = 3, NSsites = 3 ncatG = 2, See reference. Bielawski, J. P., and Z. Yang. 2004. A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. Journal of Molecular Evolution 59:121-132. and PAML 4.4 manual page 30) from the Bio::Tools::Run::Phylo::PAML::Codeml module. However, 3 is not among the valid values that can be passed to the module (line 275 'model'    =>   [0..2,7],)  and consequently the following Warning message is returned from lines 689-690:
>>>> 'MSG: parameter ncatG specified value  is not recognized, please see the documentation and the code for this module or set the no_param_checks to a true value.'
>>>> Can line 275: 'model'    =>   [0..2,7], be changed to 'model'    =>   [0..3,7], to accept value 3 or additional changes must be done in other modules to properly run the so-called clade models of PAML.
>>>> Thanks,
>>>> Lorenzo
>>>>
>>>>
>>>> -- 
>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>>> Lorenzo Carretero Paulet
>>>> Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
>>>> Integrative Systems Biology Group
>>>> C/ Ingeniero Fausto Elio s/n.
>>>> 46022 Valencia, Spain
>>>>
>>>> Phone:  +34 963879934
>>>> Fax:    +34 963877859
>>>> e-mail: locarpau at upvnet.upv.es
>>>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> -- 
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>> Lorenzo Carretero Paulet
>> Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
>> Integrative Systems Biology Group
>> C/ Ingeniero Fausto Elio s/n.
>> 46022 Valencia, Spain
>>
>> Phone:  +34 963879934
>> Fax:    +34 963877859
>> e-mail: locarpau at upvnet.upv.es
>> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Lorenzo Carretero Paulet
Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
Integrative Systems Biology Group
C/ Ingeniero Fausto Elio s/n.
46022 Valencia, Spain

Phone:  +34 963879934
Fax:    +34 963877859
e-mail: locarpau at upvnet.upv.es
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*




More information about the Bioperl-l mailing list