[Bioperl-l] Bioperl-run: Testing alignments generated externally

Thu Oct 26 16:57:32 UTC 2006

Nathan -

I agree - unfortunately the values tend to change between versions of
the applications. It would make sense to test only that the output is
in a valid alignment format and perhaps that it contains as many
sequences as you started with. The more restrictive tests probably
aren't reliable when mixing and matching versions.
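
As a rough sketch of what such a version-independent check could look
like, the helper below (alignment_problems is a hypothetical name, not
part of BioPerl) works on plain hashes of id => sequence so that it runs
with core Perl only. In a real test the aligned rows would presumably be
pulled from the Bio::SimpleAlign object returned by the wrapper. The
usage lines treat the short fragment quoted further down the thread as
if it were a whole sequence, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: checks only properties that should not depend on
# the version of the external aligner.
#   $input   : hashref of id => unaligned sequence
#   $aligned : hashref of id => aligned (gapped) sequence
# Returns a list of problem descriptions; an empty list means no problem found.
sub alignment_problems {
    my ($input, $aligned) = @_;
    my @problems;

    # As many sequences out as went in.
    push @problems, 'sequence count differs'
        if keys(%$input) != keys(%$aligned);

    my %lengths;
    for my $id (sort keys %$aligned) {
        my $row = $aligned->{$id};
        $lengths{ length $row }++;

        unless (exists $input->{$id}) {
            push @problems, "unexpected sequence $id";
            next;
        }

        # Stripping gap characters must give back the original residues.
        (my $ungapped = $row) =~ tr/-.//d;
        push @problems, "residues changed in $id"
            if lc $ungapped ne lc $input->{$id};
    }

    # Every row of a valid alignment has the same length.
    push @problems, 'rows have different lengths' if keys(%lengths) > 1;

    return @problems;
}

# Both gap placements reported in this thread pass the same check.
my %input = (CATH_RAT => 'gknmcg');
for my $row ('gk-nm---cg', 'gkn----mcg') {
    my @problems = alignment_problems(\%input, { CATH_RAT => $row });
    print "$row: ", (@problems ? join('; ', @problems) : 'ok'), "\n";
}
```

The point of the design is that the check says nothing about where the
gaps fall, only that the alignment is structurally sound.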

One thing we do for PAML is condition tests on the version used - but
of course when a new version comes out we have to update the tests
(or just have some code that skips those tests).
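
A minimal sketch of that idea using Test::More's SKIP blocks is below.
This is not the actual PAML test code: the %expected_row table, the
expected_row_for helper and the hard-coded $version and $row are
assumptions standing in for whatever the wrapper and the external
program really report. The single table entry is the 4.45 fragment
quoted further down the thread.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical table of expected values, recorded per program version.
my %expected_row = (
    '4.45' => 'gk-nm---cg',
);

# Hypothetical lookup: returns undef when nothing is recorded for a version.
sub expected_row_for {
    my ($version) = @_;
    return $expected_row{$version};
}

# Stand-ins for values a real test would get from the external program.
my $version = '4.45';
my $row     = 'gk-nm---cg';

# Always run: a check that does not depend on the version.
(my $ungapped = $row) =~ tr/-//d;
is($ungapped, 'gknmcg', 'residues preserved regardless of version');

# Run the strict comparison only for versions we have recorded values for.
SKIP: {
    my $expected = expected_row_for($version);
    skip "no expected values recorded for version $version", 1
        unless defined $expected;
    is($row, $expected, "gap placement matches version $version");
}
```

With this layout a new release of the program skips the strict test
rather than failing it, until someone records values for that version.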

-jason

On Oct 26, 2006, at 3:33 AM, Nathan Haigh wrote:

> Remo Sanges wrote:
>> Nathan Haigh wrote:
>>> Sendu Bala wrote:
>>>
>>>> Nathan Haigh wrote:
>>>>
>>>>> I'm thinking that it's not wise to test for things like
>>>>> overall_percentage_identity etc. in alignments that are generated by
>>>>> external software like T-Coffee, Clustalw etc. Changes to software
>>>>> algorithms/efficiency, bug fixes etc. may well alter the quality of
>>>>> the alignment produced in different versions and thus affect the
>>>>> value returned by such methods. Therefore, I think these methods
>>>>> should only be tested on alignments loaded directly from t/data.
>>>>>
>>>> Did you discover some specific problem cases?
>>>>
>>> My messages seem to be taking a while to come through, but, yes. It
>>> may be due to the software changing default parameters, but it makes
>>> testing the output for specific details pretty difficult and
>>> inconsistent. For example, running T-Coffee, the following command
>>> from t/TCoffee.t results in a slightly different alignment:
>>> $aln = $factory->run('-type' => 'profile',
>>>                      '-profile' => $aln1,
>>>                      '-seq'  => Bio::Root::IO->catfile("t","data","cysprot1b.fa"));
>>>
>>> Of particular note are the gaps on the last line of the sequences. In
>>> 4.45, there are two gaps in CATH_RAT/1-133 ('gk-nm---cg') whereas in
>>> <v4.45 this is ('gkn----mcg').
>>>
>> I'm not a T-Coffee user, but you usually come across
>> these problems when you use different scoring parameters
>> when aligning sequences.
>>
>> Could it be that they have simply changed the
>> default parameters for gap penalties and that kind of
>> stuff? Is it possible to set them?
>>
>> If so you can just run the test by defining
>> the scores in the param hash without using the default.
>>
>> HTH
>>
>> Remo
> That is true, but it depends on whether the wrapper is complete
> enough to be able to set all the parameters provided by the software.
>
> Nath

--
Jason Stajich, PhD
Miller Research Fellow
University of California
Dept of Plant and Microbial Biology
321 Koshland Hall #3102
Berkeley, CA 94720-3102
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html