[Bioperl-l] Quickest Codon Based MSA?

Fri Jan 25 10:55:50 UTC 2008

Johan Nilsson wrote:
> Hello,
> 
> I have a question which might not necessarily be related to Bioperl, 
> although I do believe the expertise is available here. I have a couple 
> of thousand FASTA files, each containing 20 CDS sequence orthologues of 
> rather high sequence similarity. I would like to create a codon-based 
> multiple sequence alignment for each of these FASTA files (i.e. a 
> nucleotide sequence alignment inferred from alignment of the translated 
> peptide sequences, to assure that no frame shifts will occur). I first 
> tried running Dialign2, which can perform the 
> translation/back-translation in one go, but this turned out to be far 
> too slow. I next tried to build protein alignments using ClustalW and 
> subsequently built the coding region alignment using EMBOSS 'tranalign', 
> but this also was too slow.
> 
> Is there any method available which significantly speeds up the 
> codon-preserving alignment??? As I mentioned, the sequences to be 
> aligned are in general very conserved, so any heuristic taking advantage 
> of the low divergence would be very helpful! Also, is there any 
> adjustable parameter in dialign2/dialign-T that might speed up the 
> program when looking at highly similar sequences?

Do you know which part is slow? For example, in the ClustalW pipeline, 
does the protein alignment take longer than creating the codon alignment 
from it?

If ClustalW is the problem, you can try other alignment programs 
known for their speed, such as MUSCLE. If it's the protein->codon step 
that's slow, try other programs for that, like PAL2NAL or the 
BioPerl method (aa_to_dna_aln in Bio::Align::Utilities).
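
For reference, the protein->codon step is cheap in principle: each 
residue in the aligned protein row is replaced by its codon, and each gap 
by three gap characters. Below is a minimal Python sketch of that idea. 
It is not the implementation used by tranalign, PAL2NAL or BioPerl; the 
function name thread_codons is made up for illustration, and it assumes 
the CDS is in frame, with at most one trailing stop codon that is absent 
from the protein.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Thread an ungapped CDS onto one gapped row of a protein alignment.

    Hypothetical helper for illustration only. Assumes the CDS is in
    frame and translates to the ungapped protein, optionally followed by
    a single stop codon (which is dropped).
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")

    # Split the coding sequence into consecutive codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]

    # Count residues (non-gap columns) in the aligned protein row.
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError("CDS and protein lengths do not match")

    # Emit one codon per residue and three gap characters per protein gap.
    codon_iter = iter(codons)
    out = []
    for ch in aligned_protein:
        out.append(gap * 3 if ch == gap else next(codon_iter))
    return "".join(out)


if __name__ == "__main__":
    print(thread_codons("M-K", "ATGAAA"))
```

Since this is linear in the alignment length, a slow protein->codon step 
usually points to per-file program start-up or I/O overhead rather than 
to the computation itself.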