[Bioperl-l] Quickest Codon Based MSA?

Johan Nilsson johan.nilsson at sh.se
Thu Jan 24 22:33:42 UTC 2008


Hello,

I have a question that may not be strictly related to Bioperl, although 
I believe the expertise is available here. I have a couple of thousand 
FASTA files, each containing 20 orthologous CDS sequences with rather 
high sequence similarity. I would like to create a codon-based multiple 
sequence alignment for each of these FASTA files (i.e. a nucleotide 
alignment inferred from an alignment of the translated peptide 
sequences, to ensure that no frameshifts are introduced). I first tried 
running Dialign2, which can perform the translation/back-translation in 
one go, but this turned out to be far too slow. I next tried building 
protein alignments with ClustalW and then building the coding-region 
alignment with EMBOSS 'tranalign', but this was also too slow.
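For reference, the back-translation step (threading each CDS onto its 
gapped peptide, as 'tranalign' does) can be sketched as below. This is a 
minimal illustration, not EMBOSS's own code; it assumes each peptide is 
an exact translation of its CDS, that CDS lengths are multiples of 3, 
and that gaps are written as '-'. The function name 
`backtranslate_alignment` is hypothetical. The step is linear in 
sequence length, so the protein alignment itself is likely where most 
of the time goes.

```python
def backtranslate_alignment(aligned_proteins, cds_seqs):
    """Thread ungapped CDS codons onto a gapped protein alignment.

    aligned_proteins: dict id -> gapped peptide string ('-' = gap)
    cds_seqs: dict id -> ungapped nucleotide CDS
    Returns dict id -> gapped nucleotide string ('---' per peptide gap).
    Assumes each peptide is an exact translation of its CDS.
    """
    result = {}
    for seq_id, aa_aln in aligned_proteins.items():
        cds = cds_seqs[seq_id]
        if len(cds) % 3 != 0:
            raise ValueError(f"{seq_id}: CDS length is not a multiple of 3")
        # Split the CDS into consecutive codons.
        codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
        n_residues = sum(1 for aa in aa_aln if aa != "-")
        if n_residues != len(codons):
            raise ValueError(f"{seq_id}: {n_residues} residues vs {len(codons)} codons")
        codon_iter = iter(codons)
        out = []
        for aa in aa_aln:
            # A peptide gap becomes a codon-sized gap, preserving the frame.
            out.append("---" if aa == "-" else next(codon_iter))
        result[seq_id] = "".join(out)
    return result


if __name__ == "__main__":
    proteins = {"s1": "MK-L", "s2": "MKQL"}
    cds = {"s1": "ATGAAACTG", "s2": "ATGAAGCAGCTG"}
    print(backtranslate_alignment(proteins, cds))
    # {'s1': 'ATGAAA---CTG', 's2': 'ATGAAGCAGCTG'}
```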

Is there any method available that significantly speeds up 
codon-preserving alignment? As I mentioned, the sequences to be aligned 
are generally very conserved, so any heuristic that takes advantage of 
the low divergence would be very helpful. Also, is there any adjustable 
parameter in Dialign2/Dialign-T that might speed up the program for 
highly similar sequences?

Best regards
/Johan Nilsson


