[BioRuby] Translate ambiguous sequence
Tomoaki NISHIYAMA
tomoakin at kenroku.kanazawa-u.ac.jp
Thu Sep 11 01:51:43 UTC 2008
Hi,
BioRuby's translate method turns any codon containing an ambiguity code into
the unknown residue ("X" by default).
However, it is sometimes desirable to translate such a codon into a definite
amino acid when that is possible, for example:
tty -> "F"
(tty stands for ttc or ttt, and both encode phenylalanine.)
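To make the rule concrete, here is a minimal standalone Ruby sketch (not BioRuby code): a codon with ambiguity codes is expanded into every concrete codon it can stand for, and it is translated only if all of them encode the same residue. The names resolve_codon, CODON_TABLE and IUPAC are hypothetical, the standard genetic code (NCBI table 1) is assumed, and "u" is treated as "t".

```ruby
# Sketch only: resolve an ambiguous codon when all its expansions agree.
# Assumes the standard genetic code (NCBI table 1).
BASES = %w[t c a g].freeze
# Residues in the order ttt, ttc, tta, ttg, tct, ..., ggg ("*" = stop).
AAS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = BASES.product(BASES, BASES).map(&:join).zip(AAS.chars).to_h.freeze

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't', 'u' => 't',
  'r' => 'ag', 'y' => 'ct', 's' => 'cg', 'w' => 'at', 'k' => 'gt', 'm' => 'ac',
  'b' => 'cgt', 'd' => 'agt', 'h' => 'act', 'v' => 'acg', 'n' => 'acgt'
}.freeze

def resolve_codon(codon, unknown = 'X')
  choices = codon.downcase.chars.map { |base| IUPAC[base] }
  # Anything other than exactly three recognised symbols is unresolvable.
  return unknown if choices.size != 3 || choices.include?(nil)

  first, second, third = choices.map(&:chars)
  # Translate every concrete codon and keep the distinct residues.
  residues = first.product(second, third).map { |bases| CODON_TABLE[bases.join] }.uniq
  residues.size == 1 ? residues.first : unknown
end

%w[tty ytt tar ray mgr ggn].each do |codon|
  puts "#{codon} -> #{resolve_codon(codon)}"
end
# tty -> F, ytt -> X, tar -> *, ray -> X, mgr -> R, ggn -> G
```

The patch below applies the same rule but returns unknown as soon as two expansions disagree, instead of collecting them all first.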
Looking at the core implementation,
    naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
changing unknown to ct.translate_ambiguity(codon, unknown) should not hurt
performance for sequences without ambiguity, because the expression after
"or" is evaluated only when the table lookup fails. Resolving degenerate
codons is worth doing, and GenBank translations usually treat them this way.
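The performance argument rests on the "or" short-circuit: the resolver only runs when the plain hash lookup misses, which never happens for an unambiguous sequence. A minimal sketch, with a hypothetical three-entry table and a stand-in resolver that counts its calls (translate, table and resolver here are illustrative names, not BioRuby's):

```ruby
# Sketch of the proposed loop: the resolver is a fallback that runs only
# when the plain hash lookup misses.
def translate(naseq, table, resolver)
  naseq.gsub(/.{3}/) { |codon| table[codon] || resolver.call(codon) }
end

# Tiny stand-in codon table covering only the codons used here.
table = { 'atg' => 'M', 'ttt' => 'F', 'ttc' => 'F' }

calls = 0
# Stand-in resolver: counts its calls and only knows that tty -> F.
resolver = lambda do |codon|
  calls += 1
  { 'tty' => 'F' }.fetch(codon, 'X')
end

puts translate('atgtttttc', table, resolver)  # MFF
puts calls                                    # 0 (no ambiguous codons)
puts translate('atgtty', table, resolver)     # MF
puts calls                                    # 1
```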
What do you think?
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/codontable.rb bioruby-c/lib/bio/data/codontable.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/codontable.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-c/lib/bio/data/codontable.rb 2008-09-11 09:49:23.000000000 +0900
@@ -93,6 +93,22 @@
     def [](codon)
       @table[codon]
     end
+    def translate_ambiguity(codon, unknown = 'X')
+      triplet = codon + "NNN"
+      aa = nil
+      Bio::NucleicAcid.ambiguity2individual(triplet[2..2]).each do|third|
+        Bio::NucleicAcid.ambiguity2individual(triplet[0..0]).each do|first|
+          Bio::NucleicAcid.ambiguity2individual(triplet[1..1]).each do|second|
+            if aa == nil
+              aa = @table[first+second+third]
+            elsif aa != @table[first+second+third]
+              return unknown
+            end
+          end
+        end
+      end
+      aa
+    end
     # Modify the codon table. Use with caution as it may break hard coded
     # tables. If you want to modify existing table, you should use copy
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/na.rb bioruby-c/lib/bio/data/na.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/na.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-c/lib/bio/data/na.rb 2008-09-11 09:26:00.000000000 +0900
@@ -182,6 +182,13 @@
       end
       Regexp.new(str)
     end
+    def ambiguity2individual(na, rna = false)
+      str = NAMES[na.downcase].gsub(/[\[\]]/,"")
+      if rna
+        str.tr!("t", "u")
+      end
+      str.split(//)
+    end
   end
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/sequence/na.rb bioruby-c/lib/bio/sequence/na.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/sequence/na.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-c/lib/bio/sequence/na.rb 2008-09-11 09:48:52.000000000 +0900
@@ -252,7 +252,7 @@
       end
       nalen = naseq.length - from
       nalen -= nalen % 3
-      aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
+      aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or ct.translate_ambiguity(codon, unknown)}
       return Bio::Sequence::AA.new(aaseq)
     end
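One edge case in ambiguity2individual: NAMES[na.downcase] returns nil for a character that has no entry (a gap "-", say, assuming NAMES has no default value), and calling gsub on nil raises NoMethodError. A guarded variant might look like the sketch below; AMBIGUITY and expand_ambiguity are hypothetical stand-ins for NAMES and ambiguity2individual, and the table contents are an assumption modeled on the bracketed character classes the patch strips.

```ruby
# Hypothetical stand-in for the NAMES lookup: lowercase IUPAC code =>
# regexp-style character class (single bases map to themselves).
AMBIGUITY = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't', 'u' => 'u',
  'r' => '[ag]', 'y' => '[tc]', 's' => '[gc]', 'w' => '[at]',
  'k' => '[tg]', 'm' => '[ac]', 'b' => '[tgc]', 'd' => '[atg]',
  'h' => '[atc]', 'v' => '[agc]', 'n' => '[atgc]'
}.freeze

# Expand one nucleotide code into the concrete bases it stands for.
# Returns an empty array, instead of raising on nil, for unknown symbols.
def expand_ambiguity(na, rna = false)
  pattern = AMBIGUITY[na.downcase]
  return [] if pattern.nil?

  bases = pattern.gsub(/[\[\]]/, '')     # drop the brackets: "[tc]" -> "tc"
  bases = bases.tr('t', 'u') if rna      # RNA alphabet requested
  bases.chars
end

p expand_ambiguity('Y')        # ["t", "c"]
p expand_ambiguity('y', true)  # ["u", "c"]
p expand_ambiguity('-')        # []
```

With an empty expansion the loops in translate_ambiguity never run and it returns nil rather than unknown, so ending that method with aa || unknown would keep the result consistent.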
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan