[BioRuby] Translate ambiguous sequence
Tomoaki NISHIYAMA
tomoakin at kenroku.kanazawa-u.ac.jp
Mon Sep 15 06:08:56 EDT 2008
Hi,
To make translation more consistent with how DNA entries and protein
entries correspond in the databases, I think special treatment of
start codons and incomplete codons is necessary.
Special treatment of start codons applies to codons that are
translated to M only when used as the start codon, and to a different
amino acid when used as an internal codon within a CDS.
For example, GUG is V if it is internal to the CDS, but it can also serve
as a start codon, and in that case it encodes M.
Since this changes the existing behavior, I think an option is required.
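
The start-codon rule above can be sketched roughly as follows. This is a
minimal standalone illustration, not the BioRuby API: TOY_TABLE, TOY_STARTS
and toy_translate are hypothetical names, the table holds only a few
codons, and DNA letters are used (gtg corresponds to GUG).

```ruby
# Toy codon table and start-codon list, assumed for illustration only.
TOY_TABLE  = { "atg" => "M", "gtg" => "V", "aaa" => "K", "taa" => "*" }.freeze
TOY_STARTS = %w[atg gtg].freeze

# Translate complete codons only. When check_start is true and the first
# codon is a listed start codon, emit "M" for it regardless of the table.
def toy_translate(seq, check_start = false, unknown = "X")
  codons = seq.downcase.scan(/.{3}/)
  codons.each_with_index.map do |codon, i|
    if check_start && i.zero? && TOY_STARTS.include?(codon)
      "M"
    else
      TOY_TABLE.fetch(codon, unknown)
    end
  end.join
end

puts toy_translate("gtgaaagtg")        # VKV
puts toy_translate("gtgaaagtg", true)  # MKV
```

With the option off the result is unchanged from the plain table lookup,
so existing callers would not be affected.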
Incomplete codons are seen at the end of incomplete CDSs, presumably
due to the cloning or sequencing strategy.
When there is 'cg' at the end of a CDS, it is translated to 'R',
as any nucleotide in the third position would make the codon translate
as 'R'.
It seems that the translation is added only if the amino acid can be
specified and is not 'X'.
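
The incomplete-codon rule can be sketched in the same spirit: pad the
partial codon with 'n', expand every position, and accept the amino acid
only if all expansions agree. This is a standalone illustration under
stated assumptions: TOY_CODONS, EXPAND and toy_translate_ambiguity are
hypothetical, and only a few codons and symbols are included.

```ruby
# Toy data, assumed for illustration: the four "cgN" arginine codons, the
# four "aaN" codons, and expansions for only the symbols used here.
TOY_CODONS = {
  "cga" => "R", "cgc" => "R", "cgg" => "R", "cgt" => "R",
  "aaa" => "K", "aac" => "N", "aag" => "K", "aat" => "N"
}.freeze
EXPAND = {
  "a" => %w[a], "c" => %w[c], "g" => %w[g], "t" => %w[t],
  "n" => %w[a c g t]
}.freeze

# Return the amino acid only when every possible completion of the codon
# gives the same one; otherwise return the unknown character.
def toy_translate_ambiguity(codon, unknown = "X")
  bases = (codon.downcase + "nnn")[0, 3].chars.map { |b| EXPAND.fetch(b) }
  first, second, third = bases
  aas = first.product(second, third).map { |t| TOY_CODONS[t.join] }.uniq
  aas.length == 1 && aas.first ? aas.first : unknown
end

puts toy_translate_ambiguity("cg")  # R
puts toy_translate_ambiguity("aa")  # X
```

A trailing 'cg' therefore yields 'R', while a trailing 'aa' stays
unresolved (K or N) and would be dropped from the end of the protein.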
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/codontable.rb bioruby-a/lib/bio/data/codontable.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/codontable.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-a/lib/bio/data/codontable.rb 2008-09-13 12:06:28.000000000 +0900
@@ -93,6 +93,23 @@
def [](codon)
@table[codon]
end
+ def translate_ambiguity(codon, unknown = 'X')
+ triplet = codon + "NNN"
+ aa = nil
+ Bio::NucleicAcid.ambiguity2individual(triplet[2..2]).each do|third|
+ Bio::NucleicAcid.ambiguity2individual(triplet[0..0]).each do|first|
+ Bio::NucleicAcid.ambiguity2individual(triplet[1..1]).each do|second|
+ if aa == nil
+ aa = @table[first+second+third]
+ elsif aa != @table[first+second+third]
+ return unknown
+ end
+ end
+ end
+ end
+ aa
+ end
# Modify the codon table. Use with caution as it may break hard coded
# tables. If you want to modify existing table, you should use copy
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/na.rb bioruby-a/lib/bio/data/na.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/data/na.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-a/lib/bio/data/na.rb 2008-09-13 12:06:28.000000000 +0900
@@ -182,6 +182,13 @@
end
Regexp.new(str)
end
+ def ambiguity2individual(na, rna = false)
+ str = NAMES[na.downcase].gsub(/[\[\]]/,"")
+ if rna
+ str.tr!("t", "u")
+ end
+ str.split(//)
+ end
end
diff -ru bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/sequence/na.rb bioruby-a/lib/bio/sequence/na.rb
--- bioruby-bioruby-1440b766202a2b66ac7386b9b46928834a9c9873/lib/bio/sequence/na.rb 2008-09-03 22:24:39.000000000 +0900
+++ bioruby-a/lib/bio/sequence/na.rb 2008-09-15 18:57:19.000000000 +0900
@@ -231,7 +231,7 @@
# (default 1)
# * (optional) _unknown_: Character (default 'X')
# *Returns*:: Bio::Sequence::AA object
- def translate(frame = 1, table = 1, unknown = 'X')
+ def translate(frame = 1, table = 1, unknown = 'X', check_start = false)
if table.is_a?(Bio::CodonTable)
ct = table
else
@@ -251,8 +251,19 @@
from = 0
end
nalen = naseq.length - from
- nalen -= nalen % 3
- aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
+# nalen -= nalen % 3
+ if check_start and from == 0 and ct.start_codon?(naseq[0, 3])
+ if nalen > 3
+ aaseq = "M" + naseq[from+3, nalen-3].gsub(/.{1,3}/) {|codon| ct[codon] or ct.translate_ambiguity(codon, unknown)}
+ else
+ aaseq = "M"
+ end
+ else
+ aaseq = naseq[from, nalen].gsub(/.{1,3}/) {|codon| ct[codon] or ct.translate_ambiguity(codon, unknown)}
+ end
+ if nalen % 3 != 0
+ aaseq.sub!(/X$/,"")
+ end
return Bio::Sequence::AA.new(aaseq)
end