[BioPython] Translation issues
Renato Alves
rjalves at igc.gulbenkian.pt
Mon Jan 28 09:58:50 UTC 2008
Hi.
I'm trying to automate and validate the translation of sequences
downloaded from NCBI.
Basically I fetch a GenBank file, extract the DNA sequences and use the
Translation module of BioPython to check whether the result matches the
translation given in the file. The problem is that the first amino acid
in NCBI is always M, but with the Translation module it isn't, even if
the codon is marked as a start codon in the corresponding codon table.
For instance, the sequence:
"TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA
ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC
GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA
ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA
CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG
AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA
ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA
AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA"
with codon table 11 will translate to:
a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"
while the translation on the GenBank file is:
b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"
causing the test a == b to fail. The sequences are exactly the same
except for the initial amino acid.
I could do the test in other ways and remove the initial letter, but
that wouldn't work globally.
So, is this the right behavior or am I missing something?
Any other suggestions for doing this test would also help.
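
To make the intended check concrete, here is a minimal sketch in plain
Python (it does not use the BioPython Translation module). It assumes
that table 11 assigns amino acids exactly like the standard code and
differs only in its set of start codons, and that a start codon in the
first position should be reported as M, as in the GenBank translation.
The names translate_cds and TABLE_11_STARTS are made up for this
illustration.

```python
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (each position cycles through the bases in the order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Start codons listed for NCBI table 11.
# Assumption: the amino acid assignments of table 11 match the standard code.
TABLE_11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(dna, starts=TABLE_11_STARTS):
    """Translate a coding sequence, reporting a leading start codon as M.

    Translation stops at the first stop codon, which is not included
    in the returned protein string.
    """
    dna = "".join(dna.split()).upper()  # drop line breaks and spaces
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    protein = []
    for index, codon in enumerate(codons):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        if index == 0 and codon in starts:
            amino_acid = "M"
        protein.append(amino_acid)
    return "".join(protein)


# First six codons of the sequence above.
print(translate_cds("TTGGATTATTTAATAGAG"))             # start codon read as M
print(translate_cds("TTGGATTATTTAATAGAG", starts=()))  # plain table lookup
```

With something like this, the comparison against the GenBank translation
would not need the first letter to be removed by hand, but it only makes
sense for sequences that really are complete coding sequences.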
Thanks
--
Renato Alves