[BioPython] Translation issues

Renato Alves rjalves at igc.gulbenkian.pt
Mon Jan 28 04:58:50 EST 2008


Hi.

I'm trying to automate and validate the translation of sequences 
downloaded from NCBI.

Basically I fetch a GenBank file, extract the DNA sequences and use the 
Translation module of BioPython to check whether the result matches the 
translation given in the file. The problem is that the first amino acid 
in the NCBI translation is always M, but the one produced by the 
Translation module isn't, even if the codon is marked as a start codon 
in the corresponding codon table.

For instance, the sequence:

"TTGGATTATTTAATAGAGGGTTTAAGTTATAATCCTGTAGACCACACAGCTACATCTGGACCAACTGTAATGGAAGCTGCACTGATTGCTAA
ACATGTTTATTCAGGGGAAAAAGGAGATGAATTACCCGGTGGATGGAAAATGCTTGAAGATCCATATATGGTTGGAGGTCTTCGAATGGGC
GTATATGGGAGAAAAGGTGAGGATGGAGAGATGGAATATGTAATTGCAAATGCAGGAACAGAACCTACTAGTTTGATAGATTGGGAGAATA
ATTTGAAACAACCTTTTGGGAAATCAGAAGATATGAAAAATTCTTTAGCTTTTGTTGAAGAGTTTATGAAAAACAATCCAAGTATTAATGTAA
CATTTGTTGGACATTCAAAAGGTGGGGCTGAAGCAGCTGCAAATGCGGTACTTACAAATAGGAATGCAATACTATTTAATCCTGCCACAGTG
AACTTAGAATCATATTTAAAGCCATATGGTGTGAACAAGTCAAATTATACTGCTGAGATGACGGCATTTATTGTAGAAGACGAAATTTTGAATA
ATATCTTTGGATTTATATCAACGCCGATAGACAAGGTAGTTTATTTACCCAGACAGCATTCTTTTTTCATATCGATTCCACTTATAGATATGGTA
AATTCGATTCGAAATCATTCGATGGATGCAACGATAAAGGCAATAGAAGAATGGGAGGAAAATAGACAATGA"

with codon table 11 will translate to:

a="LDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

while the translation on the GenBank file is:

b="MDYLIEGLSYNPVDHTATSGPTVMEAALIAKHVYSGEKGDELPGGWKMLEDPYMVGGLRMGVYGRKGEDGEMEYVIANAGTEPTSLIDWENN
LKQPFGKSEDMKNSLAFVEEFMKNNPSINVTFVGHSKGGAEAAANAVLTNRNAILFNPATVNLESYLKPYGVNKSNYTAEMTAFIVEDEILNNIFG
FISTPIDKVVYLPRQHSFFISIPLIDMVNSIRNHSMDATIKAIEEWEENRQ"

causing the test a == b to fail. The two sequences are identical except 
for the first amino acid (L in a, M in b).
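
To make the difference concrete, here is a minimal pure-Python sketch of 
the two behaviours. It is not the BioPython code: the names translate and 
as_cds are made up for illustration, and the amino-acid string and start 
codons are those of NCBI table 11 as far as I can tell.

```python
# Illustrative sketch only: a tiny translator, not BioPython's implementation.
from itertools import product

BASES = "TCAG"
# Amino acids for NCBI table 11, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
FORWARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))
# Codons that table 11 marks as possible starts.
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(dna, as_cds=False):
    """Translate dna codon by codon. With as_cds=True, read a leading
    start codon as M and drop a final stop codon."""
    dna = dna.upper()
    # Ignore any trailing bases that do not form a complete codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    protein = [FORWARD_TABLE[c] for c in codons]
    if as_cds and codons:
        if codons[0] in START_CODONS:
            protein[0] = "M"
        if protein[-1] == "*":
            protein.pop()
    return "".join(protein)


prefix = "TTGGATTATTTAATAGAGGGT"  # first seven codons of the sequence above
print(translate(prefix))               # LDYLIEG
print(translate(prefix, as_cds=True))  # MDYLIEG
```

The plain call reads TTG as L, like my result a; the as_cds call reads it 
as M because TTG is listed as a start codon, like the GenBank 
translation b.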

I could do the test in other ways and remove the initial letter, but 
that wouldn't work as a general solution.
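
A slightly less blunt variant of that workaround might look like the 
sketch below. It only tolerates a mismatch in the first residue when the 
annotated protein starts with M and the first codon is a start codon. 
The helper name same_protein and its arguments are hypothetical, and the 
start codon set is the one I assume for table 11.

```python
# Hypothetical helper; the names are illustrative and not part of BioPython.
TABLE_11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def same_protein(dna, translated, annotated, start_codons=TABLE_11_STARTS):
    """Compare two protein strings, tolerating a first-residue mismatch
    only when the annotation says M and the first codon is a start codon."""
    if translated == annotated:
        return True
    if not translated or not annotated:
        return False
    first_codon = dna[:3].upper()
    return (
        annotated[0] == "M"
        and first_codon in start_codons
        and translated[1:] == annotated[1:]
    )
```

With this, same_protein("TTGGATTAT", "LDY", "MDY") passes, while the same 
proteins with a DNA starting in TTA (not a start codon) do not.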

So, is this the right behavior or am I missing something?

Any other suggestions on how to do this test would also help.

Thanks
--
Renato Alves

