[Biopython-dev] 3/18 biopython Questions - BioStar

Feed My Inbox updates at feedmyinbox.com
Fri Mar 18 13:05:29 UTC 2011


// Error translating genbank CDS using BioPython
// March 17, 2011 at 4:28 AM

http://biostar.stackexchange.com/questions/6560/error-translating-genbank-cds-using-biopython
Hi all,

I'm trying to translate a GenBank record using BioPython 1.53, ignoring the translation already given in the CDS feature. The code I've written to do this is pretty straightforward:

...
for gb_record in SeqIO.parse(file_handle, 'genbank'):#Bio.GenBank.Record
    for gb_feature in gb_record.features:#Bio.SeqFeature
        #Skip any non coding sequence features
        if gb_feature.type != 'CDS':
            continue

        #Protein identifier is a property of the genbank feature
        protein_id = gb_feature.qualifiers['protein_id'][0]

        #Original sequence retrieved through BioPython 1.53+'s internal method 
        extracted_seq = gb_feature.extract(gb_record.seq)#Bio.Seq.Seq

        #Translation table is a property of the genbank feature
        transl_table = gb_feature.qualifiers['transl_table'][0]

        #Translate entire sequence as coding sequence using translation table
        #Additional CodonTables optionally available from Bio.Data.CodonTable
        #TranslationError is defined in Bio.Data.CodonTable
        try:
            protein_seq = extracted_seq.translate(table=transl_table, cds=True)
        except TranslationError as err:
            log.error('%s: Error in translating %s\n%s', gb_record.id, protein_id, extracted_seq)
            raise err

        #Write out fasta. Header format as requested: >genome_ac|protein_id
        _write_fasta_line(write_handle, '{0}|{1}'.format(gb_record.id, protein_id), str(protein_seq))


The translate line throws a TranslationError on the following feature:

 CDS             complement(2276255..2279302)
                 /locus_tag="ECBD_2165"
                 /EC_number="1.7.99.4"
                 /inference="protein motif:TFAM:TIGR01553"
                 /note="KEGG: ssn:SSON_1650 formate dehydrogenase-N,
                 nitrate-inducible, alpha subunit;
                 TIGRFAM: formate dehydrogenase, alpha subunit;
                 PFAM: molybdopterin oxidoreductase; molybdopterin
                 oxidoreductase Fe4S4 region; molydopterin
                 dinucleotide-binding region"
                 /codon_start=1
                 /transl_except=(pos:complement(2278715..2278717),aa:Sec)
                 /transl_table=11
                 /product="formate dehydrogenase, alpha subunit"
                 /protein_id="YP_003036386.1"
                 /db_xref="GI:253773555"
                 /db_xref="InterPro:IPR006311"
                 /db_xref="InterPro:IPR006443"
                 /db_xref="InterPro:IPR006655"
                 /db_xref="InterPro:IPR006656"
                 /db_xref="InterPro:IPR006657"
                 /db_xref="InterPro:IPR006963"
                 /db_xref="GeneID:8157271"
                 /translation="MDVSRRQFFKICAGGMAGTTVAALGFAPKQALAQARNYKLLRAK
                 EIRNTCTYCSVGCGLLMYSLGDGAKNAREAIYHIEGDPDHPVSRGALCPKGAGLLDYV
                 NSENRLRYPEYRAPGSDKWQRISWEEAFSRIAKLMKADRDANFIEKNEQGVTVNRWLS
                 TGMLCASGASNETGMLTQKFARSLGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWV
                 DIKNANVVMVMGGNAAEAHPVGFRWAMEAKNNNDATLIVVDPRFTRTASVADIYAPIR
                 SGTDITFLSGVLRYLIENNKINAEYVKHYTNASLLVRDDFAFEDGLFSGYDAEKRQYD
                 KSSWNYQFDENGYAKRDETLTHPRCVWNLLKEHVSRYTPDVVENICGTPKADFLKVCE
                 VLASTSAPDRTTTFLYALGWTQHTVGAQNIRTMAMIQLLLGNMGMAGGGVNALRGHSN
                 IQGLTDLGLLSTSLPGYLTLPSEKQVDLQSYLEANTPKATLADQVNYWSNYPKFFVSL
                 MKSFYGDAAQKENNWGYDWLPKWDQTYDVIKYFNMMDEGKVTGYFCQGFNPVASFPDK
                 NKVVSCLSKLKYMVVIDPLVTETSTFWQNHGESNDVDPASIQTEVFRLPSTCFAEEDG
                 SIANSGRWLQWHWKGQDAPGEARNDGEILAGIYHHLRELYQAEGGKGVEPLMKMSWNY
                 KQPHEPQSDEVAKENNGYALEDLYDANGVLIAKKGQLLSSFAHLRDDGTTASSCWIYT
                 GSWTEQGNQMANRDNSDPSGLGNTLGWAWAWPLNRRVLYNRASADINGKPWDPKRMLI
                 QWNGSKWTGNDIPDFGNAAPGTPTGPFIMQPEGMGRLFAINKMAEGPFPEHYEPIETP
                 LGTNPLHPNVVSNPVVRLYEQDALRMGKKEQFPYVGTTYRLTEHFHTWTKHALLNAIA
                 QPEQFVEISETLAAAKGINNGDRVTVSSKRGFIRAVAVVTRRLKPLNVNGQQVETVGI
                 PIHWGFEGVARKGYIANTLTPNVGDANSQTPEYKAFLVNIEKA"

root: ERROR: NC_012947.1: Error in translating YP_003036386.1
ATGGACGTCAGTCGCAGACAATTTTTTAAAATCTGCGCGGGCGGTATGGCTGGAACAACGGTAGCGGCATTGGGCTTTGCCCCGAAGCAAGCACTGGCTCAGGCGCGAAACTACAAATTATTACGCGCTAAAGAGATCCGTAACACCTGCACATACTGTTCCGTAGGTTGCGGGCTATTGATGTATAGCCTGGGTGATGGCGCGAAAAACGCCAGAGAAGCGATTTATCACATTGAAGGTGACCCGGATCATCCGGTAAGCCGTGGTGCGCTGTGCCCAAAAGGGGCCGGTTTGCTGGATTACGTCAACAGCGAAAACCGTCTGCGCTACCCGGAATATCGTGCGCCAGGTTCTGACAAATGGCAGCGCATTAGCTGGGAAGAAGCATTCTCCCGTATTGCAAAGCTGATGAAAGCTGACCGTGACGCTAACTTTATTGAAAAGAACGAGCAGGGCGTAACGGTAAACCGTTGGCTTTCTACCGGTATGCTGTGTGCCTCCGGTGCCAGCAACGAAACCGGGATGCTGACACAGAAATTTGCCCGCTCCCTCGGGATGCTGGCGGTAGACAACCAGGCGCGCGTCTGACACGGACCAACGGTAGCAAGTCTTGCTCCAACATTTGGTCGCGGTGCGATGACCAACCACTGGGTGGATATCAAAAACGCTAACGTCGTAATGGTAATGGGCGGTAACGCTGCTGAAGCGCATCCCGTCGGTTTCCGCTGGGCGATGGAAGCGAAAAACAACAACGATGCAACCTTGATCGTTGTCGATCCTCGTTTTACGCGTACCGCTTCTGTGGCGGATATTTACGCACCTATTCGTTCCGGTACGGACATTACGTTCCTGTCTGGCGTTTTGCGCTACCTGATCGAAAACAACAAAATCAACGCCGAATACGTTAAACATTACACCAACGCCAGCCTGCTGGTGCGTGATGATTTTGCTTTCGAAGATGGCCTGTTCAGCGGTTATGACGCTGAAAAACGCCAGT
ACGACAAATCGTCCTGGAACTATCAGTTCGATGAAAACGGCTATGCGAAACGCGATGAAACACTGACTCATCCGCGCTGTGTGTGGAACCTGCTGAAAGAGCACGTTTCCCGCTACACGCCGGACGTCGTTGAAAACATCTGCGGTACGCCAAAAGCCGACTTCCTGAAAGTGTGTGAAGTGCTGGCCTCCACCAGCGCACCGGATCGCACAACCACCTTCCTGTACGCGCTGGGCTGGACGCAGCACACCGTGGGTGCGCAGAACATCCGTACTATGGCGATGATCCAGTTACTGCTCGGTAACATGGGTATGGCCGGTGGCGGCGTGAACGCATTGCGTGGTCACTCCAACATTCAGGGCCTGACTGACTTAGGTCTGCTCTCTACCAGCCTGCCAGGTTATCTGACGCTGCCGTCAGAAAAACAGGTTGATTTGCAGTCGTATCTGGAAGCGAACACGCCGAAAGCGACGCTGGCTGATCAGGTGAACTACTGGAGCAACTATCCGAAGTTCTTCGTTAGCCTGATGAAATCTTTCTATGGCGATGCCGCGCAGAAAGAGAACAACTGGGGCTATGACTGGCTGCCGAAGTGGGACCAGACCTACGACGTCATCAAGTATTTCAACATGATGGACGAAGGCAAAGTCACCGGTTATTTCTGCCAGGGCTTTAACCCGGTTGCGTCCTTCCCGGACAAAAACAAAGTGGTGAGCTGCCTGAGCAAGCTGAAGTACATGGTGGTTATCGATCCGCTGGTGACTGAAACCTCTACCTTCTGGCAGAACCACGGCGAGTCGAACGATGTCGATCCGGCGTCTATTCAGACTGAAGTATTCCGTCTGCCTTCGACCTGCTTTGCTGAAGAAGATGGTTCTATTGCTAACTCCGGTCGCTGGCTGCAGTGGCACTGGAAAGGTCAGGATGCGCCGGGCGAAGCGCGTAACGACGGTGAAATTCTGGCGGGTATCTACCATCACCTGCGCGAGCTGTACCA
GGCCGAAGGTGGTAAAGGCGTAGAACCGCTGATGAAGATGAGCTGGAACTACAAGCAGCCGCACGAACCGCAATCTGACGAAGTAGCTAAAGAGAACAACGGCTATGCGCTGGAAGATCTCTATGATGCTAATGGCGTGCTGATTGCGAAGAAAGGTCAGTTGCTGAGTAGCTTTGCGCATCTGCGTGATGACGGTACAACCGCATCTTCTTGCTGGATCTACACCGGTAGCTGGACAGAGCAGGGCAACCAGATGGCTAACCGCGATAACTCCGACCCGTCCGGTCTGGGGAATACGCTGGGATGGGCCTGGGCGTGGCCGCTCAACCGTCGCGTGCTGTACAACCGTGCTTCGGCGGATATCAACGGTAAACCGTGGGATCCGAAACGGATGCTGATCCAGTGGAACGGCAGCAAGTGGACGGGTAACGATATTCCTGACTTCGGCAATGCCGCACCGGGTACGCCAACCGGGCCGTTTATCATGCAGCCGGAAGGGATGGGACGCCTGTTTGCTATCAACAAAATGGCGGAAGGTCCGTTCCCGGAACACTACGAGCCGATTGAAACGCCGCTGGGCACTAACCCGCTGCATCCGAACGTGGTGTCTAACCCGGTTGTTCGTCTGTATGAACAAGACGCACTGCGGATGGGTAAAAAAGAGCAGTTCCCGTATGTGGGTACGACCTATCGTCTGACCGAGCACTTCCACACCTGGACCAAGCACGCATTGCTCAACGCAATTGCTCAGCCGGAACAGTTTGTGGAAATCAGCGAAACGCTGGCGGCGGCGAAAGGCATTAATAATGGCGATCGTGTCACTGTCTCAAGCAAGCGTGGCTTTATCCGCGCGGTGGCTGTGGTAACGCGTCGTCTGAAACCACTGAATGTAAATGGTCAGCAGGTTGAAACGGTGGGTATTCCAATCCACTGGGGCTTTGAGGGTGTCGCGCGTAAAGGTTATATCGCTAACACTCTGACGCCGAATGTCGGTGAT
GCAAACTCGCAAACGCCGGAATATAAAGCGTTCTTAGTCAACATCGAGAAGGCGTAA


Error:

Traceback (most recent call last):
  File "/usr/lib/python2.6/unittest.py", line 279, in run
    testMethod()
  File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/test/test_translate.py", line 33, in test_translate_ecoli_and_salmo
    fasta_file = translate_genbank_to_protein(genbank_file, ptt_file)
  File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/translate.py", line 73, in translate_genbank_to_protein
    raise err
TranslationError: Extra in frame stop codon found.


Now I'm guessing this has something to do with the /transl_except I'm seeing in the GenBank record, but I'm not (yet) sure. (The GenBank supplied translation contains a selenocysteine.) But even if this is the cause, how would I properly handle this in my BioPython translation? I can't find any method to exclude certain sections from translation.
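
For illustration, here is a minimal sketch of one way the /transl_except qualifier could be applied after translation. It is not a BioPython API: the names codon_index and apply_transl_except are hypothetical helpers. It assumes a single-interval CDS (no join), /codon_start=1, and a protein string that was translated without cds=True, so that the exceptional codon shows up as '*' instead of raising an error. Only the Sec and Pyl exceptions are handled.

```python
import re

# Matches a /transl_except value such as
# "(pos:complement(2278715..2278717),aa:Sec)" or "(pos:100..102,aa:Sec)".
_TRANSL_EXCEPT = re.compile(r"pos:(?:complement\()?(\d+)\.\.(\d+)\)?,aa:(\w+)")

# One-letter codes for the exceptions this sketch handles (an assumption:
# other values such as TERM or OTHER are deliberately not covered).
_AA_CODES = {"Sec": "U", "Pyl": "O"}


def codon_index(cds_start, cds_end, strand, exc_start, exc_end):
    """Return the 0-based codon index of an exception within its CDS.

    All coordinates are 1-based and inclusive, as written in GenBank files.
    strand is 1 for the forward strand and -1 for complement features.
    """
    if exc_end - exc_start != 2:
        raise ValueError("exception does not span exactly one codon")
    if strand == -1:
        # On the complement strand the CDS is read from cds_end downwards.
        offset = cds_end - exc_end
    else:
        offset = exc_start - cds_start
    if offset < 0 or offset % 3 != 0:
        raise ValueError("exception is not in frame with the CDS")
    return offset // 3


def apply_transl_except(protein, cds_start, cds_end, strand, qualifier_values):
    """Return protein with each /transl_except residue substituted."""
    residues = list(protein)
    for value in qualifier_values:
        match = _TRANSL_EXCEPT.search(value)
        if match is None:
            raise ValueError("unrecognised transl_except: %r" % value)
        exc_start, exc_end = int(match.group(1)), int(match.group(2))
        index = codon_index(cds_start, cds_end, strand, exc_start, exc_end)
        residues[index] = _AA_CODES[match.group(3)]
    return "".join(residues)
```

For the feature above, codon_index(2276255, 2279302, -1, 2278715, 2278717) gives 195, which is the position of the U in the GenBank supplied translation (residue 196). In the loop from my code this would presumably mean calling extracted_seq.translate(table=transl_table) without cds=True, passing the resulting string and gb_feature.qualifiers['transl_except'] to apply_transl_except, and then stripping the trailing '*'. Note that BioPython reports feature start positions 0-based, so the start needs +1 to match the GenBank coordinates used here.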

Can anyone help me fix the translation?

Best regards,
Tim

(P.S. In case anyone wonders why I'm not using the translation in the GenBank file directly: it's a requirement that I translate from the DNA sequence to protein myself.)


--
Website: http://biostar.stackexchange.com/questions/tagged/biopython
