[Biopython-dev] 3/18 biopython Questions - BioStar
Feed My Inbox
updates at feedmyinbox.com
Fri Mar 18 13:05:29 UTC 2011
// Error translating genbank CDS using BioPython
// March 17, 2011 at 4:28 AM
http://biostar.stackexchange.com/questions/6560/error-translating-genbank-cds-using-biopython
Hi all,
I'm trying to translate the a genbank record using BioPython 1.53, ignoring the already given translation in the CDS feature. The code I've written to translate this is pretty straight forward:
...
for gb_record in SeqIO.parse(file_handle, 'genbank'):#Bio.GenBank.Record
for gb_feature in gb_record.features:#Bio.SeqFeature
#Skip any non coding sequence features
if gb_feature.type != 'CDS':
continue
#Protein identifier is a property of the genbank feature
protein_id = gb_feature.qualifiers['protein_id'][0]
#Original sequence retrieved through BioPython 1.53+'s internal method
extracted_seq = gb_feature.extract(gb_record.seq)#Bio.Seq.Seq
#Translation table is a property of the genbank feature
transl_table = gb_feature.qualifiers['transl_table'][0]
#Translate entire sequence as coding sequence using translation table
#Additional CodonTables optionally available from Bio.Data.CodonTable
try:
protein_seq = extracted_seq.translate(table = transl_table, cds = True)
except TranslationError, err:
log.error('%s: Error in translating %s\n%s', gb_record.id, protein_id, extracted_seq)
raise err
#Write out fasta. Header format as requested: >genome_ac|protein_id
_write_fasta_line(write_handle, '{0}|{1}'.format(gb_record.id, protein_id), str(protein_seq))
The translate line throws a TranslationError on the following feature:
CDS complement(2276255..2279302)
/locus_tag="ECBD_2165"
/EC_number="1.7.99.4"
/inference="protein motif:TFAM:TIGR01553"
/note="KEGG: ssn:SSON_1650 formate dehydrogenase-N,
nitrate-inducible, alpha subunit;
TIGRFAM: formate dehydrogenase, alpha subunit;
PFAM: molybdopterin oxidoreductase; molybdopterin
oxidoreductase Fe4S4 region; molydopterin
dinucleotide-binding region"
/codon_start=1
/transl_except=(pos:complement(2278715..2278717),aa:Sec)
/transl_table=11
/product="formate dehydrogenase, alpha subunit"
/protein_id="YP_003036386.1"
/db_xref="GI:253773555"
/db_xref="InterPro:IPR006311"
/db_xref="InterPro:IPR006443"
/db_xref="InterPro:IPR006655"
/db_xref="InterPro:IPR006656"
/db_xref="InterPro:IPR006657"
/db_xref="InterPro:IPR006963"
/db_xref="GeneID:8157271"
/translation="MDVSRRQFFKICAGGMAGTTVAALGFAPKQALAQARNYKLLRAK
EIRNTCTYCSVGCGLLMYSLGDGAKNAREAIYHIEGDPDHPVSRGALCPKGAGLLDYV
NSENRLRYPEYRAPGSDKWQRISWEEAFSRIAKLMKADRDANFIEKNEQGVTVNRWLS
TGMLCASGASNETGMLTQKFARSLGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWV
DIKNANVVMVMGGNAAEAHPVGFRWAMEAKNNNDATLIVVDPRFTRTASVADIYAPIR
SGTDITFLSGVLRYLIENNKINAEYVKHYTNASLLVRDDFAFEDGLFSGYDAEKRQYD
KSSWNYQFDENGYAKRDETLTHPRCVWNLLKEHVSRYTPDVVENICGTPKADFLKVCE
VLASTSAPDRTTTFLYALGWTQHTVGAQNIRTMAMIQLLLGNMGMAGGGVNALRGHSN
IQGLTDLGLLSTSLPGYLTLPSEKQVDLQSYLEANTPKATLADQVNYWSNYPKFFVSL
MKSFYGDAAQKENNWGYDWLPKWDQTYDVIKYFNMMDEGKVTGYFCQGFNPVASFPDK
NKVVSCLSKLKYMVVIDPLVTETSTFWQNHGESNDVDPASIQTEVFRLPSTCFAEEDG
SIANSGRWLQWHWKGQDAPGEARNDGEILAGIYHHLRELYQAEGGKGVEPLMKMSWNY
KQPHEPQSDEVAKENNGYALEDLYDANGVLIAKKGQLLSSFAHLRDDGTTASSCWIYT
GSWTEQGNQMANRDNSDPSGLGNTLGWAWAWPLNRRVLYNRASADINGKPWDPKRMLI
QWNGSKWTGNDIPDFGNAAPGTPTGPFIMQPEGMGRLFAINKMAEGPFPEHYEPIETP
LGTNPLHPNVVSNPVVRLYEQDALRMGKKEQFPYVGTTYRLTEHFHTWTKHALLNAIA
QPEQFVEISETLAAAKGINNGDRVTVSSKRGFIRAVAVVTRRLKPLNVNGQQVETVGI
PIHWGFEGVARKGYIANTLTPNVGDANSQTPEYKAFLVNIEKA"
root: ERROR: NC_012947.1: Error in translating YP_003036386.1
ATGGACGTCAGTCGCAGACAATTTTTTAAAATCTGCGCGGGCGGTATGGCTGGAACAACGGTAGCGGCATTGGGCTTTGCCCCGAAGCAAGCACTGGCTCAGGCGCGAAACTACAAATTATTACGCGCTAAAGAGATCCGTAACACCTGCACATACTGTTCCGTAGGTTGCGGGCTATTGATGTATAGCCTGGGTGATGGCGCGAAAAACGCCAGAGAAGCGATTTATCACATTGAAGGTGACCCGGATCATCCGGTAAGCCGTGGTGCGCTGTGCCCAAAAGGGGCCGGTTTGCTGGATTACGTCAACAGCGAAAACCGTCTGCGCTACCCGGAATATCGTGCGCCAGGTTCTGACAAATGGCAGCGCATTAGCTGGGAAGAAGCATTCTCCCGTATTGCAAAGCTGATGAAAGCTGACCGTGACGCTAACTTTATTGAAAAGAACGAGCAGGGCGTAACGGTAAACCGTTGGCTTTCTACCGGTATGCTGTGTGCCTCCGGTGCCAGCAACGAAACCGGGATGCTGACACAGAAATTTGCCCGCTCCCTCGGGATGCTGGCGGTAGACAACCAGGCGCGCGTCTGACACGGACCAACGGTAGCAAGTCTTGCTCCAACATTTGGTCGCGGTGCGATGACCAACCACTGGGTGGATATCAAAAACGCTAACGTCGTAATGGTAATGGGCGGTAACGCTGCTGAAGCGCATCCCGTCGGTTTCCGCTGGGCGATGGAAGCGAAAAACAACAACGATGCAACCTTGATCGTTGTCGATCCTCGTTTTACGCGTACCGCTTCTGTGGCGGATATTTACGCACCTATTCGTTCCGGTACGGACATTACGTTCCTGTCTGGCGTTTTGCGCTACCTGATCGAAAACAACAAAATCAACGCCGAATACGTTAAACATTACACCAACGCCAGCCTGCTGGTGCGTGATGATTTTGCTTTCGAAGATGGCCTGTTCAGCGGTTATGACGCTGAAAAACGCCAGT
ACGACAAATCGTCCTGGAACTATCAGTTCGATGAAAACGGCTATGCGAAACGCGATGAAACACTGACTCATCCGCGCTGTGTGTGGAACCTGCTGAAAGAGCACGTTTCCCGCTACACGCCGGACGTCGTTGAAAACATCTGCGGTACGCCAAAAGCCGACTTCCTGAAAGTGTGTGAAGTGCTGGCCTCCACCAGCGCACCGGATCGCACAACCACCTTCCTGTACGCGCTGGGCTGGACGCAGCACACCGTGGGTGCGCAGAACATCCGTACTATGGCGATGATCCAGTTACTGCTCGGTAACATGGGTATGGCCGGTGGCGGCGTGAACGCATTGCGTGGTCACTCCAACATTCAGGGCCTGACTGACTTAGGTCTGCTCTCTACCAGCCTGCCAGGTTATCTGACGCTGCCGTCAGAAAAACAGGTTGATTTGCAGTCGTATCTGGAAGCGAACACGCCGAAAGCGACGCTGGCTGATCAGGTGAACTACTGGAGCAACTATCCGAAGTTCTTCGTTAGCCTGATGAAATCTTTCTATGGCGATGCCGCGCAGAAAGAGAACAACTGGGGCTATGACTGGCTGCCGAAGTGGGACCAGACCTACGACGTCATCAAGTATTTCAACATGATGGACGAAGGCAAAGTCACCGGTTATTTCTGCCAGGGCTTTAACCCGGTTGCGTCCTTCCCGGACAAAAACAAAGTGGTGAGCTGCCTGAGCAAGCTGAAGTACATGGTGGTTATCGATCCGCTGGTGACTGAAACCTCTACCTTCTGGCAGAACCACGGCGAGTCGAACGATGTCGATCCGGCGTCTATTCAGACTGAAGTATTCCGTCTGCCTTCGACCTGCTTTGCTGAAGAAGATGGTTCTATTGCTAACTCCGGTCGCTGGCTGCAGTGGCACTGGAAAGGTCAGGATGCGCCGGGCGAAGCGCGTAACGACGGTGAAATTCTGGCGGGTATCTACCATCACCTGCGCGAGCTGTACCA
GGCCGAAGGTGGTAAAGGCGTAGAACCGCTGATGAAGATGAGCTGGAACTACAAGCAGCCGCACGAACCGCAATCTGACGAAGTAGCTAAAGAGAACAACGGCTATGCGCTGGAAGATCTCTATGATGCTAATGGCGTGCTGATTGCGAAGAAAGGTCAGTTGCTGAGTAGCTTTGCGCATCTGCGTGATGACGGTACAACCGCATCTTCTTGCTGGATCTACACCGGTAGCTGGACAGAGCAGGGCAACCAGATGGCTAACCGCGATAACTCCGACCCGTCCGGTCTGGGGAATACGCTGGGATGGGCCTGGGCGTGGCCGCTCAACCGTCGCGTGCTGTACAACCGTGCTTCGGCGGATATCAACGGTAAACCGTGGGATCCGAAACGGATGCTGATCCAGTGGAACGGCAGCAAGTGGACGGGTAACGATATTCCTGACTTCGGCAATGCCGCACCGGGTACGCCAACCGGGCCGTTTATCATGCAGCCGGAAGGGATGGGACGCCTGTTTGCTATCAACAAAATGGCGGAAGGTCCGTTCCCGGAACACTACGAGCCGATTGAAACGCCGCTGGGCACTAACCCGCTGCATCCGAACGTGGTGTCTAACCCGGTTGTTCGTCTGTATGAACAAGACGCACTGCGGATGGGTAAAAAAGAGCAGTTCCCGTATGTGGGTACGACCTATCGTCTGACCGAGCACTTCCACACCTGGACCAAGCACGCATTGCTCAACGCAATTGCTCAGCCGGAACAGTTTGTGGAAATCAGCGAAACGCTGGCGGCGGCGAAAGGCATTAATAATGGCGATCGTGTCACTGTCTCAAGCAAGCGTGGCTTTATCCGCGCGGTGGCTGTGGTAACGCGTCGTCTGAAACCACTGAATGTAAATGGTCAGCAGGTTGAAACGGTGGGTATTCCAATCCACTGGGGCTTTGAGGGTGTCGCGCGTAAAGGTTATATCGCTAACACTCTGACGCCGAATGTCGGTGAT
GCAAACTCGCAAACGCCGGAATATAAAGCGTTCTTAGTCAACATCGAGAAGGCGTAA
Error:
Traceback (most recent call last):
File "/usr/lib/python2.6/unittest.py", line 279, in run
testMethod()
File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/test/test_translate.py", line 33, in test_translate_ecoli_and_salmo
fasta_file = translate_genbank_to_protein(genbank_file, ptt_file)
File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/translate.py", line 73, in translate_genbank_to_protein
raise err
TranslationError: Extra in frame stop codon found.
Now I'm guessing this has something to do with the /transl_except I'm seeing in the GenBank record, but I'm not (yet) sure. (The GenBank supplied translation contains a Selenocysteine.) But even if this is the cause: How would I properly handle this in my BioPython translation? I can't find any method to exclude certain sections from translation..
Can anyone help me fix the translation?
Best regards,
Tim
(Ps. Should anyone wonder why I'm not using the translation in the GenBank file directly: It's a requirement that I translate from the DNA sequence to protein myself...)
--
Website: http://biostar.stackexchange.com/questions/tagged/biopython
Account Login:
https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
Unsubscribe here:
http://www.feedmyinbox.com/feeds/unsubscribe/630206/59fe8f28e93f5744d887807619020b5988c5b82b/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email
--
This email was carefully delivered by FeedMyInbox.com.
PO Box 682532 Franklin, TN 37068
More information about the Biopython-dev
mailing list