[BioPython] genbank annotation
Karin Lagesen
karin.lagesen at medisin.uio.no
Fri Oct 8 09:21:15 EDT 2004
Hi!
I have two GenBank genome files: one of the old kind, where each region is
noted twice, and one where each region is noted only once.
What I would like to extract from these files is the feature information,
in this sort of format:
Type start stop direction name
In the first case, where almost all regions are noted twice, I'd like
to have only one of them included in the list.
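One way the de-duplication could be sketched, independent of any parser: treat a CDS as a duplicate when it lies entirely inside a gene on the same strand, and keep only the gene. This rule is an assumption based on the first excerpt below, not a general property of GenBank files. The function name drop_nested_cds, the tuple layout and the "-" placeholder for a missing name are hypothetical, chosen only for illustration.

```python
def drop_nested_cds(rows):
    """Drop CDS rows that lie inside a gene row on the same strand.

    rows: list of (type, start, stop, direction, name) tuples using
    1-based, inclusive coordinates as written in the GenBank file.
    """
    genes = [row for row in rows if row[0] == "gene"]
    kept = []
    for row in rows:
        ftype, start, stop, direction, _name = row
        if ftype == "CDS" and any(
            gene[1] <= start and stop <= gene[2] and gene[3] == direction
            for gene in genes
        ):
            continue  # assumed to be represented by its enclosing gene
        kept.append(row)
    return kept


# Coordinates copied from the first excerpt; "-" marks a missing name.
rows = [
    ("gene", 305, 1673, "+", "dnaA"),
    ("RBS", 305, 310, "+", "-"),
    ("CDS", 318, 1673, "+", "-"),
    ("gene", 1856, 3062, "+", "dnaN"),
    ("RBS", 1856, 1860, "+", "-"),
    ("CDS", 1867, 3012, "+", "-"),
]
for row in drop_nested_cds(rows):
    print("\t".join(str(field) for field in row))
```

A file of the second kind has no gene features, so its CDS rows pass through unchanged.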
Biopython has a GenBank parser which I'd like to use; however, I cannot
figure out how to use it to do this.
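A minimal sketch of how the rows could be produced, assuming a recent Biopython release that provides Bio.SeqIO (an interface newer than this message) and complete GenBank files rather than the excerpts below. The helper names feature_row and genbank_rows, the "-" placeholder and the choice of /gene, then /locus_tag, then /product as the name are assumptions made for illustration.

```python
def feature_row(ftype, start0, end, strand, qualifiers):
    """Build a (type, start, stop, direction, name) tuple.

    start0 is 0-based, as Biopython reports it; the returned start is
    1-based to match the coordinates written in the GenBank file.
    """
    direction = {1: "+", -1: "-"}.get(strand, ".")
    name = "-"  # placeholder when no naming qualifier is present
    for key in ("gene", "locus_tag", "product"):
        if key in qualifiers:
            name = qualifiers[key][0]  # qualifier values are lists
            break
    return (ftype, start0 + 1, end, direction, name)


def genbank_rows(path):
    """Yield one row per feature in a GenBank file (needs Biopython)."""
    # Imported here so feature_row can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "genbank"):
        for feature in record.features:
            loc = feature.location
            yield feature_row(
                feature.type, int(loc.start), int(loc.end),
                loc.strand, feature.qualifiers,
            )


# Values as the parser would report them for CDS 410..1750 /gene="dnaA".
print("\t".join(str(f) for f in feature_row("CDS", 409, 1750, 1, {"gene": ["dnaA"]})))
```

With Biopython installed, iterating over genbank_rows("genome.gbk") (a hypothetical file name) would give one tuple per feature, including the source feature, which can be filtered out by type.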
The files:
The first:
source 1..2944528
/organism="Listeria monocytogenes"
/mol_type="genomic DNA"
/strain="EGD-e"
/db_xref="taxon:1639"
gene 305..1673
/gene="dnaA"
RBS 305..310
CDS 318..1673
/codon_start=1
/transl_table=11
/product="Chromosomal replication initiation protein DnaA"
/protein_id="CAC98216.1"
/db_xref="GI:16409360"
/db_xref="GOA:Q8YAW2"
/db_xref="UniProt/Swiss-Prot:Q8YAW2"
/translation="MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIIS
APNNFVRDWLEKSYTQFIANILQEITGRLFDVRFIDGEQEENFEYTVIKPNPALDEDG
IEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHLM
HAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVLLIDDIQFL
AGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPD
LETRIAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITA
GLAAEALKDIIPSSKSQVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYL
SRELTDASLPKIGDEFGGRDHTTVIHAHEKISQLLKTDQVLKNDLAEIEKNLRKAQNM
F"
gene 1856..3062
/gene="dnaN"
RBS 1856..1860
CDS 1867..3012
/codon_start=1
/transl_table=11
/product="DNA polymerase III, beta chain"
/protein_id="CAC98217.1"
/db_xref="GI:16409361"
/db_xref="GOA:Q8YAW1"
/db_xref="UniProt/TrEMBL:Q8YAW1"
/translation="MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTL
TGSDSDISIEAFIPLIENDEVIVEVESFGGIVLQSKYFGDIVRRLPEENVEIEVTSNY
QTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPINVLKNIVRQTVFAVSAIEVRP
VLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSELNKLLDDA
SESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAID
RASLLARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKY
MMDALRAFEGDDIQISFSGTMRPFVLRPKDAANPNEILQLITPVRTY"
The second:
source 1..4214630
/strain=168
/organism="Bacillus subtilis subsp. subtilis str.
168"
/mol_type="genomic DNA"
/db_xref="taxon:224308"
CDS 410..1750
/function="initiation of chromosome replication (DNA
synthesis)"
/gene="dnaA"
/protein_id="CAB11777.1"
/locus_tag="BSU00010"
/transl_table=11
/translation="MENILDLWNQALAQIEKKLSKPSFETWMKSTKAHSLQGDTLTIT
APNEFARDWLESRYLHLIADTIYELTGEELSIKFVIPQNQDVEDFMPKPQVKKAVKED
TSDFPQNMLNPKYTFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHL
MHAIGHYVIDHNPSAKVVYLSSEKFTNEFINSIRDNKAVDFRNRYRNVDVLLIDDIQF
LAGKEQTQEEFFHTFNTLHEESKQIVISSDRPPKEIPTLEDRLRSRFEWGLITDITPP
DLETRIAILRKKAKAEGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLINKDIN
ADLAAEALKDIIPSSKPKVITIKEIQRVVGQQFNIKLEDFKAKKRTKSVAFPRQIAMY
LSREMTDSSLPKIGEEFGGRDHTTVIHAHEKISKLLADDEQLQQHVKEIKEQLK"
/db_xref="GOA:P05648"
/db_xref="SUBTILIS:BG10065"
/db_xref="SWISS-PROT:P05648"
/note="alternate gene name: dnaH, dnaJ, dnaK"
CDS 1939..3075
/locus_tag="BSU00020"
/transl_table=11
/translation="MKFTIQKDRLVESVQDVLKAVSSRTTIPILTGIKIVASDDGVSF
TGSDSDISIESFIPKEEGDKEIVTIEQPGSIVLQARFFSEIVKKLPMATVEIEVQNQY
LTIIRSGKAEFNLNGLDADEYPHLPQIEEHHAIQIPTDLLKNLIRQTVFAVSTSETRP
ILTGVNWKVEQSELLCTATDSHRLALRKAKLDIPEDRSYNVVIPGKSLTELSKILDDN
QELVDIVITETQVLFKAKNVLFFSRLLDGNYPDTTSLIPQDSKTEIIVNTKEFLQAID
RASLLAREGRNNVVKLSAKPAESIEISSNSPEIGKVVEAIVADQIEGEELNISFSPKY
MLDALKVLEGAEIRVSFTGAMRPFLIRTPNDETIVQLILPVRTY"
/product="DNA polymerase III (beta subunit)"
/function="DNA synthesis"
/gene="dnaN"
/EC_number="2.7.7.7"
/protein_id="CAB11778.1"
/db_xref="GOA:P05649"
/db_xref="SUBTILIS:BG10066"
/db_xref="SWISS-PROT:P05649"
/note="alternate gene name: dnaG, dnaK"
Karin
--
Karin Lagesen, PhD student
karin.lagesen at medisin.uio.no
http://www.cmbn.no/rognes/