[Biopython] Could Bio.SeqIO write EMBL file?
Anne Pajon
ap12 at sanger.ac.uk
Mon Jan 11 17:32:43 UTC 2010
Hi Peter,
Just tested now.
It worked fine. Thanks a lot.
Here is the diff between the EMBL output written by Bio.SeqIO and the
EMBL file obtained by converting the GenBank output from Bio.SeqIO
with the EMBOSS tool:
guest137:RAST ap12$ diff tmp.embl updated_files/Alistipes_shahii_WAL8301_uRAST.embl
1c1
< ID unknown; SV 1; ; DNA; ; ; 3763317 BP.
---
> ID unknown; SV 1; linear; unassigned DNA; STD; UNC; 3763317 BP.
5c5
< DE
---
> KW .
8c8
< OC .
---
> XX
10a11
> FH
1949,1950c1950
< FT /product="Peptidyl-prolyl cis-trans isomerase (EC
< FT 5.2.1.8)"
---
> FT /product="Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8)"
3346,3347c3346
< FT kinase/response regulator, hybrid ('one component
< FT system')"
---
> FT kinase/response regulator, hybrid ('one component system')"
3380,3381c3379
< FT /product="Iron-sulfur cluster assembly ATPase protein
< FT SufC"
---
> FT /product="Iron-sulfur cluster assembly ATPase protein SufC"
4811,4812c4809
< FT /product="Gamma-glutamyl phosphate reductase (EC
< FT 1.2.1.41)"
---
> FT /product="Gamma-glutamyl phosphate reductase (EC 1.2.1.41)"
5472,5473c5469
< FT /product="lipoprotein releasing system ATP-binding
< FT protein"
---
> FT /product="lipoprotein releasing system ATP-binding protein"
5881,5882c5877
< FT /product="NAD-dependent protein deacetylase of SIR2
< FT family"
---
> FT /product="NAD-dependent protein deacetylase of SIR2 family"
6032,6033c6027
< FT /product="Exodeoxyribonuclease V alpha chain (EC
< FT 3.1.11.5)"
---
> FT /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
6495,6496c6489
< FT /product="Pyrophosphate-energized proton pump (EC
< FT 3.6.1.1)"
---
> FT /product="Pyrophosphate-energized proton pump (EC 3.6.1.1)"
6946,6947c6939
< FT /product="Exodeoxyribonuclease V alpha chain (EC
< FT 3.1.11.5)"
---
> FT /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
7128,7129c7120
< FT /product="N-acyl-L-amino acid amidohydrolase (EC
< FT 3.5.1.14)"
---
> FT /product="N-acyl-L-amino acid amidohydrolase (EC 3.5.1.14)"
8035,8036c8026
< FT /product="D-3-phosphoglycerate dehydrogenase (EC
< FT 1.1.1.95)"
---
> FT /product="D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95)"
8601,8602c8591
< FT /product="Acetolactate synthase small subunit (EC
< FT 2.2.1.6)"
---
> FT /product="Acetolactate synthase small subunit (EC 2.2.1.6)"
8608,8609c8597
< FT /product="Acetolactate synthase large subunit (EC
< FT 2.2.1.6)"
---
> FT /product="Acetolactate synthase large subunit (EC 2.2.1.6)"
9152,9153c9140
< FT /product="Exodeoxyribonuclease V alpha chain (EC
< FT 3.1.11.5)"
---
> FT /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
10659,10660c10646
< FT kinase/response regulator, hybrid ('one-component
< FT system')"
---
> FT kinase/response regulator, hybrid ('one-component system')"
12056,12057c12042
< FT /product="N-acetylmuramoyl-L-alanine amidase (EC
< FT 3.5.1.28)"
---
> FT /product="N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28)"
12957,12958c12942
< FT /product="Phosphatidate cytidylyltransferase (EC
< FT 2.7.7.41)"
---
> FT /product="Phosphatidate cytidylyltransferase (EC 2.7.7.41)"
13550,13551c13534
< FT /product="Glutamine synthetase type III, GlnN (EC
< FT 6.3.1.2)"
---
> FT /product="Glutamine synthetase type III, GlnN (EC 6.3.1.2)"
14344c14327,14328
< SQ
---
> XX
> SQ Sequence 3763317 BP; 772804 A; 1042979 C; 1057681 G; 776208 T; 113645 other;
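
In the last hunk the Bio.SeqIO output has a bare SQ line, while the
EMBOSS output reports the sequence length and base counts. As a rough
sketch of how such a header could be derived from a sequence (standard
library only; embl_sq_header is a hypothetical helper, not part of
Biopython or EMBOSS, and the spacing after "SQ" is an assumption):

```python
from collections import Counter


def embl_sq_header(seq):
    """Build an EMBL-style SQ header line for a nucleotide sequence.

    Hypothetical helper for illustration only. Counting is
    case-insensitive; anything other than A, C, G, T is "other".
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    other = len(seq) - (a + c + g + t)
    return (
        f"SQ   Sequence {len(seq)} BP; "
        f"{a} A; {c} C; {g} G; {t} T; {other} other;"
    )
```

The counts on the EMBOSS line are consistent with this layout: the five
numbers add up to the 3763317 BP total.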
The main differences are in the line breaks.
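
One way to confirm that the feature table differs only in wrapping is to
join the wrapped FT continuation lines before comparing. A minimal
sketch, using only the standard library: the function names are
hypothetical, and it assumes the usual EMBL feature table layout
(feature key from column 6, qualifiers from column 22) and that lines
were wrapped at spaces.

```python
import difflib


def unwrap_ft_lines(lines):
    """Join wrapped FT continuation lines onto the line they continue.

    Assumption: a continuation is an FT line with an empty feature-key
    column whose text does not start a new /qualifier. Wrapped text that
    happens to begin with "/" would be misread as a new qualifier.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        is_ft = line.startswith("FT")
        body = line[2:].strip() if is_ft else ""
        is_continuation = (
            is_ft
            and bool(out)
            and out[-1].startswith("FT")
            and line[2:21].strip() == ""   # no feature key on this line
            and body != ""
            and not body.startswith("/")   # not the start of a qualifier
        )
        if is_continuation:
            # Rejoin with a single space, assuming the wrap was at a space.
            out[-1] = out[-1] + " " + body
        else:
            out.append(line)
    return out


def diff_ignoring_ft_wrapping(a_lines, b_lines):
    """Unified diff of two EMBL files after unwrapping FT lines."""
    a = unwrap_ft_lines(a_lines)
    b = unwrap_ft_lines(b_lines)
    return list(difflib.unified_diff(a, b, lineterm="", n=0))
```

Given the two files read with readlines(), the ID, DE/KW, OC, FH and SQ
differences shown above would still be reported, but /product lines that
differ only in where they wrap would not.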
Regards,
Anne.
On 11 Jan 2010, at 16:22, Peter wrote:
> Hi Anne,
>
> I've just checked in feature support to the new EMBL output in Bio.SeqIO
> (our main branch on git). If you could give that a test it would be very
> much appreciated. If you are on the dev mailing list, we can discuss
> issues there - otherwise we might as well continue on this thread.
>
> Thanks,
>
> Peter
--
Dr Anne Pajon - Pathogen Genomics, Team 81
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)