[Biopython] Could Bio.SeqIO write EMBL file?

Anne Pajon ap12 at sanger.ac.uk
Mon Jan 11 17:32:43 UTC 2010


Hi Peter,

Just tested now.

It worked fine. Thanks a lot.

Here is the diff between the EMBL output from Bio.SeqIO and the
GenBank output from Bio.SeqIO converted to an EMBL file with the
EMBOSS tool:

guest137:RAST ap12$ diff tmp.embl updated_files/Alistipes_shahii_WAL8301_uRAST.embl
1c1
< ID   unknown; SV 1; ; DNA; ; ; 3763317 BP.
---
 > ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 3763317 BP.
5c5
< DE
---
 > KW   .
8c8
< OC   .
---
 > XX
10a11
 > FH
1949,1950c1950
< FT                   /product="Peptidyl-prolyl cis-trans isomerase (EC
< FT                   5.2.1.8)"
---
 > FT                   /product="Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8)"
3346,3347c3346
< FT                   kinase/response regulator, hybrid ('one component
< FT                   system')"
---
 > FT                   kinase/response regulator, hybrid ('one component system')"
3380,3381c3379
< FT                   /product="Iron-sulfur cluster assembly ATPase protein
< FT                   SufC"
---
 > FT                   /product="Iron-sulfur cluster assembly ATPase protein SufC"
4811,4812c4809
< FT                   /product="Gamma-glutamyl phosphate reductase (EC
< FT                   1.2.1.41)"
---
 > FT                   /product="Gamma-glutamyl phosphate reductase (EC 1.2.1.41)"
5472,5473c5469
< FT                   /product="lipoprotein releasing system ATP-binding
< FT                   protein"
---
 > FT                   /product="lipoprotein releasing system ATP-binding protein"
5881,5882c5877
< FT                   /product="NAD-dependent protein deacetylase of SIR2
< FT                   family"
---
 > FT                   /product="NAD-dependent protein deacetylase of SIR2 family"
6032,6033c6027
< FT                   /product="Exodeoxyribonuclease V alpha chain (EC
< FT                   3.1.11.5)"
---
 > FT                   /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
6495,6496c6489
< FT                   /product="Pyrophosphate-energized proton pump (EC
< FT                   3.6.1.1)"
---
 > FT                   /product="Pyrophosphate-energized proton pump (EC 3.6.1.1)"
6946,6947c6939
< FT                   /product="Exodeoxyribonuclease V alpha chain (EC
< FT                   3.1.11.5)"
---
 > FT                   /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
7128,7129c7120
< FT                   /product="N-acyl-L-amino acid amidohydrolase (EC
< FT                   3.5.1.14)"
---
 > FT                   /product="N-acyl-L-amino acid amidohydrolase (EC 3.5.1.14)"
8035,8036c8026
< FT                   /product="D-3-phosphoglycerate dehydrogenase (EC
< FT                   1.1.1.95)"
---
 > FT                   /product="D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95)"
8601,8602c8591
< FT                   /product="Acetolactate synthase small subunit (EC
< FT                   2.2.1.6)"
---
 > FT                   /product="Acetolactate synthase small subunit (EC 2.2.1.6)"
8608,8609c8597
< FT                   /product="Acetolactate synthase large subunit (EC
< FT                   2.2.1.6)"
---
 > FT                   /product="Acetolactate synthase large subunit (EC 2.2.1.6)"
9152,9153c9140
< FT                   /product="Exodeoxyribonuclease V alpha chain (EC
< FT                   3.1.11.5)"
---
 > FT                   /product="Exodeoxyribonuclease V alpha chain (EC 3.1.11.5)"
10659,10660c10646
< FT                   kinase/response regulator, hybrid ('one-component
< FT                   system')"
---
 > FT                   kinase/response regulator, hybrid ('one-component system')"
12056,12057c12042
< FT                   /product="N-acetylmuramoyl-L-alanine amidase (EC
< FT                   3.5.1.28)"
---
 > FT                   /product="N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28)"
12957,12958c12942
< FT                   /product="Phosphatidate cytidylyltransferase (EC
< FT                   2.7.7.41)"
---
 > FT                   /product="Phosphatidate cytidylyltransferase (EC 2.7.7.41)"
13550,13551c13534
< FT                   /product="Glutamine synthetase type III, GlnN (EC
< FT                   6.3.1.2)"
---
 > FT                   /product="Glutamine synthetase type III, GlnN (EC 6.3.1.2)"
14344c14327,14328
< SQ
---
 > XX
 > SQ   Sequence 3763317 BP; 772804 A; 1042979 C; 1057681 G; 776208 T; 113645 other;

Apart from a few header lines (ID, DE/KW, OC, FH and SQ), the
differences are all in where long FT lines are wrapped.
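
To check that nothing else differs, one option is to join the wrapped FT
continuation lines before diffing. The following is only a rough sketch
using the Python standard library, not anything from Bio.SeqIO or EMBOSS.
The helper names (unwrap_ft_lines, diff_ignoring_wraps) are made up for
illustration, and it assumes wraps happen at a space and that a
continuation line never starts with "/":

```python
import difflib

# An FT line whose feature-key column is blank: "FT" followed by 19 spaces.
FT_INDENT = "FT" + " " * 19


def unwrap_ft_lines(lines):
    """Join wrapped FT continuation lines onto the line they continue.

    Assumption: a continuation line has a blank key column and its text
    does not start with "/" (which would begin a new qualifier).
    Assumption: the wrap replaced a single space, so pieces are rejoined
    with one space (this is wrong for values wrapped mid-word).
    """
    out = []
    for raw in lines:
        line = raw.rstrip()
        is_continuation = (
            bool(out)
            and out[-1].startswith("FT")
            and line.startswith(FT_INDENT)
            and not line[len(FT_INDENT):].startswith("/")
        )
        if is_continuation:
            out[-1] += " " + line[len(FT_INDENT):]
        else:
            out.append(line)
    return out


def diff_ignoring_wraps(path_a, path_b):
    """Return a unified diff of two EMBL files after unwrapping FT lines."""
    with open(path_a) as fa, open(path_b) as fb:
        a = unwrap_ft_lines(fa)
        b = unwrap_ft_lines(fb)
    return list(difflib.unified_diff(a, b, path_a, path_b, lineterm=""))
```

Calling diff_ignoring_wraps("tmp.embl",
"updated_files/Alistipes_shahii_WAL8301_uRAST.embl") and printing the
returned lines should then leave only the differences that are not due to
wrapping.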

Regards,
Anne.


On 11 Jan 2010, at 16:22, Peter wrote:

> Hi Anne,
>
> I've just checked in feature support to the new EMBL output in Bio.SeqIO
> (our main branch on git). If you could give that a test it would be very
> much appreciated. If you are on the dev mailing list, we can discuss
> issues there - otherwise we might as well continue on this thread.
>
> Thanks,
>
> Peter

--
Dr Anne Pajon - Pathogen Genomics, Team 81
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)


