[Biopython] Creating GenBank files

Peter Saffrey pzs at dcs.gla.ac.uk
Wed Sep 16 16:14:20 UTC 2009


Peter wrote:
> Yes, you must create a SeqRecord object with suitable SeqFeature objects,
> and then write it out with SeqIO in GenBank format. If all your features have
> trivial locations, this is pretty easy.
> 

Thanks for this. I've managed to get this to work, but encountered a few 
minor issues.

I already have GenBank files created by CLC Genomics Workbench 3, but I 
want to generate them from a script instead. The CLC-generated GenBank 
files look like this:

LOCUS       Setd2-tagged           11750 bp    DNA     linear   UNA
FEATURES             Location/Qualifiers
      misc_feature    1..50
                      /label="Subcloning HA Upstream"
...(snip other features)

ORIGIN
         1 TTGGTGTGAG CTCTTTGTGT CTTGCCTAAG TATGTGCATC TGTCTTGTCT

...(snip sequence)


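To do this in Biopython, I need to create my feature thus:

from Bio import SeqFeature  # import assumed by the call below
# FeatureLocation(0, 50) is 0-based, end-exclusive: it is written out as 1..50
sf = SeqFeature.SeqFeature(SeqFeature.FeatureLocation(0, 50),
    type="misc_feature", qualifiers={"label": ["Subcloning HA Upstream"]})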
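
Note that the qualifier value is a list, not a bare string. A minimal plain-Python sketch of why that matters, assuming the GenBank writer simply loops over each qualifier value; `normalize_qualifiers` is a hypothetical helper of my own, not part of Biopython:

```python
def normalize_qualifiers(qualifiers):
    """Return a copy in which every bare string value is wrapped in a list.

    Hypothetical helper (not a Biopython function); values are assumed to be
    either strings or iterables of strings.
    """
    return {
        key: [value] if isinstance(value, str) else list(value)
        for key, value in qualifiers.items()
    }


# Looping over a bare string yields one item per character, so a writer that
# loops over the value would emit one qualifier line per letter.
per_character = [item for item in "mylabel"]

# Wrapping the string in a list gives a single value to loop over.
fixed = normalize_qualifiers({"label": "mylabel"})

print(per_character)
print(fixed)
```

Passing a dictionary through such a helper before building the SeqFeature would guard against the string case, at the cost of one extra copy of the dictionary.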

The issues I had were:

- In the docstring for SeqFeature, it says the attribute is "qualifier" 
but it should be "qualifiers".

- My first stab at the qualifiers argument was to do

qualifiers = { "label" : "mylabel" }

but if I do that, it iterates over "mylabel", giving me one "label" for 
each character! Maybe the qualifier printer should check that it is 
being given a list and not a string?

- I'd like to remove some of the extraneous header from the GenBank file:

DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
   ORGANISM  .
             .

Is this possible?
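
Failing that, one stopgap would be to filter the unwanted lines out of the text after it has been written. A rough sketch in plain Python, assuming top-level GenBank keywords start in the first column and their continuation lines are indented; `strip_header_lines` and `UNWANTED` are hypothetical names, not anything Biopython provides:

```python
# Hypothetical post-processing helper: plain text filtering, not a Biopython API.
UNWANTED = {"DEFINITION", "ACCESSION", "VERSION", "KEYWORDS", "SOURCE"}


def strip_header_lines(genbank_text, unwanted=UNWANTED):
    """Drop unwanted top-level entries together with their indented continuation lines."""
    kept = []
    skipping = False
    for line in genbank_text.splitlines():
        if line and not line[0].isspace():
            # A line starting in column 1 opens a new top-level entry;
            # decide here whether that entry (and its continuations) is skipped.
            skipping = line.split()[0] in unwanted
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


example = (
    "LOCUS       Setd2-tagged           11750 bp    DNA     linear   UNA\n"
    "DEFINITION  .\n"
    "ACCESSION   <unknown id>\n"
    "VERSION     <unknown id>\n"
    "KEYWORDS    .\n"
    "SOURCE      .\n"
    "  ORGANISM  .\n"
    "            .\n"
    "FEATURES             Location/Qualifiers\n"
    "     misc_feature    1..50\n"
    '                     /label="Subcloning HA Upstream"\n'
)

cleaned = strip_header_lines(example)
print(cleaned)
```

The indented ORGANISM lines go away with SOURCE because they are treated as its continuation, while the indented feature lines survive because FEATURES itself is kept. Whether other tools still accept a file without these header lines is something I have not checked.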

Sorry for the long message,

Peter


