[BioPython] GenBank file parsing question

Mon Nov 17 14:48:22 EST 2003

All -

I have been using BioPython to parse GenBank files. So far I extract
the accession number, the date, and the sequence, and write them out in
FASTA format as input for a sequence analysis program. I would also like
to grab the country of origin and the strain. I think these can be
parsed, since they are documented in Bio/GenBank/genbank_format.py in a
section called feature_key_names. However, I can't figure out the syntax
for accessing them after loading and parsing my file like this:

#!/usr/bin/env python
from Bio import GenBank, Writer
import sys, string

def GenBank2FASTA():
    parser = GenBank.RecordParser()
    genbankfile = open(sys.argv[1], 'r')
    # Open the output file once, not on every pass through the loop.
    outfile = open(sys.argv[2], 'a')
    iterator = GenBank.Iterator(genbankfile, parser)
    seqwidth = 60
    while 1:
        cur_record = iterator.next()
        if cur_record is None:
            break
        # accession is a list; join it into a single string
        accession = " ".join(cur_record.accession)
        outfile.write('>%s|%s\n' % (accession, cur_record.date))
        # wrap the sequence at seqwidth characters per line
        for i in range(0, len(cur_record.sequence), seqwidth):
            outfile.write('%s\n' % cur_record.sequence[i:i+seqwidth])
        # etc., etc.

This results in a file that looks like this:
>AAQ10924|24-SEP-2003
MAGRSGDDDKELLKAVKIIKILYQSNPYPEPKGSRQARKNRRRRWRARQRQIDSISERIL
STCLGRPTEPVPLQLPPLERLHLDSREDCGTSGTQQSQGVETGVGRPQISVESSGVLGSR
TET
>AAQ10915|24-SEP-2003
MAGRSGDNDEELLKAVRIIKILYKSNPYPEPKGSRQARKNRRRRWRARQRQIDSISERIL
STYLGRSTEPVPLQLPPLERLHLDCREDCGTSGTQQSQGVETGVGRPQISVESPVILGSR
TKN
>AAQ10906|24-SEP-2003
MAGRSGDGDEGILPTVKIIQILYPSHPYPEPKGSRQARKNRRRRWRARQKQIDSISERIL
STCLGRPAEPVPLQLPPLERLHLDSREDCGTSGTQQSQGVETGVGRPQISVESSGVLGSR
TET

What I'd like is to have this, plus the country and strain in the header:
>AAQ10906|24-SEP-2003 | country | strain
MAGRSGDGDEGILPTVKIIQILYPSHPYPEPKGSRQARKNRRRRWRARQKQIDSISERIL
STCLGRPAEPVPLQLPPLERLHLDSREDCGTSGTQQSQGVETGVGRPQISVESSGVLGSR
TET
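
The header-and-wrapping step for that layout could be factored into a
small helper. This is only a sketch: fasta_record is a made-up name, not
part of BioPython, the "|" separator is an assumption, and it presumes
the country and strain are already available as plain strings.

```python
def fasta_record(fields, sequence, width=60, sep="|"):
    """Return one FASTA entry as a string.

    fields   -- header values, e.g. [accession, date, country, strain]
    sequence -- the sequence as a plain string
    width    -- number of residues per output line
    sep      -- text placed between header fields (pass " | " for a
                spaced-out header)
    """
    lines = [">" + sep.join(fields)]
    # Wrap the sequence at `width` characters per line.
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Usage with placeholder header values and a short made-up sequence.
print(fasta_record(["AAQ10906", "24-SEP-2003", "country", "strain"],
                   "MAGRSGDGDE" * 7))
```

Returning a string rather than writing to a file keeps the helper easy
to check by eye before it is wired into the loop.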

I'm sure there's a pretty simple solution to my problem. Any help with
using BioPython to parse the country of origin and strain fields would
be appreciated.
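
A possible approach, sketched under assumptions that should be checked
against a real file: a record from RecordParser appears to carry a
features list, where each feature has a key (such as "source") and a
list of qualifiers whose key looks like "/country=" and whose value is
still wrapped in quotes. source_qualifiers is a hypothetical helper, and
the stand-in classes and values below are made up; they only mimic that
layout so the sketch runs without BioPython.

```python
from collections import namedtuple


def source_qualifiers(record, wanted=("country", "strain")):
    """Collect selected qualifier values from a record's 'source' feature.

    Assumes each item in record.features has .key and .qualifiers, and
    each qualifier has .key (such as '/country=') and .value (such as
    '"USA"'). Qualifiers that are absent come back as an empty string.
    """
    found = dict((name, "") for name in wanted)
    for feature in record.features:
        if feature.key != "source":
            continue
        for qualifier in feature.qualifiers:
            # Normalise '/country=' to 'country' and drop the quotes.
            name = qualifier.key.strip("/=")
            if name in found and not found[name]:
                found[name] = (qualifier.value or "").strip('"')
    return found


# Stand-ins that imitate the assumed record layout (not BioPython classes).
Qualifier = namedtuple("Qualifier", "key value")
Feature = namedtuple("Feature", "key qualifiers")
Record = namedtuple("Record", "features")

demo = Record(features=[
    Feature("source", [Qualifier("/organism=", '"Example virus"'),
                       Qualifier("/country=", '"Exampleland"'),
                       Qualifier("/strain=", '"X1"')]),
    Feature("CDS", []),
])
print(source_qualifiers(demo))
```

Inside the parsing loop, the returned values could then be appended to
the accession and date when the header line is written.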

Lee Shekter