[BioPython] Parsing and Creating Dictionaries of GenBank files

Peter (BioPython) biopython at maubp.freeserve.co.uk
Thu Apr 20 12:42:34 UTC 2006


Pepe Barbe wrote:
> Hello,
> 
> Following the simple steps in the BioPython cookbook, I wanted to
> create a dictionary with the following GenBank file:
> 
> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.gbk
> 
> Below you can find what I tried executing and the error I got. I would
> appreciate any insight into solving the error and correctly producing
> the dictionary.

The cookbook tutorial is a little misleading in that regard.  Indexing a 
GenBank file only makes sense for files that contain multiple GenBank 
records (i.e. multiple LOCUS lines).
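
A quick way to see which case a file falls into is to count its LOCUS 
lines.  This is only a minimal plain-Python sketch, not Biopython code: 
the helper name count_genbank_records is made up for illustration, and it 
assumes every record starts with a line beginning with "LOCUS " at 
column one.

```python
def count_genbank_records(lines):
    """Count GenBank records by counting lines that start with 'LOCUS '.

    'lines' can be any iterable of text lines, e.g. an open file handle.
    """
    return sum(1 for line in lines if line.startswith("LOCUS "))


# Usage with a file on disk (the filename is just an example):
# with open("NC_000913.gbk") as handle:
#     print(count_genbank_records(handle))
```

If this returns 1, there is nothing to index: the file is one record.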

For example, you can get multi-record GenBank files with records for 
different genes.  These tend to be small records, and the Martel-based 
indexing code copes fine with them.  It doesn't cope very well with 
large records like whole genomes.
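
To illustrate what an index over a multi-record file gives you, here is a 
rough plain-Python sketch that maps each LOCUS name to the position where 
its record starts.  It is not how the Martel-based indexer works 
internally; the function name index_genbank_offsets and the toy records 
are invented for this example, and it assumes the record name is the 
second whitespace-separated field on the LOCUS line.

```python
import io


def index_genbank_offsets(handle):
    """Map each LOCUS name to the offset where its record starts."""
    offsets = {}
    while True:
        position = handle.tell()   # remember where this line begins
        line = handle.readline()
        if not line:               # end of file
            break
        if line.startswith("LOCUS "):
            name = line.split()[1]  # second field is the locus name
            offsets[name] = position
    return offsets


# Toy two-record input (made-up names, not real GenBank entries).
example = io.StringIO(
    "LOCUS       GENE_A  10 bp\nORIGIN\n//\n"
    "LOCUS       GENE_B  20 bp\nORIGIN\n//\n"
)
index = index_genbank_offsets(example)
print(sorted(index))
```

For a single-record genome file such a dictionary would hold exactly one 
entry, which is why indexing it gains nothing.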

Your example (like every bacterial genome I have seen) has just a 
single very large record (which will contain many features).

Does this page help?

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/

I did suggest a change to the documentation but it looks like no one has 
made the change...

http://biopython.org/pipermail/biopython-dev/2005-November/002193.html

I had forgotten to chase this up.

Peter



