[BioPython] Parsing and Creating Dictionaries of GenBank files

Marc Colosimo mcolosimo at mitre.org
Thu Apr 20 14:23:19 UTC 2006


While we are on the subject of parsing multiple GenBank files and the
Cookbook, I think a better (and more Pythonic) example is the
following:

from Bio import GenBank

gb_file = "my_file.gb"
gb_handle = open(gb_file, 'r')

feature_parser = GenBank.FeatureParser()

gb_iterator = GenBank.Iterator(gb_handle, feature_parser)

for cur_record in gb_iterator:
    # now do something with the record
    print cur_record.seq

which is way nicer (and uses iterators as per PEP 234) than

while 1:
    cur_record = gb_iterator.next()

    if cur_record is None:
        break

    # now do something with the record
    print cur_record.seq

Actually, the above works with the Fasta iterator as well.
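
To show why both loop styles can work on the same object, here is a
minimal sketch in modern Python 3. It is not Biopython's code:
ToyRecordIterator and next_or_none are hypothetical names, and each
non-blank line of the handle stands in for one record. The class keeps a
"return None when done" method and also implements the PEP 234 iterator
protocol on top of it.

```python
import io


class ToyRecordIterator:
    """Hypothetical stand-in for a record iterator (not Biopython code)."""

    def __init__(self, handle):
        self._handle = handle

    def next_or_none(self):
        # Old style: return the next record, or None when the input is exhausted.
        line = self._handle.readline()
        while line and not line.strip():
            # Skip blank lines between records.
            line = self._handle.readline()
        return line.strip() if line else None

    def __iter__(self):
        # PEP 234: an iterator returns itself from __iter__.
        return self

    def __next__(self):
        # PEP 234: signal exhaustion with StopIteration instead of None.
        record = self.next_or_none()
        if record is None:
            raise StopIteration
        return record


data = "rec1\nrec2\n\nrec3\n"

# Iterator style: the for loop handles StopIteration for us.
for_style = [record for record in ToyRecordIterator(io.StringIO(data))]

# Explicit style: call the method and test for None by hand.
while_style = []
toy = ToyRecordIterator(io.StringIO(data))
while True:
    record = toy.next_or_none()
    if record is None:
        break
    while_style.append(record)
```

Both lists end up with the same three records; the for loop just hides
the end-of-input check.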

Times for a GenBank file with 72,358 records (LOCUS lines):
my way (using iterators): 14m16.886s
cookbook way (using next and if):  14m28.547s

Surprisingly, this isn't much faster (maybe with -O it would be).
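
One way to compare the two loop styles in isolation is a small timing
harness like the sketch below (modern Python 3). fake_records is a
hypothetical stand-in for the parser that does a little work per record;
it does not read GenBank data, so the numbers it gives say nothing about
the real file. My guess is that the per-record parsing cost dominates in
the real case, which would explain why the loop style barely matters.

```python
import time


def fake_records(n):
    # Hypothetical stand-in for a parser: does some work for each record.
    for i in range(n):
        yield sum(range(200)) + i


def time_for_loop(n):
    # Iterator style: plain for loop over the records.
    start = time.perf_counter()
    count = 0
    for _ in fake_records(n):
        count += 1
    return count, time.perf_counter() - start


def time_while_loop(n):
    # Explicit style: fetch each record by hand and test for None.
    start = time.perf_counter()
    count = 0
    records = fake_records(n)
    while True:
        record = next(records, None)
        if record is None:
            break
        count += 1
    return count, time.perf_counter() - start


for_count, for_seconds = time_for_loop(10000)
while_count, while_seconds = time_while_loop(10000)
print("for loop:   %d records in %.4fs" % (for_count, for_seconds))
print("while loop: %d records in %.4fs" % (while_count, while_seconds))
```

Increasing the work done inside fake_records should shrink the relative
difference between the two timings.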

Marc

On Apr 20, 2006, at 8:42 AM, Peter (BioPython) wrote:

> Pepe Barbe wrote:
>> Hello,
>>
>> Following the simple steps in the BioPython cookbook, I wanted to
>> create a dictionary with the following GenBank file:
>>
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.gbk
>>
>> Below you can find what I tried executing and the error I got. I
>> would appreciate any insight into solving the error and correctly
>> producing the dictionary.
>
> The cookbook tutorial is a little misleading in that regard.
> Indexing a GenBank file only makes sense for those files with
> multiple GenBank records (i.e. multiple LOCUS lines).
>
> For example, you can get multi-record GenBank files with records for
> different genes.  These tend to be small records, and the Martel based
> indexing code copes fine.  It doesn't cope very well with large
> records like genomes.
>
> Your example (like all bacterial genomes, in my experience) has just
> a single very large record (which will contain many features).
>
> Does this page help?
>
> http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/
>
> I did suggest a change to the documentation but it looks like no one
> has made the change...
>
> http://biopython.org/pipermail/biopython-dev/2005-November/002193.html
>
> I had forgotten to chase this up.
>
> Peter



