[BioPython] Parsing and Creating Dictionaries of GenBank files
Marc Colosimo
mcolosimo at mitre.org
Thu Apr 20 14:23:19 UTC 2006
While we are on the subject of parsing multiple GenBank files and the
Cookbook, I think a better example (and more pythonish) is the
following:

    from Bio import GenBank

    gb_file = "my_file.gb"
    gb_handle = open(gb_file, 'r')
    feature_parser = GenBank.FeatureParser()
    gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
    for cur_record in gb_iterator:
        # now do something with the record
        print cur_record.seq

which is way nicer (and uses iterators as per PEP 234) than

    while 1:
        cur_record = gb_iterator.next()
        if cur_record is None:
            break
        # now do something with the record
        print cur_record.seq

Actually, the above works with the Fasta iterator as well.
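
To make the difference between the two loop styles concrete without needing a GenBank file, here is a minimal sketch in current Python 3 syntax. ToyRecordIterator is a hypothetical stand-in (not part of Biopython) for any object whose next() returns None when it runs out of records. The second function uses the built-in two-argument iter(callable, sentinel), which is one way to drive such an object from a for loop; it is not necessarily how the GenBank or Fasta iterators do it internally.

```python
class ToyRecordIterator:
    """Hypothetical stand-in: next() returns None once the records run out."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0

    def next(self):
        # Return the next record, or None when the input is used up.
        if self._pos >= len(self._records):
            return None
        record = self._records[self._pos]
        self._pos += 1
        return record


def collect_with_while(iterator):
    """Cookbook style: call next() until it returns None."""
    seen = []
    while True:
        record = iterator.next()
        if record is None:
            break
        seen.append(record)
    return seen


def collect_with_for(iterator):
    """For-loop style: iter(callable, sentinel) stops when next() returns None."""
    return [record for record in iter(iterator.next, None)]


records = ["ATGC", "GGCC", "TTAA"]
while_result = collect_with_while(ToyRecordIterator(records))
for_result = collect_with_for(ToyRecordIterator(records))
print(while_result == for_result)
```

Both functions visit the same records in the same order; the for-loop version just leaves the end-of-input check to Python.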
Times for a GenBank file with 72,358 records (LOCUSs):
my way (using iterators): 14m16.886s
cookbook way (using next and if): 14m28.547s
Surprisingly, this isn't much faster (maybe with -O it would be).
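
For anyone who wants to repeat this kind of comparison, below is a rough sketch of how the loop overhead alone could be timed, in current Python 3 syntax. It uses a hypothetical records() generator that yields dummy strings instead of parsing a real file, so it only measures the looping itself. Presumably the parsing of each record dominates the real run, which would explain why the two totals above are so close. All function names here are made up for illustration.

```python
import time


def records(n):
    """Hypothetical record source: yields n dummy sequence strings."""
    for i in range(n):
        yield "SEQ%d" % i


def count_with_for(iterable):
    """Iterator style: let the for loop handle the end of input."""
    count = 0
    for _ in iterable:
        count += 1
    return count


def count_with_while(iterable):
    """Cookbook style: fetch one record at a time and test for None."""
    it = iter(iterable)
    count = 0
    while True:
        record = next(it, None)  # None marks the end, as in the cookbook loop
        if record is None:
            break
        count += 1
    return count


def timed(func, n):
    """Run func over n dummy records; return (count, elapsed seconds)."""
    start = time.perf_counter()
    result = func(records(n))
    return result, time.perf_counter() - start


n_records = 72358  # same number of records as the file timed above
for_count, for_seconds = timed(count_with_for, n_records)
while_count, while_seconds = timed(count_with_while, n_records)
print("for loop:   %d records in %.4fs" % (for_count, for_seconds))
print("while loop: %d records in %.4fs" % (while_count, while_seconds))
```

The absolute numbers will vary by machine; only the relative difference between the two loops is of interest.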
Marc
On Apr 20, 2006, at 8:42 AM, Peter (BioPython) wrote:
> Pepe Barbe wrote:
>> Hello,
>>
>> Following the simple steps in the BioPython cookbook, I wanted to
>> create a dictionary with the following GenBank file:
>>
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.gbk
>>
>> Below you can find what I tried executing and the error I got. I would
>> appreciate any insight into solving the error and correctly producing
>> the dictionary.
>
> The cookbook tutorial is a little misleading in that regard. Indexing a
> GenBank file only makes sense for files with multiple GenBank records
> (i.e. multiple LOCUS lines).
>
> For example, you can get multi-record GenBank files with records for
> different genes. These tend to be small records, and the Martel-based
> indexing code copes fine. It doesn't cope very well with large records
> like genomes.
>
> Your example (and in my experience all Bacterial Genomes) have just a
> single very large record (which will contain many features).
>
> Does this page help?
>
> http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/
>
> I did suggest a change to the documentation but it looks like no one has
> made the change...
>
> http://biopython.org/pipermail/biopython-dev/2005-November/002193.html
>
> I had forgotten to chase this up.
>
> Peter