[BioPython] Deprecating Fasta.Dictionary, GenBank.Dictionary

Michiel De Hoon mdehoon at c2b2.columbia.edu
Thu Sep 13 05:13:29 UTC 2007


Hi everybody,

While preparing the upcoming Biopython release, we noticed some serious
problems with the latest version (3.0) of mxTextTools. We have already fixed
several of them, but some Biopython tests still fail with the new
mxTextTools. One of them is test_Fasta.py; the part of the test that fails
creates a Fasta Dictionary. This is not explicitly described in the Tutorial,
but it is essentially the same as creating a GenBank dictionary, which is
explained in section 4.3.4 of the Tutorial.

Quoting from the Tutorial:

>>> from Bio import GenBank
>>> dict_file = 'cor6_6.gb'
>>> index_file = 'cor6_6.idx'
>>> GenBank.index_file(dict_file, index_file)
>>> gb_dict = GenBank.Dictionary(index_file, GenBank.FeatureParser())
>>> len(gb_dict)
>>> gb_dict.keys()
['L31939', 'AJ237582', 'X62281', 'AF297471', 'M81224', 'X55053']
>>> gb_dict['AJ237582']
<Bio.SeqRecord.SeqRecord instance at 0x102fdd8c>


The same dictionary can be built with the new Bio.SeqIO code:

>>> from Bio import SeqIO
>>> records = SeqIO.parse(open('cor6_6.gb'), 'genbank')
>>> gb_dict = {}
>>> for record in records:
...     key = record.id.split(".")[0]
...     gb_dict[key] = record
...
>>> gb_dict.keys()
['M81224', 'AF297471', 'X62281', 'AJ237582', 'L31939', 'X55053']
>>> # etcetera

You can also use the to_dict function in Bio.SeqIO to build the dictionary in
one call. The same approach works for Fasta files.
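
To make the Fasta case concrete, here is a minimal sketch of the same pattern.
It is written so that it runs without Biopython: Record, parse_fasta and
records_to_dict are hypothetical stand-ins (not Biopython names) for SeqRecord,
SeqIO.parse(handle, 'fasta') and SeqIO.to_dict, and the two records are made-up
toy data. With Biopython installed, the stand-ins would presumably be replaced
by the real SeqIO calls.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord, so the sketch runs
# without Biopython installed.
Record = namedtuple("Record", ["id", "seq"])


def parse_fasta(lines):
    """Yield Record objects from an iterable of Fasta-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield Record(name, "".join(chunks))
            # The identifier is the first word after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield Record(name, "".join(chunks))


def records_to_dict(records, key_function=lambda record: record.id):
    """Build an in-memory dictionary of records, rejecting duplicate keys."""
    result = {}
    for record in records:
        key = key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = record
    return result


# Made-up toy data, for illustration only.
fasta_text = """>ID0001.1 first toy record
ATGGCTTCA
GGTACC
>ID0002.1 second toy record
TTGACA
"""

# Strip the version suffix from each identifier, as in the GenBank example.
fasta_dict = records_to_dict(
    parse_fasta(fasta_text.splitlines()),
    key_function=lambda record: record.id.split(".")[0],
)
print(sorted(fasta_dict))
```

Unlike the index_file approach, this keeps every record in memory, so it is
best suited to files that fit comfortably in RAM.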

So, I'd like to deprecate the index_file functions where Bio.SeqIO can be
used instead, in particular for Fasta. Then, we can remove that particular
test from test_Fasta. Would that cause problems for anybody? Given the new
Bio.SeqIO code, does anybody still need to use the index_file functions? 

--Michiel.




Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032




