[BioPython] Genbank dictionary question

ashleigh smythe absmythe at ucdavis.edu
Fri May 9 23:18:01 EDT 2003


Hello! Biologist and newbie programmer here! I'm still working through
the Biopython tutorial on bioinformatics.org.

I'm trying to make a dictionary from a file of GenBank entries. It
seems that I did make the dictionary, and that the GenBank.Dictionary
module automatically used the accession numbers as keys(?). In a
previous attempt I adapted the tutorial's suggestions for the FASTA
dictionary and explicitly defined the keys as accession numbers, but I
don't think this attempt uses that code.

My main problem is that I can't actually retrieve the GenBank record.
Instead, gb_dict['AJ299393'] gives me
<Bio.SeqRecord.SeqRecord instance at 0x83b01f4> (is that where the data
are actually stored on the hard drive?), and gb_dict.get('AJ299393')
returns two numbers, (37568, 2952), which mean nothing to me!

Why doesn't gb_dict['my_key'] give me the record corresponding to that
accession number?  
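One likely reading of these two results, assuming the 2003 GenBank.Dictionary hands attributes it does not define itself (such as .get) to its underlying index (an assumption not checked against that release): the index stores, for each accession, the byte offset and length of the entry in the flat file, and only [] seeks to that spot and runs the parser. The 0x83b01f4 in the printed object would then be an address in memory, not on disk. The sketch below is stdlib-only and entirely hypothetical (OffsetIndexedDict, build_offset_index, parse_entry, ParsedEntry and the DEMO entries are made up; this is not Biopython's code); it only mimics that design.

```python
import io
from dataclasses import dataclass


@dataclass
class ParsedEntry:
    """Hypothetical stand-in for a parsed record (not Biopython's SeqRecord)."""
    accession: str
    definition: str


def parse_entry(text):
    """Collect keyword lines (keyword in columns 1-12, value after) into a ParsedEntry."""
    fields = {}
    for line in text.splitlines():
        if line and not line[0].isspace():  # skip indented feature/sequence lines
            fields.setdefault(line[:12].strip(), line[12:].strip())
    accession = fields.get("ACCESSION", "").split()
    return ParsedEntry(accession[0] if accession else "", fields.get("DEFINITION", ""))


def build_offset_index(handle):
    """Map each entry's first ACCESSION to (byte offset, byte length).

    Assumes every entry ends with a line that is exactly '//'.
    """
    index = {}
    start, accession = handle.tell(), None
    for line in iter(handle.readline, b""):
        if line.startswith(b"ACCESSION"):
            accession = line.split()[1].decode()
        elif line.rstrip() == b"//":
            end = handle.tell()
            if accession is not None:
                index[accession] = (start, end - start)
            start, accession = end, None
    return index


class OffsetIndexedDict:
    """Dict-like view: .get() returns the raw index entry, [] parses the entry."""

    def __init__(self, handle):
        self._handle = handle
        self._index = build_offset_index(handle)

    def __len__(self):
        return len(self._index)

    def keys(self):
        return list(self._index)

    def get(self, key, default=None):
        # Only the stored (offset, length) pair; nothing is parsed here.
        return self._index.get(key, default)

    def __getitem__(self, key):
        # Seek to the entry, read exactly its bytes, then parse them into an object.
        offset, length = self._index[key]
        self._handle.seek(offset)
        return parse_entry(self._handle.read(length).decode())


SAMPLE = b"""LOCUS       DEMO1
DEFINITION  Hypothetical entry one.
ACCESSION   DEMO0001
ORIGIN
        1 acgtacgt
//
LOCUS       DEMO2
DEFINITION  Hypothetical entry two.
ACCESSION   DEMO0002
ORIGIN
        1 ttttgggg
//
"""

d = OffsetIndexedDict(io.BytesIO(SAMPLE))
print(d.keys())           # ['DEMO0001', 'DEMO0002']
print(d.get("DEMO0002"))  # a (byte offset, byte length) tuple
print(d["DEMO0002"])      # ParsedEntry(accession='DEMO0002', definition='Hypothetical entry two.')
```

Keeping only offsets in the index means each [] lookup re-reads and re-parses one entry from the file, which keeps memory use small for large GenBank files; that trade-off is the usual reason for this kind of on-disk index.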

Also, I'm unclear on how to work with the SeqRecord objects in the
context of my dictionary.  
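Once gb_dict[key] returns an object, its contents are read through attributes rather than by printing the object. A minimal sketch, assuming the parsed records expose id, description and seq as Bio.SeqRecord.SeqRecord does; RecordStandIn, summarize and the DEMO entries are hypothetical so the example runs without Biopython. Passing gb_dict instead of the plain dict should work the same way if its keys() and [] behave as in the session below, but which attributes the 2003 FeatureParser actually fills in is not verified here.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStandIn:
    """Hypothetical stand-in whose attribute names mirror Bio.SeqRecord.SeqRecord."""
    id: str
    description: str
    seq: str
    features: list = field(default_factory=list)


def summarize(records_by_key):
    """Return one tab-separated line per key, built from each record's attributes."""
    lines = []
    for key in sorted(records_by_key.keys()):
        record = records_by_key[key]  # the stored object, not its printed form
        lines.append(f"{key}\t{record.id}\t{len(record.seq)} bp\t{record.description}")
    return lines


records = {
    "DEMO0001": RecordStandIn("DEMO0001", "Hypothetical entry one.", "ACGTACGT"),
    "DEMO0002": RecordStandIn("DEMO0002", "Hypothetical entry two.", "TTTTGG"),
}
for line in summarize(records):
    print(line)
```

Sorting the keys only gives a stable order; len(record.seq) works here on a plain string and, in current Biopython, also on a SeqRecord's Seq object.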

Below is my code. Anything that could clear things up for me would be
greatly appreciated!



>>> from Bio import GenBank
>>> dict_file='genbank_file'
>>> index_file='genbank_file.idx'
>>> GenBank.index_file(dict_file, index_file)
>>> gb_dict=GenBank.Dictionary(index_file, GenBank.FeatureParser())
>>> len(gb_dict)
26
>>> gb_dict.keys()
['AJ299393', 'L25104', 'AB067755', 'AJ004941', 'L25102', 'L25103',
'L25101', 'AJ512744', 'AJ131065', 'L78909', 'AJ290428', 'X91508',
'AJ003190', 'X60192', 'AJ428049', 'L78886', 'X95684', 'AJ400338',
'AJ318697', 'AJ512725', 'U01208', 'X76135', 'AJ252249', 'M94896',
'L78852']
>>> gb_dict.get('AJ299393')
(37568, 2952)
>>> gb_dict['AJ299393']
<Bio.SeqRecord.SeqRecord instance at 0x83b01f4>


Ashleigh



