[BioPython] Genbank dictionary question

Brad Chapman chapmanb at uga.edu
Sun May 11 16:34:44 EDT 2003


Hi Ashleigh;

> My main problem is that I'm unable to actually retrieve the Genbank
> record, instead I get <Bio.SeqRecord.SeqRecord instance at 0x83b01f4>
> (is that where those data are actually stored on the hard drive?) 

This is a SeqRecord object with all of the information for the
GenBank record already parsed; the 0x83b01f4 part is just the
object's address in memory, not a location on the hard drive. The
SeqRecord is a generic sort of representation for a sequence with
features. Section 3.7.1 of the Tutorial describes what a SeqRecord
is made up of.
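A parsed record is an ordinary Python object whose data sit in its attributes, so printing the object itself only shows its class and identity. The toy class below is a hypothetical stand-in, not Biopython's SeqRecord, and its id and sequence are made up; it only mimics the kinds of fields a SeqRecord carries (id, seq, description, features). It is written for Python 3.

```python
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord (NOT Biopython's class)."""

    def __init__(self, id, seq, description="", features=None):
        self.id = id                    # identifier, e.g. an accession
        self.seq = seq                  # the sequence (a plain str here)
        self.description = description  # free-text definition line
        self.features = features or []  # annotated features, empty by default


rec = ToyRecord("TOY0001", "ATGCGT", "made-up example entry")

# With no custom __repr__, Python shows the class name and the object's
# identity (its memory address in CPython), nothing stored on disk.
print(rec)             # e.g. <__main__.ToyRecord object at 0x7f...>

# The parsed data are reached through attributes instead.
print(rec.id, rec.seq)
```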

> Why doesn't gb_dict['my_key'] give me the record corresponding to that
> accession number?  

> >>> dict_file='genbank_file'
> >>> index_file='genbank_file.idx'
> >>> GenBank.index_file(dict_file, index_file)
> >>> gb_dict=GenBank.Dictionary(index_file, GenBank.FeatureParser())
[...]
> >>> gb_dict['AJ299393']
> <Bio.SeqRecord.SeqRecord instance at 0x83b01f4>

When you create the gb_dict, you do it with the FeatureParser(),
which is why you get SeqRecord objects back (with their features
stored as SeqFeature objects). You have your choice of what kind of
info you get back:

gb_dict = GenBank.Dictionary(index_file)

will give you the raw unparsed text records, while

gb_dict = GenBank.Dictionary(index_file, GenBank.RecordParser())

gives you GenBank-specific Record objects. Section 3.4.2 of the
Tutorial describes a bit more about the different parsers and a
couple of the differences between RecordParser()s and
FeatureParser()s.
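The idea behind the optional parser argument can be sketched in plain Python: an index maps each accession to where its record sits in the flat file, and on lookup the dictionary either returns that raw text or hands it to whatever parser you supplied. Everything here (FLAT_FILE, build_index, ToyDictionary, toy_parser) is hypothetical, the records are made up, and this is not how Biopython implements its index; it only illustrates the raw-versus-parsed choice. Current Biopython releases offer similar lookup through Bio.SeqIO.index(filename, "genbank"), which returns SeqRecord objects.

```python
# Made-up flat file with two GenBank-like entries (real records have many more fields).
FLAT_FILE = """LOCUS       TOY0001
ACCESSION   TOY0001
ORIGIN
        1 atgcgt
//
LOCUS       TOY0002
ACCESSION   TOY0002
ORIGIN
        1 ggccaatt
//
"""


def build_index(text):
    """Map each accession to the (start, end) character span of its record."""
    index = {}
    start = pos = 0
    accession = None
    for line in text.splitlines(keepends=True):
        if line.startswith("ACCESSION"):
            accession = line.split()[1]
        pos += len(line)
        if line.startswith("//"):  # '//' ends a record
            index[accession] = (start, pos)
            start = pos
    return index


class ToyDictionary:
    """Hypothetical dict-like wrapper: parser=None hands back raw text."""

    def __init__(self, text, parser=None):
        self._text = text
        self._index = build_index(text)  # built once, like an index file
        self._parser = parser

    def __getitem__(self, key):
        start, end = self._index[key]  # KeyError for unknown accessions
        raw = self._text[start:end]
        return raw if self._parser is None else self._parser(raw)


def toy_parser(raw):
    """Hypothetical parser: pull accession and sequence out of one record."""
    lines = raw.splitlines()
    acc = next(ln.split()[1] for ln in lines if ln.startswith("ACCESSION"))
    body = lines[lines.index("ORIGIN") + 1:]
    # Drop the position numbers and the '//' terminator, then join the bases.
    seq = "".join("".join(ln.split()[1:]) for ln in body if ln != "//")
    return {"id": acc, "seq": seq.upper()}


raw_dict = ToyDictionary(FLAT_FILE)                 # analogous to Dictionary(index_file)
parsed_dict = ToyDictionary(FLAT_FILE, toy_parser)  # analogous to passing a parser
print(raw_dict["TOY0002"])     # the unparsed LOCUS ... // text
print(parsed_dict["TOY0002"])  # {'id': 'TOY0002', 'seq': 'GGCCAATT'}
```

Keeping the parser as a separate, optional callable is what lets one index serve either raw text or parsed objects without re-reading the whole file.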

The choice of what kind of output you want to deal with is really up
to you.

> Also, I'm unclear on how to work with the SeqRecord objects in the
> context of my dictionary.  

Depending on which parser you choose, you can work directly with the
object you get back and pull out the parts you need. For instance, if
you wanted to get at the id and sequence of the SeqRecord object, you
could do:

rec = gb_dict['AJ299393']
print rec.id, rec.seq
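The print line above is Python 2 syntax. To actually keep the id and sequence rather than just print them, one option is to collect them into an ordinary dict. In this Python 3 sketch a plain dict of made-up records stands in for gb_dict, and the namedtuple Rec is a hypothetical stand-in exposing only the .id and .seq attributes used here.

```python
from collections import namedtuple

# Hypothetical stand-in for parsed records: only the two attributes used here.
Rec = namedtuple("Rec", ["id", "seq"])

# A plain dict plays the role of gb_dict; the records are made up.
gb_like = {
    "TOY0001": Rec("TOY0001", "ATGCGT"),
    "TOY0002": Rec("TOY0002", "GGCCAATT"),
}

# Keep just the id and sequence of each record, keyed by id.
id_to_seq = {}
for key in gb_like:
    rec = gb_like[key]
    # str() also turns a Bio.Seq.Seq into plain text in current Biopython.
    id_to_seq[rec.id] = str(rec.seq)

for rec_id, seq in sorted(id_to_seq.items()):
    print(rec_id, seq)
```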

Hopefully this helps!
Brad

