[BioPython] About the tutorial: Fasta files as Dictionaries

Brad Chapman chapmanb@arches.uga.edu
Thu, 5 Jul 2001 16:39:14 -0400


Hi Quoc-Dien;

> I was wondering if we could use another "function_to_get_index_key" than
> "get_accession_number".

Definitely! The "get_accession_number" function in the Tutorial is
just meant to be an example of the type of function you can write to
index the files by. In general, the purpose of the
"function_to_get_index_key" argument to index_file is to allow
flexibility in indexing the FASTA file.

So, just write another function instead of get_accession_number, and
pass it in as the third argument to index_file:

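def my_new_function(fasta_record):
    # process the record however you'd like in order to get your key;
    # as a placeholder, take the first word of the record's title
    key = fasta_record.title.split()[0]

    # return the key
    return key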

index_file("my_file", "my_index_file", my_new_function)

A reference to the function (i.e. the name of the function, without
parentheses) is passed as the third argument.
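
To make the "pass the name, not a call" point concrete, here is a
minimal sketch of how an indexer could make use of such a function.
It is a toy stand-in and not Biopython's index_file: SimpleRecord,
first_word_key and build_index are hypothetical names, and the records
are assumed to carry the FASTA title line in a "title" attribute.

```python
class SimpleRecord:
    """Hypothetical stand-in for a parsed FASTA record."""

    def __init__(self, title, sequence):
        self.title = title
        self.sequence = sequence


def first_word_key(record):
    # Use the first whitespace-separated word of the title as the key.
    return record.title.split()[0]


def build_index(records, key_function):
    # key_function arrives as a function object; it is only called
    # here, once for each record, to compute that record's key.
    index = {}
    for record in records:
        index[key_function(record)] = record
    return index


records = [
    SimpleRecord("seq1 first example", "ACGT"),
    SimpleRecord("seq2 second example", "GGCC"),
]

# Note: first_word_key, not first_word_key()
index = build_index(records, first_word_key)
print(index["seq2"].sequence)
```

Swapping in a different key only means writing another small function
and passing its name instead.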

> The accession number is a great way to index
> sequences, but the problem with it is that sometimes, multiple sequence
> files can have the same accession number, and the Python script cannot
> work.

We totally agree! That's supposed to be the general idea behind being
able to pass in any function at all.

> Therefore, I would like to index the sequences using their GenBank Id
> instead. Is it possible?

Yup, the example was hoping to get across both:

-> That it is possible to do this.

-> An example of how to do this.
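
For the GenBank Id case, a key function might look something like the
sketch below. It assumes NCBI-style title lines of the form
"gi|<number>|gb|<accession>|...", and that the record exposes the
title line as a "title" attribute; get_gi_number and FakeRecord are
made-up names for illustration only.

```python
class FakeRecord:
    """Hypothetical stand-in for a parsed FASTA record."""

    def __init__(self, title):
        self.title = title


def get_gi_number(fasta_record):
    # The first whitespace-separated word of the title is assumed to
    # hold the pipe-delimited identifiers; drop a leading ">" if any.
    first_word = fasta_record.title.split()[0].lstrip(">")
    fields = first_word.split("|")

    # The GI number is taken to be the field right after the "gi" tag.
    if "gi" not in fields:
        raise ValueError("no gi field in title: %r" % fasta_record.title)
    position = fields.index("gi")
    if position + 1 >= len(fields):
        raise ValueError("gi tag has no value: %r" % fasta_record.title)
    return fields[position + 1]


record = FakeRecord("gi|6273291|gb|AF191665.1|AF191665 Opuntia marenae")
print(get_gi_number(record))
```

A function like this would then be handed to index_file in place of
get_accession_number, in the same way as my_new_function above.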

I'll go back and look at that section again to see if I can make it
more clear. Do you have any suggestions that would make it clearer for
you?

Hope this helps. Please don't hesitate to ask again if you have more
questions! 

Brad