[BioPython] help with retrieving seq

Brad Chapman chapmanb@arches.uga.edu
Thu, 1 Mar 2001 19:21:43 -0500


Hi Dinakar;

[Finding records in FASTA files]
> It works well for sequences that are close to the start of the file, but
> if the sequence is towards the end, it takes almost forever (I mean it is
> slow).

Yup, definitely true -- if you have really big files, this probably
isn't the best approach, since every lookup has to scan through the
file from the beginning.

> Is there any indexing technique? I was thinking I should create
> some sort of index, because I will be doing this quite often and that way
> the search can be really fast. Or is there an efficient method of searching
> an EST database? Does anyone have any suggestions regarding indexing?

You probably want to check out the next section in the Tutorial: 

2.4.4. FASTA files as Dictionaries

The example there is actually of indexing a FASTA file using accession 
numbers. This sounds really close to what you need. Let us know if you 
have problems modifying the example to fit your actual case. BTW,
the example code is in Doc/examples/fasta_dictionary.py if you want to 
start from that.
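
In case it helps to see the general idea, here is a rough sketch of
offset-based indexing in plain Python. To be clear, this is not the code
from the Tutorial or from fasta_dictionary.py; the function names
build_fasta_index and fetch_record are made up for illustration, and it
assumes the record ID is the first word after the ">" and that headers
are plain ASCII. The file is scanned once to record where each record
starts, and later lookups seek straight to that position:

```python
# Sketch only: a plain-Python offset index, not the Biopython Tutorial code.
# Assumes the record ID is the first whitespace-delimited word after '>'.

def build_fasta_index(path):
    """Scan the file once; map each record ID to the byte offset of its header."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()      # position of the line about to be read
            line = handle.readline()
            if not line:                # end of file
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:              # skip headers with no ID at all
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(path, index, record_id):
    """Seek directly to a record; return (header text, sequence string)."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        header = handle.readline().decode("ascii").rstrip()
        chunks = []
        for line in handle:
            if line.startswith(b">"):   # next record begins here
                break
            chunks.append(line.decode("ascii").strip())
    return header[1:], "".join(chunks)
```

With something like this you would call build_fasta_index("est.fasta")
once (the file name is just an example) and then fetch_record(...) for
each ID you need. Only the index-building step reads the whole file, so
a record near the end should come back about as quickly as one near the
start. If you do this often, the dictionary could also be saved to disk
so it does not have to be rebuilt every time.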

Hope this helps,
Brad