[BioPython] fasta retrieval

Fri Aug 8 18:29:43 EDT 2003

Hi Marty;

> I need to extract 702 sequences from a database of about 25000 fasta sequences. 
> Does anyone have a script for that?

It should be easy enough to write one. With Biopython, and assuming
the 702 sequence titles you want are in a file each on there own
line, it would look something like:

from Bio import Fasta

input_handle = open("your_seqs.txt")
interest_titles = []
for line in input_handle.xreadlines():
    interest_titles.append(line)
input_handle.close()

fasta_handle = open("fasta_database.fasta")
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(fasta_handle, parser)
output_handle = open("your_seqs.fasta", "w")
while 1:
    rec = iterator.next()
    if not rec:
        break
    if rec.title in interest_titles:
        output_handle.write(str(rec) + "\n")
fasta_handle.close()
output_handle.close()

That's off the top of my head in a mail client so no promises that
it works exactly, but it should give the idea. Hopefully this helps.

Brad