[Bioperl-l] what's the optimal way to search a fasta file for matching IDs?

Joseph Fass joseph.fass at gmail.com
Thu Oct 25 21:50:02 UTC 2007


I would appreciate any advice, big or small, on this ...

I've got a decent-sized database: 90,000 sequences or so in a single
FASTA-format file.  I also have sequence IDs from that database that
show up in BLAST reports.  I want to collect those IDs and their sequences
(for the purposes of exploring possible contigs).  Since the BLAST report
only includes sub-sequences (from the alignments) of my sequences, I want to
parse the report, then match each hit ID against an ID in the database so I
can pull out its full sequence.  Is there a faster way to do this than
reopening the database file for every new hit ID and searching it
from beginning to end?  If I push each sequence onto a list or hash, I'm
guessing it will chew up a lot of RAM.  Any suggestions?
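One common way around both the repeated scans and the RAM concern is to read the FASTA file once, remember only each record's ID and byte offset, and then seek straight to a record when a hit ID comes up. BioPerl has modules aimed at this kind of lookup (Bio::Index::Fasta, Bio::DB::Fasta); what follows is not their API but a minimal plain-Python sketch of the same idea. It assumes that a record's ID is the first whitespace-delimited word after ">" and that it matches the hit ID as printed in the BLAST report, which may not hold if BLAST reformats IDs. The function names `build_offset_index`, `fetch_record` and `filter_by_ids` are hypothetical.

```python
"""Sketch: look up FASTA records by ID without rescanning the file each time."""
import io
from typing import Dict, Iterator, Set, Tuple


def _record_id(header: str) -> str:
    # Assumption: the ID is the first whitespace-delimited word of the header.
    parts = header.split()
    return parts[0] if parts else ""


def build_offset_index(handle) -> Dict[str, int]:
    """One pass over a binary file handle: map each ID to its header's byte offset.

    Only IDs and integer offsets are held in memory, not the sequences.
    """
    index: Dict[str, int] = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            index[_record_id(line[1:].decode())] = offset
    return index


def fetch_record(handle, index: Dict[str, int], seq_id: str) -> Tuple[str, str]:
    """Seek directly to one record; return (header, sequence) with line breaks removed."""
    handle.seek(index[seq_id])
    header = handle.readline().rstrip(b"\r\n").decode()[1:]
    chunks = []
    for raw in iter(handle.readline, b""):
        if raw.startswith(b">"):  # reached the next record
            break
        chunks.append(raw.strip())
    return header, b"".join(chunks).decode()


def filter_by_ids(handle, wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Alternative: a single streaming pass that yields only records whose ID is wanted."""
    handle.seek(0)
    header, chunks = None, []
    for raw in iter(handle.readline, b""):
        if raw.startswith(b">"):
            if header is not None and _record_id(header) in wanted:
                yield header, b"".join(chunks).decode()
            header, chunks = raw[1:].rstrip(b"\r\n").decode(), []
        elif header is not None:
            chunks.append(raw.strip())
    if header is not None and _record_id(header) in wanted:
        yield header, b"".join(chunks).decode()


if __name__ == "__main__":
    # Toy in-memory data; a real file would be opened with open(path, "rb").
    fasta = io.BytesIO(b">seqA first\nACGT\nTTGA\n>seqB\nGGGG\n>seqC third\nCC\n")
    idx = build_offset_index(fasta)
    for hit_id in ["seqC", "seqA"]:  # stand-ins for IDs parsed from a BLAST report
        print(fetch_record(fasta, idx, hit_id))
```

The offset index is useful when hit IDs arrive one at a time. If all the hit IDs can be collected into a set first, `filter_by_ids` needs no index at all: one sequential pass over the file yields just the matching records.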

Thanks in advance,
~joe

-- 
Joseph Fass
joseph.fass at gmail.com  ||  joefass at hotmail.com
970.227.5928 (c)  ||  530.754.7978 (w)


