[Bioperl-l] what's the optimal way to search a fasta file for matching ID's?
Jason Stajich
jason at bioperl.org
Fri Oct 26 04:57:52 UTC 2007
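Or see Bio::DB::Fasta if you want a BioPerl solution. It indexes the fasta file on disk, so you can fetch sequences by ID without reading the whole file into memory.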
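
The general idea behind an indexed lookup can be sketched in a few lines. This is only an illustration in Python of a byte-offset index, not how Bio::DB::Fasta or fastacmd is actually implemented. The names build_index and fetch_sequence are made up for the example, and it assumes the ID is the first whitespace-delimited token on each header line.

```python
def build_index(fasta_path):
    """Scan the file once and map each sequence ID to the byte offset
    of its '>' header line."""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()      # position of the line about to be read
            line = handle.readline()
            if not line:                # end of file
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:              # assumption: ID is the first token
                    index[fields[0].decode()] = offset
    return index


def fetch_sequence(fasta_path, index, seq_id):
    """Seek directly to one record and return its full sequence."""
    parts = []
    with open(fasta_path, "rb") as handle:
        handle.seek(index[seq_id])
        handle.readline()               # skip the header line
        for line in handle:
            if line.startswith(b">"):   # next record starts here
                break
            parts.append(line.strip().decode())
    return "".join(parts)
```

The index holds only IDs and integer offsets, so for 90,000 sequences or so it should stay small compared with pushing every sequence onto a hash. Each hit ID from the blast report then costs one seek rather than a scan of the whole file.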
On Oct 25, 2007, at 6:17 PM, Cook, Malcolm wrote:
> If you have the fasta database already indexed for blast searching, then
> you should use fastacmd, which comes with the blast package, for
> extracting (sub)sequences based on ID (and indices).
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Joseph
>> Fass
>> Sent: Thursday, October 25, 2007 4:50 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] what's the optimal way to search a fasta
>> file for matching ID's?
>>
>> I would appreciate any advice, big or small, on this ...
>>
>> I've got a decent-sized database ... 90,000 sequences or so
>> in a single fasta-format file. Then, I've got sequence ID's
>> from that database that show up in blast reports. I want to
>> collect those ID's and their sequences (for the purposes of
>> exploring possible contigs). Since the blast report only
>> includes sub-sequences (from alignments) of my sequences, I
>> want to parse the report, then match each hit ID against an
>> ID in the database, so I can pull out its full sequence. Is
>> there a faster way to do this than opening the database file
>> each time I have a new hit ID, so I can search it from
>> beginning to end? If I push each sequence onto a list or
>> hash, it's liable to chew up a lot of RAM, I'm guessing. Any
>> suggestions?
>>
>> Thanks in advance,
>> ~joe
>>
>> --
>> Joseph Fass
>> joseph.fass at gmail.com || joefass at hotmail.com
>> 970.227.5928 (c) || 530.754.7978 (w)
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
--
Jason Stajich
jason at bioperl.org