[Bioperl-l] trying to save blast hit sequences to fasta file

Malay mbasu at mail.nih.gov
Fri Aug 3 18:55:57 UTC 2007


Jay Hannah wrote:
> Torsten Seemann wrote:
>>> Hi, thanks for your help and suggestions. I have tried the example code
>>> of Jay Hannah and it works perfectly. But what I need to save in fasta
>>> format is the whole sequence in the database that is similar to my query
>>> sequence.
>>>
>> Unfortunately the hit_string is only that part of the sequence in the
>> database that was similar enough to your query sequence. The BLAST
>> report does not have the whole hit sequence in it, only the locally
>> aligned part. SearchIO can only give you what it can get from the
>> BLAST report.
>>
>> You will need to record the IDs of the database sequences you are
>> interested in, and write extra code to retrieve the WHOLE hit sequence
>> from your database.

I am not sure whether this has already been suggested, but you can
retrieve the full sequence from any BLAST database using "fastacmd",
which is part of the NCBI toolbox. Parse the "description" string from
the BLAST report and run:

fastacmd -d <database file> -s <description>

where the argument of -s can be any string that is unique within the database.
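A minimal sketch of that loop in Python, assuming you have already pulled the identifying strings for your hits out of the BLAST report (for example with Bio::SearchIO) into a list. The helper names below are hypothetical and not part of BioPerl or the NCBI toolbox, and only the -d and -s options from the command above are used. It calls fastacmd once per unique string and collects the FASTA output into one file.

```python
import shutil
import subprocess
from typing import Iterable, List


def unique_hit_ids(hit_ids: Iterable[str]) -> List[str]:
    """Drop blank and duplicate IDs, keeping first-seen order.

    The same database sequence can appear more than once in a report
    (e.g. hit by several queries), but it only needs fetching once.
    """
    seen = set()
    ordered = []
    for hit_id in hit_ids:
        hit_id = hit_id.strip()
        if hit_id and hit_id not in seen:
            seen.add(hit_id)
            ordered.append(hit_id)
    return ordered


def fastacmd_command(database: str, hit_id: str) -> List[str]:
    """Build the argument list for: fastacmd -d <database> -s <hit_id>."""
    return ["fastacmd", "-d", database, "-s", hit_id]


def fetch_full_sequences(database: str, hit_ids: Iterable[str], out_path: str) -> None:
    """Run fastacmd for each unique ID and write all FASTA output to out_path."""
    if shutil.which("fastacmd") is None:
        raise RuntimeError("fastacmd was not found on PATH")
    with open(out_path, "w") as out:
        for hit_id in unique_hit_ids(hit_ids):
            # check=True raises CalledProcessError if fastacmd fails for an ID
            result = subprocess.run(
                fastacmd_command(database, hit_id),
                capture_output=True,
                text=True,
                check=True,
            )
            out.write(result.stdout)
```

Calling fetch_full_sequences("<database file>", ids, "hits.fasta") would then leave the whole database sequences, not just the aligned fragments, in hits.fasta. Running one process per ID keeps the sketch simple; for very many hits a batch approach may be faster.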

-Malay
