[Bioperl-l] Small word sizes with BLAST (WU, NCBI)

Andrew Walsh paeruginosa at hotmail.com
Wed Mar 3 14:51:59 EST 2004


Hello,

My question is not really about a specific Bioperl module, so I apologize
in advance. If there is a dedicated BLAST newsgroup, I will be happy to post
there instead. But I was hoping somebody on the Bioperl list had some
experience running nucleic acid searches with small word sizes.

I would like to search for short (5-7 bp) matches between an oligo sequence
and a database of ~100,000 mRNA sequences. I've tried doing this with both
WU-BLAST and NCBI-BLAST. NCBI-BLAST does not allow word sizes below 7, so I've
tried many different command-line parameters with WU-BLAST.

I've tried these searches with versions 2.0a19 (alpha) and 2.0 of WU-BLAST.

I get quite strange results when I lower the word size below the default
(11). For example, with the alpha version I get more hits with a word size of
10 than with a word size of 7. With the beta version, I get the same number of
hits with word sizes 10 and 7. I've checked by hand, and the 'missing' hits
do in fact contain runs of 7 contiguous matching bases.
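A by-hand check like this can be scripted. The sketch below is a minimal Python example, with hypothetical function names (nothing from Bioperl or either BLAST package). It computes the longest run of identical contiguous bases shared by the oligo and one subject sequence using the standard dynamic-programming longest-common-substring recurrence. It also tries the oligo's reverse complement, since blastn searches both strands by default. It assumes plain uppercase or lowercase A/C/G/T/N sequences.

```python
# Hypothetical helper names; a sketch, not part of any BLAST or Bioperl API.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N assumed)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_common_run(query: str, subject: str) -> int:
    """Length of the longest exact run of bases shared by query and subject.

    Classic longest-common-substring DP: cur[j] is the length of the common
    run ending at query[i-1] and subject[j-1]. Only two rows are kept, so
    memory is O(len(subject)) and time is O(len(query) * len(subject)).
    """
    q, s = query.upper(), subject.upper()
    best = 0
    prev = [0] * (len(s) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(s) + 1)
        for j in range(1, len(s) + 1):
            if q[i - 1] == s[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_shared_run(query: str, subject: str, k: int = 7,
                   both_strands: bool = True) -> bool:
    """True if query (or, optionally, its reverse complement) shares a
    run of at least k contiguous matching bases with subject."""
    if longest_common_run(query, subject) >= k:
        return True
    return both_strands and longest_common_run(reverse_complement(query), subject) >= k
```

For example, `has_shared_run("AAACCCG", "GGCGGGTTTGG")` is true only because the reverse complement `CGGGTTT` occurs in the subject, which is the kind of minus-strand match that is easy to overlook when checking by eye.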

Here is an example of one of the command lines I've tried:

    blastn human_refseq.fasta seq3.fasta W=5 S=5 M=1 V=100000 B=100000

I've tried adjusting every parameter I thought would affect the search
results, but I still cannot recover the 'missing' hits.

Maybe BLAST is the wrong tool for this.  I'd just like something that's 
fast.  If anyone has some advice, it would be greatly appreciated.
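If BLAST's scoring and extension are not actually needed, one fast alternative is a plain exact-match scan. Index every k-mer of the oligo (both strands) in a hash table, then slide a k-long window along each mRNA and look each window up. The following is a minimal Python sketch under that assumption. The function names are hypothetical, it reads FASTA from any iterable of lines (for example an open file), and it reports exact k-mer matches only, with no scores, gaps, or E-values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; id is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def build_kmer_index(oligo: str, k: int) -> Dict[str, List[Tuple[int, str]]]:
    """Map each k-mer of the oligo to its (offset, strand) occurrences.
    For strand '-', the offset is within the reverse complement."""
    index: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for strand, seq in (("+", oligo.upper()), ("-", reverse_complement(oligo))):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((i, strand))
    return index


def find_exact_kmer_hits(oligo: str, records: Iterable[Tuple[str, str]],
                         k: int = 7) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (subject_id, subject_pos, oligo_pos, strand) for every
    k-mer shared between the oligo and a subject sequence (0-based)."""
    index = build_kmer_index(oligo, k)
    for name, seq in records:
        for j in range(len(seq) - k + 1):
            # .get() avoids inserting empty lists into the defaultdict
            for i, strand in index.get(seq[j:j + k], ()):
                yield name, j, i, strand
```

Because the scan is exact, it cannot skip a shared k-mer the way a heuristic search can. However, very short k-mers also match by chance: in random sequence a given 7-mer is expected about once every 4^7 = 16,384 positions, so a large mRNA database will produce many hits. A palindromic oligo will also be reported on both strands.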

Thanks a lot,

Andrew
