[Bioperl-l] BLAST and parsing question
Torsten Seemann
torsten.seemann at infotech.monash.edu.au
Tue Apr 25 01:55:52 EDT 2006
Robert,
>> So you want all length 20 subsequences (derived using a sliding window
>> from some set of sequences) which do not appear in some other set of
>> sequences (virus-db) ?
> Yes, that's basically it. Find out which 20 unit long subsequences of
> my sequence are not found in my database.
Well, BLAST is probably not the most appropriate tool for this
problem, as it finds 'high scoring' matches, not exact matches.
Perhaps simply using Perl's "index()" function, which tests if one
string is in another string, would be simpler?
You could even concatenate all your database sequences into one big
sequence, inserting 20 "N" (if DNA) or "X" (if protein) between each
(or any other char you don't have in your sequences). The padding stops
a subsequence from matching across the join between two database
sequences. Then you could simply loop through your 20-length
subsequences using the sliding window as before, and do an "index()"
for each against the one big database string. If index() returns -1,
it wasn't found.
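A rough sketch of the idea, in case it helps. The sub name
absent_kmers() and the toy sequences are made up for illustration; your
real sequences would be read from your own files. It assumes the
separator character does not occur in your query windows, and it
upper-cases everything because index() is case-sensitive.

```perl
use strict;
use warnings;

# Return every length-$k window of $query that does not occur exactly
# in any of the database sequences in @$db_ref.
# (Hypothetical helper, not part of BioPerl.)
sub absent_kmers {
    my ($query, $db_ref, $k, $sep_char) = @_;

    # One big database string; $k separator chars between sequences so
    # that no window can match across a join.
    my $big = uc join($sep_char x $k, @$db_ref);
    $query = uc $query;

    my @absent;
    # Sliding window; the range is empty if the query is shorter than $k.
    for my $i (0 .. length($query) - $k) {
        my $window = substr($query, $i, $k);
        # index() returns -1 when the window is not found.
        push @absent, $window if index($big, $window) == -1;
    }
    return @absent;
}

# Toy data, for illustration only.
my @db    = ('ACGTACGTACGTACGTACGTACGT', 'TTTTTTTTTTTTTTTTTTTTTTTTT');
my $query = 'ACGTACGTACGTACGTACGTACGTAAAA';

print "$_\n" for absent_kmers($query, \@db, 20, 'N');
```

This does one index() scan of the big string per window, so it may get
slow for a large database; for a few viral genomes it should be fine.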
Hope this helps,
Torsten Seemann.