[Bioperl-l] limit the number of blast output per query

Frank Schwach fs5 at sanger.ac.uk
Thu Jul 14 15:08:46 UTC 2011


-b limits the number of output alignments, not sure why it isn't working
for you. How many HSPs do you actually get? Is there a chance that the
output is just so large because of a large number of queries and you are
not seeing 10k+ HSPs but results? 
Can you post the entire blastall command? The large number of hits looks
like you are blasting short sequences, there are better algorithms for
short sequence alignments, so if that's what you are doing I could give
you some hints for alternative software to try.
Frank



On Thu, 2011-07-07 at 11:02 +0200, Hans-Rudolf Hotz wrote:
> Hi
> 
> just double checking: are you really talking abut "10,000+ hits"? or do 
> you mean "10,000+ HSPs" ('high-scoring segment pairs')?
> 
> I don't know how your genome database looks like, but assuming you have 
> one sequence per chromosome, then you will get just 24 hits (ie each 
> chromosome) and then depending on your query each hit will have a lot of 
> HSPs.
> 
> As far as as I know, there is no way to limit the number of HSPs (you 
> might try playing with the E value).
> 
> You can try using the tabular output format (this will reduce the file 
> size) - or may be BLAST is not the right search tool for your task?
> 
> 
> Regards, Hans
> 
> 
> On 07/07/2011 04:43 AM, Ross KK Leung wrote:
> > I know this question should submit to BLAST help but it seems they have
> > already been overwhelmed by incoming emails. I wonder any bioperl users
> > happen to know how to limit the number of blast output per query. For
> > example, for human genome as a database to blast against, a single query can
> > generate 10,000+ hits. I have already supplied -b 30 -v 30 flags but
> > obviously the blastall from blast2.2.22 does not "obey" my instruction.
> >
> > The output files generated are usually larger than 100G+ but indeed the
> > final ones that I want usually are only of 10M-. Is there any way to help
> > save our Earth (Not exaggerated, energy is WASTED in a meaningless manner)?
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 



More information about the Bioperl-l mailing list