[Bioperl-l] help using SEARCH IO to parse blast results

Chris Fields cjfields at uiuc.edu
Tue Nov 27 21:00:33 UTC 2007


The hits/HSPs are generally in the order they appear in the report.

If you are looking for best/worst HSP after parsing you can use the  
$hit->hsp() method:

# best and worst
my $best = $hit->hsp('best'); # also 'first'
my $worst = $hit->hsp('worst'); # also last

The SearchIO text BLAST parser also has several options implemented  
for finer control:

     -inclusion_threshold => e-value threshold for inclusion in the
                             PSI-BLAST score matrix model (blastpgp)
     -signif      => float or scientific notation number to be used
                     as a P- or Expect value cutoff
     -score       => integer or scientific notation number to be used
                     as a blast score value cutoff
     -bits        => integer or scientific notation number to be used
                     as a bit score value cutoff
     -hit_filter  => reference to a function to be used for
                     filtering hits based on arbitrary criteria.
                     All hits of each BLAST report must satisfy
                     this criteria to be retained.
                     If a hit fails this test, it is ignored.
                     This function should take a
                     Bio::Search::Hit::BlastHit.pm object as its first
                     argument and return true
                     if the hit should be retained.
                     Sample filter function:
                        -hit_filter => sub { $hit = shift;
                                             $hit->gaps == 0; },
                     (Note: -filt_func is synonymous with -hit_filter)
     -overlap     => integer. The amount of overlap to permit between
                     adjacent HSPs when tiling HSPs. A reasonable  
value is 2.
                     Default = $Bio::SearchIO::blast::MAX_HSP_OVERLAP.

chris

On Nov 27, 2007, at 1:31 PM, Smithies, Russell wrote:

> Do the hits need to be sorted first or is this done automagicly?
> I ask this as I know Megablast doesn't provide sorted output for  
> most of
> it's formats.
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-
>> bio.org] On Behalf Of Dave Messina
>> Sent: Wednesday, 28 November 2007 6:56 a.m.
>> To: alison waller
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] help using SEARCH IO to parse blast results
>>
>> Hi Alison,
>> As Sendu mentioned, the key bit is adding a condition to the hit loop
> to
>> limit the number of hits that are printed. I didn't test the below
>> extensively, but give it a try...
>>
>>
>> Dave
>>
>>
>>
>> #!/usr/local/bin/perl -w
>>
>> # Parsing BLAST reports with BioPerl's Bio::SearchIO module
>> # alison waller November 2007
>>
>> use strict;
>> use warnings;
>> use Bio::SearchIO;
>>
>> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of
> hits>\n";
>> if (@ARGV != 2) { die $usage; }
>>
>> my $infile  = $ARGV[0];
>> my $outfile = $infile . '.parsed';
>> my $tophit  = $ARGV[1]; # to specify in the command line how many  
>> hits
>>                        # to report for each query
>>
>> #open(  IN, $infile )     || die "Can't open inputfile $infile! $! 
>> \n";
>> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile!
> $!\n";
>>
>> my $report = new Bio::SearchIO(
>>    -file   => "$infile",
>>    -format => "blast"
>> );
>>
>> print OUT
>>
> "Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent 
> \tga
> ps\t
>> Qstrand\tHstrand\n";
>>
>> # Go through BLAST reports one by one
>> while ( my $result = $report->next_result ) {
>>    my $i = 0;
>>    while (  ( $i++ < $tophit ) && (my $hit = $result->next_hit) ) {
>>        while ( my $hsp = $hit->next_hsp ) {
>>
>>            # Print some tab-delimited data about this hit
>>            print OUT $result->query_name,     "\t";
>>            print OUT $hit->name,              "\t";
>>            print OUT $hit->significance,      "\t";
>>            print OUT $hit->bits,              "\t";
>>            print OUT $hsp->evalue,            "\t";
>>            print OUT $hsp->percent_identity,  "\t";
>>            print OUT $hsp->length('total'),   "\t";
>>            print OUT $hsp->num_identical,     "\t";
>>            print OUT $hsp->gaps,              "\t";
>>            print OUT $hsp->strand('query'),   "\t";
>>            print OUT $hsp->strand('hit'),     "\n";
>>        }
>>    }
>>
>>    if ($i == 0) { print OUT "no hits\n"; }
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> = 
> ======================================================================
> Attention: The information contained in this message and/or  
> attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or  
> privileged
> material. Any review, retransmission, dissemination or other use of,  
> or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by  
> AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> = 
> ======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






More information about the Bioperl-l mailing list