[Bioperl-l] help using SEARCH IO to parse blast results

dennis prickett (IAH-C) dennis.prickett at bbsrc.ac.uk
Wed Nov 28 10:18:26 UTC 2007


Dear Alison
 
Or, if you are absolutely only interested in the top hit you could limit
it to that in the blast  command by adding the parameters  " -b 1 ".  

This will truncate the report to 1 hsp per query (or -b 5 for 5 hsps,
etc).  Your blasts run faster and then you won't have to worry about how
to parse out the top blast hit(s).

However, if there are any caveats for using this parameter that I am not
aware of please let us know. 

Dennis Prickett
Institute of Animal Health
Compton, nr Newbury
RG2 9FS
United Kingdom



-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of alison waller
Sent: 26 November 2007 21:07
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] help using SEARCH IO to parse blast results

Hello all,

 

It's the usual story, I'm an engineer turned biologist who now needs
help with bioinformatics so I can analyze huge amounts of data to finish
my thesis.  

 

I am trying to write a script that will parse large blast files (usually
blastx) I also want to be able to specify how many hits I want to report
information on.

Most of the time I will only want information on the top hit, but I want
to have the flexibility to obtain information on say the top5.  I am
pretty sure I have done this wrong, any advice on how to correct my
script to do this, would be great.

 

Thanks so much,

 

Alison

 

 

#!/usr/local/bin/perl -w

 

# Parsing BLAST reports with BioPerl's Bio::SearchIO module

# alison waller November 2007

use strict;

use warnings;

use Bio::SearchIO;

 

# to run type: blast_parse_aw.pl input.txt #of hits

 

my $infile =shift(@ARGV);

my $outfile ="$ARGV[0].parsed";

my $tophit = $ARGV[1]; # I want to specify in the command line how many
hits to report for each query

 

open (IN,"$ARGV[0]") || die "Can't open inputfile $ARGV[0]! $!\n";

open (OUT,">$outfile");

 

$report = new Bio::SearchIO(

         -file=>"$inFile",

              -format => "blast"); 

 

print
"Query\tHitDesc\tHitSignif\tHitBits\tEvalue\t%id\tAlignLen\tNumIdent\tga
ps\t
Qstrand\tHstrand\n";

 

# Go through BLAST reports one by one              

while($result = $report->next_result) 

{

      if ($top_hit=$result->next_hit) # this might be wrong - I want to
specify how many hits to print results for

            # Print some tab-delimited data about this hit

           { 

            print $result->query_name, "\t";

            print $hit->description, "\t";

            print $hit->significance, "\t";

            print $hit->bits,"\t";    

            print $hsp->evalue, "\t";

            print $hsp->percent_identity, "\t";

            print $hsp->length('total'),"\t";

            print $hsp->num_identical,"\t";

            print $hsp->gaps,"\t";

            print $hsp->strand('query'),"\t";

            print $hsp->strand('hit'), "\n";

          }

      else print "no hits\n";

   } 

 

 

 

            

 

 

 

 

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller at chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  

 

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list