[Bioperl-l] Parse BLAST to find mismatches

Ken Graham kjgraham@ucdavis.edu
Wed, 29 May 2002 15:03:05 -0700


Thanks for the help Jason . That does what I wanted.
I'm still learning Perl and BioPerl so I tried to use the "seq_inds" method 
of the high scoring pair to develop a better feel for BioPerl objects. 
Below is a code snippet that is giving me problems. I get the following error:
Can't locate object method "seq_inds" via package 
"Bio::Search::HSP::GenericHSP"
(perhaps you forgot to load "Bio::Search::HSP::GenericHSP"?) at 
blast_parser_sm line 16,
<GEN1> line 2661.
I tried this code on another computer and got the same results.
Am I forgetting something?
Thanks in advance,
#!/usr/bin/perl -w
use Bio::SearchIO;

$blastfile ='/home/ken/blastFile';
$searchio = new Bio::SearchIO ('-format' => 'blast',
                                 '-file'   => $blastfile );
$result = $searchio->next_result;
$result->database_name;

while( $hit = $result->next_hit()) {
         $hit_name = $hit->name ;
         while ($hsp = $hit->next_hsp()) {
                 $hsp_percent=$hsp->percent_identity;
                 $hsp_len=$hsp->length();
                 $hsp_start = $hsp->start();
                 @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
                 if ($hsp_percent > 85 and $hsp_len > 25) {
                         print "Hit Name is $hit_name\n";
                         print "Length is $hsp_len\n";
                         print "Starting at $hsp_start\n";
                         print "Matches at @h_ind\n";
                         print "Percent equals $hsp_percent\n";
                 }
         }
}

> >> I'm stuck on what might be a simple problem. Using a BLAST report I 
> want to
> >> find the position, on the query sequence, of each mismatch in a HSP. I can
> >> read in the BLAST report, and get properties on the HSP (such as start,
> >> percent similarity etc..) but I don't know where to find any info 
> about the
> >> mismatches. I tried to use the SimpleAlign object but I just get errors.
>
> >You can call $hsp->homology_seq to determine where the mismatches are.
> >
> >The mismatches will be space so you can use the index method to locate
> >where the mismatches are.  The following might get you what you want.
> >
> >my $in = new Bio::SearchIO(-file => 'BLASTFILE',
> >                          -format => 'blast');
> >
> >while( my $result = $in->next_result ) {
> >   while( my $hit = $result->next_hit ) {
> >       while( my $hsp = $hit->next_hsp ) {
> >           my $hseq = $hsp->homology_string;
> >           $last = 0;
> >           my @a;
> >           while( ($last = index($hseq, ' ', $last)) > 0 ) {
> >               # last is the location of the mismatch
> >               # NOTE: $last is index in string coordinates
> >               # but Bio::Seq objects start with 1 not 0.
> >               # What you'll probably want to do is:
> >               print "base is ", substr($hsp->query_string, $last,1),
> >                     " index is $last\n";
> >               $last++;
> >           }
> >           print $hit->accession, " ", join(",",@a), "\n" if(@a);
> >       }
> >  }
> >    last;
> >}