[Bioperl-l] Problems parsing blast reports

Lorenzo Carretero lcpaulet at googlemail.com
Sat May 14 15:29:12 UTC 2011


Hi all,
I'm trying to parse blasttable '-m 8 format' reports from whole intra-genome
comparisons of all vs all to get the best non-self hit and diverged best
non-self hit as separate hashes (key=query=>value=non_self_hit). The
following script seems to run ok but it returns unexpected results (i.e. it
doesn't catch the best hit but a random hit apparently). I assume hits are
iterated over the while loop (while (my $hit = $result->next_hit) as
returned in the blast results (i.e. sorted by hits bit scores).
Any help would be much appreciated.
Cheers,
Lorenzo


my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
> $minimumidentity, $maximumredundancy, $minimumalnlength ) = @_;
> my ( %besthits, %diverged, %redundant ,%genefusion ) = ();
> my ( $refbh, $refdiv, $cb, $cd, $refred, $reffus, $cr, $cf);
> my $total = 0;
> my $in = new Bio::SearchIO (     -file   => $filename,
>                                  -format => $format,
>                                  #-verbose => -1,
>                                   )
>                                  or die "No $filename BLAST file with
> $format found";
>       while( my $result = $in->next_result)    {
>       $total++;
>           while (my $hit = $result->next_hit)    {
>           my $query = $result->query_name();
>           my $hitname = $hit->name();
>           my $bits = $hit->bits();
>           my $evalue = $hit->significance();
>             if ($query ne $hitname and $bits >= $minimumbitsscore and
> $evalue <= $thresholdevalue )    {
>                 $besthits{ $query } = $hitname;
>             }
>             elsif    ( $bits <= $minimumbitsscore and $evalue >=
> $thresholdevalue){
>                 $diverged { $query } = $hitname;
>             }
>
> #                while( my $hsp = $hit->next_hsp )    {
> #                my $querylen = $hsp->length( 'query' );
> #                my $hitlen = $hsp->length( 'hit' );
> #                my $alnlen = $hsp->length( 'total' );
> #                my $identity = $hsp->percent_identity();
> #                    if ( $identity > $maximumredundancy )    {
> #                    $redundant{ $query } = $hitname;
> #                    }
> #                    elsif ( ($querylen <= ($alnlen / 2) ) || ($hitlen <=
> ($alnlen / 2) ) || ($alnlen <= $minimumalnlength) )    {
> #                    $genefusion{ $query } = $hitname;
> #                    }
> #                    elsif ( ($evalue >= $thresholdevalue) || ($identity <=
> $minimumidentity) || ($bits <= $minimumbitsscore) )    {
> #                    $diverged{ $query } = $hitname;
> #                    }
> #                    else    {
> #                    $besthits{ $query } = $hitname;
> #                    }
> #                }
>             }
>     }
> $refbh = \%besthits;
> $refdiv = \%diverged;
> $refred = \%redundant;
> $reffus = \%genefusion;
> $cb = keys ( %besthits );
> $cd = keys( %diverged );
> $cr = keys( %redundant );
> $cf = keys( %genefusion );
>



More information about the Bioperl-l mailing list