[Bioperl-l] Problems parsing blast reports

Dave Messina David.Messina at sbc.su.se
Sat May 14 16:15:08 UTC 2011


Hi Lorenzo,

Hmm, I tried your code, and it worked for me.

I set your input variables set as follows:
my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
$minimumidentity,
    $maximumredundancy, $minimumalnlength )
  = ('2008.blasttable', 'blasttable', 300, '1e-100', 10, 0, 10);

where the file is t/data/2008.blasttable that comes with the BioPerl distro.
With those settings, the top hit goes into %besthits and the third hit goes
into %diverged.

I also added at the top:
use strict;
use warnings;
use Bio::SearchIO;

You *are* using strict and warnings, aren't you? :)

One thing that may be an issue is that you're doing this at the hit level,
and remember that in -m8 format each line represents an HSP.

I'd need your input — really, a neat little test case and the corresponding
erroneous output — to help further.

Dave



On Sat, May 14, 2011 at 17:29, Lorenzo Carretero <lcpaulet at googlemail.com>wrote:

> Hi all,
> I'm trying to parse blasttable '-m 8 format' reports from whole
> intra-genome
> comparisons of all vs all to get the best non-self hit and diverged best
> non-self hit as separate hashes (key=query=>value=non_self_hit). The
> following script seems to run ok but it returns unexpected results (i.e. it
> doesn't catch the best hit but a random hit apparently). I assume hits are
> iterated over the while loop (while (my $hit = $result->next_hit) as
> returned in the blast results (i.e. sorted by hits bit scores).
> Any help would be much appreciated.
> Cheers,
> Lorenzo
>
>
> my ( $filename, $format, $minimumbitsscore, $thresholdevalue,
> > $minimumidentity, $maximumredundancy, $minimumalnlength ) = @_;
> > my ( %besthits, %diverged, %redundant ,%genefusion ) = ();
> > my ( $refbh, $refdiv, $cb, $cd, $refred, $reffus, $cr, $cf);
> > my $total = 0;
> > my $in = new Bio::SearchIO (     -file   => $filename,
> >                                  -format => $format,
> >                                  #-verbose => -1,
> >                                   )
> >                                  or die "No $filename BLAST file with
> > $format found";
> >       while( my $result = $in->next_result)    {
> >       $total++;
> >           while (my $hit = $result->next_hit)    {
> >           my $query = $result->query_name();
> >           my $hitname = $hit->name();
> >           my $bits = $hit->bits();
> >           my $evalue = $hit->significance();
> >             if ($query ne $hitname and $bits >= $minimumbitsscore and
> > $evalue <= $thresholdevalue )    {
> >                 $besthits{ $query } = $hitname;
> >             }
> >             elsif    ( $bits <= $minimumbitsscore and $evalue >=
> > $thresholdevalue){
> >                 $diverged { $query } = $hitname;
> >             }
> >
> > #                while( my $hsp = $hit->next_hsp )    {
> > #                my $querylen = $hsp->length( 'query' );
> > #                my $hitlen = $hsp->length( 'hit' );
> > #                my $alnlen = $hsp->length( 'total' );
> > #                my $identity = $hsp->percent_identity();
> > #                    if ( $identity > $maximumredundancy )    {
> > #                    $redundant{ $query } = $hitname;
> > #                    }
> > #                    elsif ( ($querylen <= ($alnlen / 2) ) || ($hitlen <=
> > ($alnlen / 2) ) || ($alnlen <= $minimumalnlength) )    {
> > #                    $genefusion{ $query } = $hitname;
> > #                    }
> > #                    elsif ( ($evalue >= $thresholdevalue) || ($identity
> <=
> > $minimumidentity) || ($bits <= $minimumbitsscore) )    {
> > #                    $diverged{ $query } = $hitname;
> > #                    }
> > #                    else    {
> > #                    $besthits{ $query } = $hitname;
> > #                    }
> > #                }
> >             }
> >     }
> > $refbh = \%besthits;
> > $refdiv = \%diverged;
> > $refred = \%redundant;
> > $reffus = \%genefusion;
> > $cb = keys ( %besthits );
> > $cd = keys( %diverged );
> > $cr = keys( %redundant );
> > $cf = keys( %genefusion );
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>




More information about the Bioperl-l mailing list