[Bioperl-l] Bio::SearchIO parsing of WuBLASTX reports

Mike Croning mdr@sanger.ac.uk
Thu, 18 Apr 2002 16:45:45 +0100 (BST)


Hi Guys

I am trying to use SearchIO to parse WuBLASTX results, and wonder if I am
doing something obviously wrong. The problem is that looping over
$hit->next_hsp often yields 0 HSPs for a hit, when they are clearly there
in the output. There seems to be no obvious (to me) relationship to score,
or to other properties of the match.

Here is the code:

use Bio::SearchIO;

sub parse_WuBLASTX_results2 {
    my ($fh, $hit_hash_ref) = @_;

    my $searchio;
    if (defined($fh)) {
        $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
        print "searchio: ", ref($searchio), "\n";
    } else {
        return;
    }

    my $blast = $searchio->next_result;
    print "blast: ", ref($blast), "\n";
    my $query_length = $blast->query_length;
    print "Query name  : ", $blast->query_name, "\n";
    print "Query length: ", $blast->query_length, "\n";
    print $blast->query_description, "\n";
    print "\n\n";

    while (my $hit = $blast->next_hit) {
        my $hsp_counter = 0;
        my $match_pos_string = '0' x $query_length;
        my @fields = split(/\s+/, $hit->description);
        my $Hit_accession = $fields[0];
        $Hit_accession =~ s/\.\d+//;

        my $total_score = 0;
        my @hsps;

        print $Hit_accession, " ";

        while (my $hsp = $hit->next_hsp) {
            $hsp_counter++;
            print "Strand: ";
            print $hsp->strand;
            push(@hsps, $hsp);
            $total_score += $hsp->score;

        }
        print "  HSPs: $hsp_counter\n";

        unless (length($match_pos_string) == $query_length) {
            warn "Match pos string length has changed\n";
        }
        # $$hit_hash_ref{'accession'} = [a Bio::Search::Hit::GenericHit,
        #     total score, match_pos_string, \@hsps];
        $$hit_hash_ref{$Hit_accession} = [$hit, $total_score,
            $match_pos_string, \@hsps];
    }
    return 1;
}

And the output:

searchio: Bio::SearchIO::blast
blast: Bio::Search::Result::GenericResult
Query name  : TNeu_22_29_1
Query length: 673



P19945   HSPs: 0
P14869   HSPs: 0
P05388   HSPs: 0
Q9BVK4   HSPs: 0
P47826   HSPs: 0
Q9PV90   HSPs: 0
Q95140   HSPs: 0
Q96FQ9   HSPs: 0
Q8WQJ2   HSPs: 0
Q9U3U0   HSPs: 0
Q93572   HSPs: 0
P19889 Strand: 1Strand: 1  HSPs: 2
Q96TJ5   HSPs: 0
Q9NHP0 Strand: 1Strand: 1  HSPs: 2
P05317   HSPs: 0
Q9C3Z6   HSPs: 0

<snip>

The file parses fine with BPlite.

Thanks

Mike Croning
'Grunt Programmer'
Vertebrate Sequence Analysis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244
Fax: +44 (0)1223 494919