[Bioperl-l] BLAST Parsing

Paul Boutros pcboutro@engmail.uwaterloo.ca
Fri, 20 Sep 2002 14:16:11 -0400 (EDT)


Hi all,

Another potential bug in BLAST parsing (SearchIO\blast.pm).

My setup:
BioPerl 1.02
Perl 5.6.1 (ActiveState)
WinXP SP1

The parser doesn't seem to recognize one of the lines in my BLAST
output file.  The error is:

------------- EXCEPTION  -------------
MSG: no data for midline Lambda     K      H
STACK Bio::SearchIO::blast::next_result
C:/Perl/site/lib/Bio/SearchIO/blast.pm:567
STACK toplevel parseb~1.pl:55

--------------------------------------

The offending part of the blast output file looks like this:
=========================
Sbjct: 564 cctggg 569



Lambda     K      H
    1.37    0.711     1.31

Gapped
Lambda     K      H
    1.37    0.711     1.31


Matrix: blastn matrix:1 -3
==========================

BLAST parameters were:
-p blastn 
-d est_others
-e 0.001
-v 10
-b 10
-l Rn_GI

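Minimal code to reproduce the error:
use strict;
use warnings;
use Bio::SearchIO;

my $infile = $ARGV[0];

my $searchio = Bio::SearchIO->new(
    -format => 'blast',
    -file   => $infile,
);

# Just iterating over the results is enough to trigger the exception.
while (my $result = $searchio->next_result()) { }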

The offending part of the blast.pm file looks like this:
if( /^((Query|Sbjct):\s+(\d+)\s*)(\S+)\s+(\d+)/ ) {
     $data{$2} = $4;
     $len = length($1);
     $self->{"\_$2"}->{'begin'} = $3 unless $self->{"_$2"}->{'be
     $self->{"\_$2"}->{'end'} = $5;
} else {
     $self->throw("no data for midline $_")
       unless (defined $_ && defined $len);
     $data{'Mid'} = substr($_,$len);
}
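To see why the exception fires at that point, here is a small stand-alone Perl check (no BioPerl needed) that runs the regex from the excerpt above over the lines around the failure. It only mimics the if/else test, not the parser's full state, and the helper name `classify_line` is made up for illustration.

```perl
use strict;
use warnings;

# The alignment-line regex quoted above from Bio/SearchIO/blast.pm,
# copied verbatim for this demonstration.
my $aln_re = qr/^((Query|Sbjct):\s+(\d+)\s*)(\S+)\s+(\d+)/;

# Hypothetical helper: classify a line the way the quoted if/else would.
# Lines matching the regex are sequence lines; anything else falls into
# the "midline" branch, which throws when $len is undefined.
sub classify_line {
    my ($line) = @_;
    return $line =~ $aln_re ? "$2 sequence line" : 'midline candidate';
}

my @excerpt = (
    'Sbjct: 564 cctggg 569',
    'Lambda     K      H',
    '    1.37    0.711     1.31',
);

printf "%-30s => %s\n", $_, classify_line($_) for @excerpt;
```

Both the `Lambda     K      H` header and the value row under it fail the regex, so whichever reaches that branch first is treated as an alignment midline. Since the exception text shows `$_` was defined, `$len` must have been the undefined value that triggered the throw.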

Removing the $self->throw and replacing the unless with:
if (defined $_ && defined $len) {
    $data{'Mid'} = substr($_,$len);
}

seems to parse the file correctly, but at the cost of an awful lot of warnings.

I could pre-parse the file to strip out the "Lambda K H" lines, but
I'm not sure which of the two should be removed, or whether I would
also need to remove the blank lines.
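For the pre-parsing route, a minimal sketch of such a filter is below. `strip_lambda_blocks` is a hypothetical helper, not part of BioPerl, and it takes the most aggressive reading of the question: it drops both `Lambda K H` headers (the Karlin-Altschul statistics), the value row under each, and the `Gapped` label, while leaving blank lines alone. The statistics are lost, and it is not established that this is enough for blast.pm in BioPerl 1.02, so treat it as an experiment rather than a fix.

```perl
use strict;
use warnings;

# Hypothetical pre-filter (not part of BioPerl): removes each
# "Lambda K H" header, the numeric row directly beneath it, and any
# line consisting only of "Gapped". Blank lines are kept unchanged.
sub strip_lambda_blocks {
    my @kept;
    my $skip_values = 0;
    for my $line (@_) {
        if ($line =~ /^\s*Lambda\s+K\s+H\s*$/) {
            $skip_values = 1;    # drop the header itself
            next;
        }
        if ($skip_values) {
            $skip_values = 0;
            # drop the following row only if it really is three numbers
            next if $line =~ /^\s*[\d.]+\s+[\d.]+\s+[\d.]+\s*$/;
        }
        next if $line =~ /^\s*Gapped\s*$/;    # label of the gapped block
        push @kept, $line;
    }
    return @kept;
}

# Usage on the excerpt from the report:
my @excerpt = (
    "Sbjct: 564 cctggg 569\n", "\n",
    "Lambda     K      H\n", "    1.37    0.711     1.31\n", "\n",
    "Gapped\n",
    "Lambda     K      H\n", "    1.37    0.711     1.31\n", "\n",
    "Matrix: blastn matrix:1 -3\n",
);
print strip_lambda_blocks(@excerpt);
```

Perl 5.6.1 has no in-memory filehandles (those arrived in 5.8), so the filtered lines would have to be written to a temporary file whose name is then passed as `-file` to `Bio::SearchIO`.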

Any ideas/comments/criticism welcome.
Paul