[Bioperl-l] Blast parsing exception

Marc Logghe MarcL@DEVGEN.com
Fri, 19 Oct 2001 08:54:48 +0200


Hi all,
When parsing a BLAST output file that contains multiple reports, I get the
following error messages as soon as the parser reaches the 'No hits' BLAST report:
-------------------- WARNING ---------------------
MSG: Can't determine query sequence name from BLAST report.
---------------------------------------------------
-------------------- EXCEPTION --------------------
MSG: Unexpected error during read: -------------------- EXCEPTION --------------------
MSG: Can't determine sequence length from BLAST report.
STACK Bio::Tools::Blast::_set_length /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:2550
STACK Bio::Tools::Blast::_parse_header /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1962
STACK Bio::Tools::Blast::__ANON__ /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1761
STACK (eval) /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:752
STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:736
STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
STACK toplevel ./get_alias.pl:9
-------------------------------------------

STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:763
STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
STACK toplevel ./get_alias.pl:9
-------------------------------------------
When I remove this 'No hits' report from the file, parsing works fine.
Can somebody help me out with this? I am not able to pinpoint the problem.
Thanks.
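
To check which reports in a multi-report file are the 'No hits' ones without
going through Bio::Tools::Blast at all, something like the following might
help. It is only a rough sketch in plain Perl: the sub name
summarize_blast_reports is hypothetical (it is not a BioPerl function), and it
assumes every report starts with a program banner line such as
"BLASTP 2.2.1 [Jul-12-2001]" and carries the usual "Query=" and "(N letters)"
header lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: split concatenated BLAST text
# into individual reports and summarize each one.
sub summarize_blast_reports {
    my ($text) = @_;
    my @summaries;

    # Assumption: each report begins with a banner line like "BLASTP 2.2.1 ...".
    # The zero-width lookahead keeps the banner attached to its own report.
    for my $report (split /^(?=T?BLAST[NPX]? \d)/m, $text) {
        next unless $report =~ /\S/;    # skip empty chunks

        # First token after "Query=" is taken as the query name.
        my ($query) = $report =~ /^Query=\s*(\S+)/m;

        # Query length comes from the "(339 letters)" line; strip commas.
        my ($length) = $report =~ /^\s*\(([\d,]+) letters\)/m;
        $length =~ tr/,//d if defined $length;

        push @summaries, {
            query   => $query,
            length  => $length,
            no_hits => ( $report =~ /\*+\s*No hits found\s*\*+/ ? 1 : 0 ),
        };
    }
    return @summaries;
}

# Small demonstration on two cut-down reports (header lines only).
my $demo = join '', map { "$_\n" } (
    'BLASTP 2.2.1 [Jul-12-2001]',
    'Query= ZK994.5 CE15491',
    '         (339 letters)',
    '>T24H5.1 CE26008',
    'BLASTP 2.2.1 [Jul-12-2001]',
    'Query= ZK994.6 CE15493',
    '         (113 letters)',
    ' ***** No hits found ******',
);

for my $r ( summarize_blast_reports($demo) ) {
    printf "%s\t%s\t%s\n",
        $r->{query}  // '?',
        $r->{length} // '?',
        $r->{no_hits} ? 'no hits' : 'has hits';
}
```

If the query name and length come out fine here for the 'No hits' report, the
header lines themselves are intact and the problem is more likely in how the
parser handles that report than in the file.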
To reproduce the exception easily, I extracted the relevant part of the BLAST
results and fed it to the script through STDIN (aliased to DATA below).

#!/usr/local/bin/perl -w

use strict;
use Bio::Tools::Blast qw(:obj);

*STDIN = *DATA; 

$Blast->parse
(  
#  -file   => '../wp18_wp63.res',
#  -file   => '../test.res',
  -parse  => 1,
  -exec_func => \&process_blast,
);
                                     
sub process_blast
{
  my $blastObj = shift;
  my $hit = $blastObj->hit;
  my $qname = $blastObj->query;
  if ($hit)
  {
    my $hitname = $hit->name;
    printf("%s\t%s\t%s\n", $qname, $hitname, $hit->expect)
      if ($qname ne $hitname);
  }
  else
  {
    print STDERR "$qname\n";
  }
  $blastObj->destroy;
}

__DATA__
BLASTP 2.2.1 [Jul-12-2001]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= ZK994.5 CE15491   (ST.LOUIS) TR:O44085 protein_id:AAB88613.1
         (339 letters)

Database: wp63
           20,100 sequences; 8,819,854 total letters

Searching.........................................done

                                                                   Score    E
Sequences producing significant alignments:                        (bits) Value

T24H5.1 CE26008    (ST.LOUIS) protein_id:AAK84578.1                   620  e-178
T23B12.9 CE14042    (ST.LOUIS) TR:O17008 protein_id:AAB69941.1        616  e-177
F07C7.1 CE07032    (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1         463  e-131

>T24H5.1 CE26008    (ST.LOUIS) protein_id:AAK84578.1
          Length = 414

 Score =  620 bits (1598), Expect = e-178
 Identities = 300/339 (88%), Positives = 300/339 (88%)

Query: 1   MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
           MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
Sbjct: 76  MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 135

Query: 61  ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
           ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ
Sbjct: 136 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 195

Query: 121 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
           SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG                   
Sbjct: 196 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 255

Query: 181 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
           GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
Sbjct: 256 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 315

Query: 241 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
           IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
Sbjct: 316 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 375

Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
           APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
Sbjct: 376 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 414


>T23B12.9 CE14042    (ST.LOUIS) TR:O17008 protein_id:AAB69941.1
          Length = 1744

 Score =  616 bits (1589), Expect = e-177
 Identities = 299/339 (88%), Positives = 299/339 (88%)

Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
            MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
Sbjct: 1406 MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 1465

Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
            ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPN VSRHRWPLALVVQVNQ
Sbjct: 1466 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNNVSRHRWPLALVVQVNQ 1525

Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
            SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG                   
Sbjct: 1526 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 1585

Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
            GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
Sbjct: 1586 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 1645

Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
            IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
Sbjct: 1646 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 1705

Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
            APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
Sbjct: 1706 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 1744


>F07C7.1 CE07032    (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1
          Length = 1879

 Score =  463 bits (1192), Expect = e-131
 Identities = 239/339 (70%), Positives = 248/339 (72%), Gaps = 31/339 (9%)

Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
            MINNRPLVAHARSPNDMI LRPMDFMIPGVMIE                    HLEKFES
Sbjct: 1572 MINNRPLVAHARSPNDMIALRPMDFMIPGVMIETPRTPADSPTTSTTEIRTRAHLEKFES 1631

Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
            ALERLWTIWTFGVMLILREVSHKHKRCCD KPEVGDVVIIN NYVSRHRWPLALVVQVNQ
Sbjct: 1632 ALERLWTIWTFGVMLILREVSHKHKRCCDPKPEVGDVVIINTNYVSRHRWPLALVVQVNQ 1691

Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
            SKRDGEIRTAV              LIPLETSRQ+IRHGTG                   
Sbjct: 1692 SKRDGEIRTAV--------------LIPLETSRQDIRHGTGPDNDTPANDTNNDTDKDTA 1737

Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
            GSDQCR CPTLPTPALLDFENSH A+ + P                 ++        EI 
Sbjct: 1738 GSDQCRPCPTLPTPALLDFENSHFARRSQP-----------------KFSRTSVKNLEIS 1780

Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
            ++ GPDYDTNNPLF EDGE EDRPVEYVDP TAIPEIAYD+AETRLP GRTREYLGRKAK
Sbjct: 1781 LWIGPDYDTNNPLFHEDGEAEDRPVEYVDPITAIPEIAYDNAETRLPQGRTREYLGRKAK 1840

Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
            APYINYNHAEITRVLS+PSPPECCRFPVIPQESLNLKDF
Sbjct: 1841 APYINYNHAEITRVLSDPSPPECCRFPVIPQESLNLKDF 1879


  Database: wp63
    Posted date:  Sep 14, 2001 11:28 AM
  Number of letters in database: 8,819,854
  Number of sequences in database:  20,100
  
Lambda     K      H
   0.320    0.139    0.435 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 5,848,008
Number of Sequences: 20100
Number of extensions: 236898
Number of successful extensions: 507
Number of sequences better than 1.0e-100: 3
Number of HSP's better than  0.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 498
Number of HSP's gapped (non-prelim): 3
length of query: 339
length of database: 8,819,854
effective HSP length: 98
effective length of query: 241
effective length of database: 6,850,054
effective search space: 1650863014
effective search space used: 1650863014
T: 11
A: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 930 (362.8 bits)
BLASTP 2.2.1 [Jul-12-2001]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= ZK994.6 CE15493   (ST.LOUIS) TR:O44088 protein_id:AAB88612.1
         (113 letters)

Database: wp63
           20,100 sequences; 8,819,854 total letters

Searching.........................................done

 ***** No hits found ******

  Database: wp63
    Posted date:  Sep 14, 2001 11:28 AM
  Number of letters in database: 8,819,854
  Number of sequences in database:  20,100
  
Lambda     K      H
   0.316    0.129    0.372 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 1,625,059
Number of Sequences: 20100
Number of extensions: 53280
Number of successful extensions: 140
Number of sequences better than 1.0e-100: 0
Number of HSP's better than  0.0 without gapping: 0
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 140
Number of HSP's gapped (non-prelim): 0
length of query: 113
length of database: 8,819,854
effective HSP length: 89
effective length of query: 24
effective length of database: 7,030,954
effective search space: 168742896
effective search space used: 168742896
T: 11
A: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 922 (359.8 bits)