[Bioperl-l] Blast parsing exception

Marc Logghe MarcL@DEVGEN.com
Fri, 19 Oct 2001 16:27:28 +0200


Found the bug. When I completely commented out this code snippet in
Bio::Tools::Blast:

              # Incyte_Fix:   Nasty Invisible Bug.
              # Records in blast report are delimited by '>', but... when
              #  there are no hits for a query, there won't be a '>'.  That
              #  causes several blast reports to run together in the data
              #  passed to this routine.  Need to get rid of non-hits in data
              if ($data =~ /.+(No hits? found.+)/so) {
                  $data = $1;
              }
              # End Incyte_Fix

then the exception was gone. Of course, the original problem that the
Incyte_Fix was meant to solve is now unfixed again. I am working on that.
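
A possible explanation, sketched here under assumptions: with the /s
modifier the leading .+ is greedy and swallows everything up to the
'No hits found' text, so $1 no longer contains the 'Query=' line or the
'(N letters)' line, which are presumably what the header parser needs to
determine the query name and length. The report text below is a shortened,
hypothetical stand-in for the ZK994.6 report, and apply_incyte_fix is only
an illustrative wrapper name, not a function in Bio::Tools::Blast.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for a single "no hits" report.
my $report = join "\n",
    'BLASTP 2.2.1 [Jul-12-2001]',
    'Query= ZK994.6 CE15493   (ST.LOUIS) TR:O44088 protein_id:AAB88612.1',
    '         (113 letters)',
    'Database: wp63',
    ' ***** No hits found ******',
    '  Database: wp63',
    '';

# Illustrative wrapper around the same match used in the Incyte_Fix block.
sub apply_incyte_fix {
    my ($data) = @_;
    # The greedy .+ (with /s) consumes the whole header before the capture.
    if ($data =~ /.+(No hits? found.+)/so) {
        $data = $1;
    }
    return $data;
}

my $fixed = apply_incyte_fix($report);

# Report which header lines survive the substitution.
printf "Query= line before fix: %s\n",
    ($report =~ /^Query=/m ? 'present' : 'missing');
printf "Query= line after fix:  %s\n",
    ($fixed =~ /^Query=/m ? 'present' : 'missing');
printf "letters line after fix: %s\n",
    ($fixed =~ /\(\d+ letters\)/ ? 'present' : 'missing');
```

Run on its own, this should show the 'Query=' line as present before the
fix and both header lines as missing afterwards, which would be consistent
with the warning and the exception quoted below.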
Marc

> -----Original Message-----
> From: Marc Logghe [mailto:MarcL@devgen.com]
> Sent: Friday, October 19, 2001 8:55 AM
> To: 'bioperl-l@bioperl.org'
> Subject: [Bioperl-l] Blast parsing exception
> 
> 
> Hi all,
> When parsing a multi-report BLAST output file, I get these error messages as
> soon as it reaches the 'No hits' BLAST report:
> -------------------- WARNING ---------------------
> MSG: Can't determine query sequence name from BLAST report.
> ---------------------------------------------------
> -------------------- EXCEPTION --------------------
> MSG: Unexpected error during read: -------------------- EXCEPTION --------------------
> MSG: Can't determine sequence length from BLAST report.
> STACK Bio::Tools::Blast::_set_length /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:2550
> STACK Bio::Tools::Blast::_parse_header /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1962
> STACK Bio::Tools::Blast::__ANON__ /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1761
> STACK (eval) /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:752
> STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:736
> STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
> STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
> STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
> STACK toplevel ./get_alias.pl:9
> -------------------------------------------
> 
> STACK Bio::Root::IOManager::read /usr/lib/perl5/site_perl/5.005/Bio/Root/IOManager.pm:763
> STACK Bio::Root::Object::read /usr/lib/perl5/site_perl/5.005/Bio/Root/Object.pm:1511
> STACK Bio::Tools::Blast::_parse_blast_stream /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1615
> STACK Bio::Tools::Blast::parse /usr/lib/perl5/site_perl/5.005/Bio/Tools/Blast.pm:1465
> STACK toplevel ./get_alias.pl:9
> -------------------------------------------
> When you remove this 'no hits' report, it works fine.
> Can somebody help me out with this? I am not able to pinpoint the problem.
> Thanks.
> I extracted the relevant part of the BLAST results and passed it to STDIN so
> that the exception is easy to reproduce.
> 
> #!/usr/local/bin/perl -w
> 
> use strict;
> use Bio::Tools::Blast qw(:obj);
> 
> *STDIN = *DATA; 
> 
> $Blast->parse
> (  
> #  -file   => '../wp18_wp63.res',
> #  -file   => '../test.res',
>   -parse  => 1,
>   -exec_func => \&process_blast,
> );
>                                      
> sub process_blast
> {
>   my $blastObj = shift;
>   my $hit = $blastObj->hit;
>   my $qname = $blastObj->query;
>   if ($hit)
>   {
>     my $hitname = $hit->name;
>     printf("%s\t%s\t%s\n", $qname, $hitname, $hit->expect) if ($qname ne $hitname);
>   }
>   else
>   {
>     print STDERR "$qname\n";
>   }
>   $blastObj->destroy;
> }
> 
> __DATA__
> BLASTP 2.2.1 [Jul-12-2001]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= ZK994.5 CE15491   (ST.LOUIS) TR:O44085 protein_id:AAB88613.1
>          (339 letters)
> 
> Database: wp63
>            20,100 sequences; 8,819,854 total letters
> 
> Searching.........................................done
> 
>                                                                    Score    E
> Sequences producing significant alignments:                        (bits) Value
> 
> T24H5.1 CE26008    (ST.LOUIS) protein_id:AAK84578.1                   620   e-178
> T23B12.9 CE14042    (ST.LOUIS) TR:O17008 protein_id:AAB69941.1        616   e-177
> F07C7.1 CE07032    (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1         463   e-131
> 
> >T24H5.1 CE26008    (ST.LOUIS) protein_id:AAK84578.1
>           Length = 414
> 
>  Score =  620 bits (1598), Expect = e-178
>  Identities = 300/339 (88%), Positives = 300/339 (88%)
> 
> Query: 1   MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
>            MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
> Sbjct: 76  MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 135
> 
> Query: 61  ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
>            ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ
> Sbjct: 136 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 195
> 
> Query: 121 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
>            SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
> Sbjct: 196 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 255
> 
> Query: 181 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
>            GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
> Sbjct: 256 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 315
> 
> Query: 241 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
>            IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
> Sbjct: 316 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 375
> 
> Query: 301 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
>            APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
> Sbjct: 376 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 414
> 
> 
> >T23B12.9 CE14042    (ST.LOUIS) TR:O17008 protein_id:AAB69941.1
>           Length = 1744
> 
>  Score =  616 bits (1589), Expect = e-177
>  Identities = 299/339 (88%), Positives = 299/339 (88%)
> 
> Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
>             MINNRPLVAHARSPNDMITLRPMDFMIPGVMIE                    HLEKFES
> Sbjct: 1406 MINNRPLVAHARSPNDMITLRPMDFMIPGVMIETPRTPADSPTTSTTETRTRAHLEKFES 1465
> 
> Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
>             ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPN VSRHRWPLALVVQVNQ
> Sbjct: 1466 ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNNVSRHRWPLALVVQVNQ 1525
> 
> Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
>             SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTG
> Sbjct: 1526 SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGPDNDTPANDTNNDTDKDTA 1585
> 
> Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
>             GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID
> Sbjct: 1586 GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 1645
> 
> Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
>             IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK
> Sbjct: 1646 IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 1705
> 
> Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
>             APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF
> Sbjct: 1706 APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 1744
> 
> 
> >F07C7.1 CE07032    (ST.LOUIS) TR:Q19161 protein_id:AAA85753.1
>           Length = 1879
> 
>  Score =  463 bits (1192), Expect = e-131
>  Identities = 239/339 (70%), Positives = 248/339 (72%), Gaps = 31/339 (9%)
> 
> Query: 1    MINNRPLVAHARSPNDMITLRPMDFMIPGVMIEXXXXXXXXXXXXXXXXXXXXHLEKFES 60
>             MINNRPLVAHARSPNDMI LRPMDFMIPGVMIE                    HLEKFES
> Sbjct: 1572 MINNRPLVAHARSPNDMIALRPMDFMIPGVMIETPRTPADSPTTSTTEIRTRAHLEKFES 1631
> 
> Query: 61   ALERLWTIWTFGVMLILREVSHKHKRCCDLKPEVGDVVIINPNYVSRHRWPLALVVQVNQ 120
>             ALERLWTIWTFGVMLILREVSHKHKRCCD KPEVGDVVIIN NYVSRHRWPLALVVQVNQ
> Sbjct: 1632 ALERLWTIWTFGVMLILREVSHKHKRCCDPKPEVGDVVIINTNYVSRHRWPLALVVQVNQ 1691
> 
> Query: 121  SKRDGEIRTAVVRCKGKLYKRSVCQLIPLETSRQNIRHGTGXXXXXXXXXXXXXXXXXXX 180
>             SKRDGEIRTAV              LIPLETSRQ+IRHGTG
> Sbjct: 1692 SKRDGEIRTAV--------------LIPLETSRQDIRHGTGPDNDTPANDTNNDTDKDTA 1737
> 
> Query: 181  GSDQCRSCPTLPTPALLDFENSHLAQEAFPAQILPNIGEEPRDITLDRWKSHDAIGPEID 240
>             GSDQCR CPTLPTPALLDFENSH A+ + P                 ++        EI
> Sbjct: 1738 GSDQCRPCPTLPTPALLDFENSHFARRSQP-----------------KFSRTSVKNLEIS 1780
> 
> Query: 241  IFEGPDYDTNNPLFPEDGEDEDRPVEYVDPNTAIPEIAYDHAETRLPHGRTREYLGRKAK 300
>             ++ GPDYDTNNPLF EDGE EDRPVEYVDP TAIPEIAYD+AETRLP GRTREYLGRKAK
> Sbjct: 1781 LWIGPDYDTNNPLFHEDGEAEDRPVEYVDPITAIPEIAYDNAETRLPQGRTREYLGRKAK 1840
> 
> Query: 301  APYINYNHAEITRVLSNPSPPECCRFPVIPQESLNLKDF 339
>             APYINYNHAEITRVLS+PSPPECCRFPVIPQESLNLKDF
> Sbjct: 1841 APYINYNHAEITRVLSDPSPPECCRFPVIPQESLNLKDF 1879
> 
> 
>   Database: wp63
>     Posted date:  Sep 14, 2001 11:28 AM
>   Number of letters in database: 8,819,854
>   Number of sequences in database:  20,100
>   
> Lambda     K      H
>    0.320    0.139    0.435 
> 
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140 
> 
> 
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 5,848,008
> Number of Sequences: 20100
> Number of extensions: 236898
> Number of successful extensions: 507
> Number of sequences better than 1.0e-100: 3
> Number of HSP's better than  0.0 without gapping: 2
> Number of HSP's successfully gapped in prelim test: 1
> Number of HSP's that attempted gapping in prelim test: 498
> Number of HSP's gapped (non-prelim): 3
> length of query: 339
> length of database: 8,819,854
> effective HSP length: 98
> effective length of query: 241
> effective length of database: 6,850,054
> effective search space: 1650863014
> effective search space used: 1650863014
> T: 11
> A: 40
> X1: 16 ( 7.4 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.8 bits)
> S2: 930 (362.8 bits)
> BLASTP 2.2.1 [Jul-12-2001]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= ZK994.6 CE15493   (ST.LOUIS) TR:O44088 protein_id:AAB88612.1
>          (113 letters)
> 
> Database: wp63
>            20,100 sequences; 8,819,854 total letters
> 
> Searching.........................................done
> 
>  ***** No hits found ******
> 
>   Database: wp63
>     Posted date:  Sep 14, 2001 11:28 AM
>   Number of letters in database: 8,819,854
>   Number of sequences in database:  20,100
>   
> Lambda     K      H
>    0.316    0.129    0.372 
> 
> Gapped
> Lambda     K      H
>    0.267   0.0410    0.140 
> 
> 
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 1,625,059
> Number of Sequences: 20100
> Number of extensions: 53280
> Number of successful extensions: 140
> Number of sequences better than 1.0e-100: 0
> Number of HSP's better than  0.0 without gapping: 0
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test: 140
> Number of HSP's gapped (non-prelim): 0
> length of query: 113
> length of database: 8,819,854
> effective HSP length: 89
> effective length of query: 24
> effective length of database: 7,030,954
> effective search space: 168742896
> effective search space used: 168742896
> T: 11
> A: 40
> X1: 16 ( 7.3 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 41 (21.6 bits)
> S2: 922 (359.8 bits)
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>