Bioperl: Parsing BLAST output file problem

Steve Chervitz sac@neomorphic.com (Steve A. Chervitz)
Mon, 28 Dec 1998 19:42:45 -0800 (PST)


Brian,

Your problems were the result of subtle differences in the structure
of the Blast output in the older version of your BLAST program (2.0.2). 
(This is where the caveat about the BLAST text output not being a data
exchange format rears its ugly head!)

I incorporated come changes in the Bioperl Blast modules that fix this
lapse in backwards compatibility. You can get the new modules
(Sbjct.pm and HSP.pm) from:

ftp://bio.perl.org/pub/sac/blast/

The new versions of these modules will be incorporated into the next
maintenance release of the central distribution.

Steve Chervitz

P.S. In the future, could you send your mail to
bioperl-bugs@bio.perl.org? This helps the Bioperl developers track
bug reports and their fixes. You can also (preferably) use the
web-based interface accessible from http://bio.perl.org/Bugs/

Brian K. Pollock writes:
 > 
 > Hi everyone,
 > 
 > I am having difficulty parsing an existing BLAST report and was hoping
 > someone could shed some light on this.
 > 
 > Below is the sub that I am having difficulty with (a slightly modified
 > version out of Steve's blast_config.pl).
 > 
 > Section 1 : The Sub
 > *******************
 > 
 > sub CreateBlast  {
 >     $file = shift;
 >     my $blast_obj;
 > 
 >     eval {
 >         $countBlast++;
 >         $blast_obj = new Bio::Tools::Blast ( -file   => $file,
 >                                              -parse => 1,
 >                                              -signif => '1e-30'
 >                                            );
 >     };
 > 
 >     if($@) {
 >         croak "\n*** Trouble creating Blast object:\n$@\n\n";
 >     }
 > 
 >     return $blast_obj;
 > }
 > 
 > 
 > Section 2: Here is the exception that is being thrown.
 > ******************************************************
 > 
 > -------------------- EXCEPTION --------------------
 > MSG: Failed to create any Sbjct objects for 172 potential hit(s).
 > NOTE: 
 > 
 > TRAPPED EXCEPTION(S):
 > SHOWING FIRST EXCEPTION ONLY:
 > 
 > -------------------- EXCEPTION --------------------
 > MSG: Can't parse HSP data for gi|2154132|gb|AA442254|AA442254.
 > NOTE: Missing HSP data or unrecognized format.
 > CONTEXT: Error in object Bio::Tools::Blast "(339"
 > STACK: 
 > Bio::Tools::Blast::_parse_hsp_data(2656)
 > Bio::Tools::Blast::_set_hits(2487)
 > Bio::Tools::Blast::_parse(1504)
 > Bio::Tools::Blast::parse(1450)
 > Bio::Tools::SeqAnal::_initialize(291)
 > Bio::Tools::Blast::_initialize(1027)
 > Bio::Root::Object::new(352)
 > main::CreateBlast(59)
 > main::./hsv_parser.pl(42)
 > ---------------------------------------------------
 > 
 > END TRAPPED EXCEPTION(S)
 > 
 > CONTEXT: Error in object Bio::Tools::Blast "(339"
 > STACK: 
 > Bio::Tools::Blast::_set_hits(2544)
 > Bio::Tools::Blast::_parse(1504)
 > Bio::Tools::Blast::parse(1450)
 > Bio::Tools::SeqAnal::_initialize(291)
 > Bio::Tools::Blast::_initialize(1027)
 > Bio::Root::Object::new(352)
 > main::CreateBlast(59)
 > main::./hsv_parser.pl(42)
 > 
 > 
 > Section 3 : Header from a BLAST report.
 > ****************************************
 > 
 > BLASTN 2.0.2 [Sep-3-1997]
 > 
 > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
 > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
 > "Gapped
 > BLAST and PSI-BLAST: a new generation of protein database search
 > programs",
 > Nucleic Acids Res. 25:3389-3402.
 > 
 > Query= 
 >          (339 letters)
 > 
 > Database: dbEST
 >            1,294,320 sequences; 477,413,560 total letters
 > 
 > Searching..................................................done
 >                                                                     High
 > E
 > Sequences producing significant alignments:                        Score
 > Value
 > 
 > gi|2154132|gb|AA442254|AA442254 zv61g11.s1 Soares testis NHT Ho...   391
 > e-107
 > gi|564863|emb|Z40247|HSC1XA082 H. sapiens partial cDNA sequence...   391
 > e-107
 > 
 > 
 >  -------------------------- SNIP -------------------------------------
 > 
 > Section 4 : Sample HSP summary snippet:
 > ***************************************
 > 
 > > gi|2154132|gb|AA442254|AA442254 zv61g11.s1 Soares testis NHT Homo
 >           sapiens cDNA clone 758180 3'
 >           Length = 444
 >           
 > Score =  391 bits (197), Expect = e-107
 > Identities = 218/227 (96%), Positives = 218/227 (96%)
 > 
 >                                                                       
 > Query 113 aacctnggcaccgttnctttttattatagctaaacncaanatgcccaaanatatacaaaa 172
 >           ||||| ||||||||| |||||||||||||||| || ||| ||||||||| ||||||||||
 > Sbjct 4   aaccttggcaccgtttctttttattatagctagacacaagatgcccaaagatatacaaaa 63
 > 
 >                                                                       
 > Query 173 caaacaatacaaattttaaacacttttacaaaggtgacaatataaanaaatgaaaactat 232
 >           |||||||||||||||||||||||||||||||||||||||||||||| ||||||| |||||
 > Sbjct 64  caaacaatacaaattttaaacacttttacaaaggtgacaatataaagaaatgaagactat 123
 > 
 >                                                                       
 > Query 233 ggatttatgtttgcaaacattaatttcctttgaatactgtctggtgttaaaacctaacag 292
 >           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 > Sbjct 124 ggatttatgtttgcaaacattaatttcctttgaatactgtctggtgttaaaacctaacag 183
 > 
 >                                                          
 > Query 293 tttaccaaaatttgtatttgacacaactatttttaaaattatagtgg 339
 >           ||||||| |||||||||||||||||||||||||||||||||||||||
 > Sbjct 184 tttaccagaatttgtatttgacacaactatttttaaaattatagtgg 230
 > 
 > 
 > 
 > Thanks for any help,
 > Brian
 > 
 > 
 > 
 > ------------------------------------------------------
 > Brian K. Pollock
 > BioInformatics Group
 > Research Genetics Inc.
 > (800)533-4363 X2266
 > bpollock@resgen.com
 > http://www.resgen.com
 > -------------------------------------------------------
 > 
 > =========== Bioperl Project Mailing List Message Footer =======
 > Project URL: http://bio.perl.org/
 > For info about how to (un)subscribe, where messages are archived, etc:
 > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > ====================================================================
 > 
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================