[Bioperl-l] BLAST Parsing Bug?

Jason Stajich jason@cgt.mc.duke.edu
Tue, 20 Aug 2002 12:47:40 -0400 (EDT)


That's because the parser expects a full BLAST report, and you are only
giving it a report that lists hits but has no HSPs (no alignment sections).
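
A quick way to confirm this for a given file is to count the one-line hit
descriptions and the alignment sections separately. This is a minimal
plain-Perl sketch, not anything in BioPerl. `report_shape` is a hypothetical
helper name. It assumes the usual NCBI text layout, in which each alignment
section starts with a line beginning with `>` and each HSP has a `Score =`
line, and it expects the text of a single report.

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper, not part of BioPerl: count the one-line hit
# descriptions, the alignment sections (lines starting with '>') and the HSPs
# (lines starting with "Score =") in the text of ONE plain-text BLAST report.
sub report_shape {
    my ($text) = @_;
    my %n = (descriptions => 0, alignments => 0, hsps => 0);
    my $state = 'before';    # before the table -> in the table -> after it
    for my $line (split /\n/, $text) {
        if ($state eq 'before') {
            $state = 'table'
                if $line =~ /^Sequences producing significant alignments/;
        }
        elsif ($state eq 'table') {
            if ($line =~ /\S/) {
                $n{descriptions}++;
            }
            elsif ($n{descriptions}) {
                $state = 'after';    # first blank line after the rows
            }
        }
        $n{alignments}++ if $line =~ /^>/;
        $n{hsps}++       if $line =~ /^\s*Score\s*=/;
    }
    return \%n;
}

# Trimmed from the report in the question: descriptions, no alignments.
my $report = <<'END';
Query= H3001A01-5
         (589 letters)

Sequences producing significant alignments:                      (bits) Value

gb|BQ206993.1|BQ206993 UI-R-DZ1-cnm-h-16-0-UI.s1 UI-R-DZ1 Rattus...   200   3e-050
gb|BM386877.1|BM386877 UI-R-CN1-cjh-d-20-0-UI.s1 UI-R-CN1 Rattus...   198   1e-049

  Database: est_others
END

my $shape = report_shape($report);
# prints: 2 descriptions, 0 alignment sections, 0 HSPs
printf "%d descriptions, %d alignment sections, %d HSPs\n",
    @{$shape}{qw(descriptions alignments hsps)};
```

A non-zero description count with zero alignment sections is the shape of
report that, per the explanation above, comes back from the parser with no
hits.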

At some point we can adapt the module to parse these types of reports, but
for now it is only going to work with reports that have the full
alignments included.
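
Until then, one workaround for reports like this is to read the hits straight
from the description table. Below is a minimal pure-Perl sketch, not a BioPerl
API. `parse_description_tables` is a hypothetical name, and it assumes each
description sits on one line ending in the bit score and E-value, as BLAST
writes it. It can only return what the table holds (ID, description, bits,
E-value), so there are no HSP coordinates or identities. The other route is to
rerun the search with alignments included: if the file was made with
blastall's `-b 0` (number of alignments to show), raising that value should
bring them back, although that is only a guess at how this file was produced.

```perl
use strict;
use warnings;

# Hypothetical fallback, not a BioPerl API: collect the hits listed in the
# one-line description table of each query in a plain-text BLAST report,
# for reports that were written without the alignment sections.
# Assumes each description is a single line ending in "<bits> <E-value>".
sub parse_description_tables {
    my ($text) = @_;
    my (@results, $current, $in_table, $seen_row);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Query=\s*(\S+)/) {
            # A new query starts a new result record.
            $current = { query => $1, hits => [] };
            push @results, $current;
            $in_table = 0;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            ($in_table, $seen_row) = (1, 0);
        }
        elsif ($in_table && $current) {
            if ($line !~ /\S/) {
                # Skip the blank line under the header; the first blank
                # line after the rows closes the table.
                $in_table = 0 if $seen_row;
                next;
            }
            $seen_row = 1;
            if ($line =~ /^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$/) {
                push @{ $current->{hits} },
                    { id => $1, desc => $2, bits => $3, evalue => $4 };
            }
        }
    }
    return \@results;
}

# Trimmed from the report in the question.
my $text = <<'END';
Query= H3001A01-5
         (589 letters)

Sequences producing significant alignments:                      (bits) Value

gb|BQ206993.1|BQ206993 UI-R-DZ1-cnm-h-16-0-UI.s1 UI-R-DZ1 Rattus...   200   3e-050
gb|BM386877.1|BM386877 UI-R-CN1-cjh-d-20-0-UI.s1 UI-R-CN1 Rattus...   198   1e-049
gb|AA819696.1|AA819696 UI-R-A0-bh-d-10-0-UI.s1 UI-R-A0 Rattus no...   192   6e-048

  Database: est_others
END

# Same "Name:"/"Count:" output as the SearchIO loop in the question.
for my $result (@{ parse_description_tables($text) }) {
    printf "Name: %s\nCount: %d\n", $result->{query}, scalar @{ $result->{hits} };
    printf "  %s  bits=%s  E=%s\n", @{$_}{qw(id bits evalue)}
        for @{ $result->{hits} };
}
```

To run it on the real file, read the whole report into a string first, e.g.
`open my $fh, '<', '15k5prime.out' or die $!; my $text = do { local $/; <$fh> };`,
and pass `$text` to `parse_description_tables`.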

-jason

On Tue, 20 Aug 2002, Paul Boutros wrote:

> Hello,
>
> I am just starting with Bioperl, trying to evaluate how useful it will be
> for our group.  I'm struggling with getting it to work on my first few
> steps here, though.  I would like to use the SearchIO system to parse a
> blast-results file and I get strange results.
>
> System: Win2k Pro (sp3)
> Perl: 5.6.1 ActiveState build 631 (all packages are updated)
> BioPerl: 1.00.2
>
> The basic problem is that the parser isn't finding any of the hits.  At
> all.  So the code below comes back with $count=0 for every record in the
> BLAST output file.  Any ideas what I'm doing wrong?
>
> Paul
>
>
> Code:
> use strict;
> use warnings;
> use Bio::SearchIO;
>
> # Open the report with the plain-text BLAST parser
> my $searchio = Bio::SearchIO->new(
> 	-format => 'blast',
> 	-file   => '15k5prime.out',
> );
>
> # One result per query in the report
> while (my $result = $searchio->next_result()) {
> 	my $count = 0;
>
> 	print "Name: ", $result->query_name(), "\n";
>
> 	# Count the hits the parser found for this query
> 	while (my $hit = $result->next_hit()) {
> 		$count++;
> 	}
>
> 	print "Count: $count\n";
> }
>
> Blast File Fragment:
>
> BLASTN 2.2.3 [Apr-24-2002]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
>
> Query= H3001A01-5
>          (589 letters)
>
> Database: est_others
>            5,032,538 sequences; 2,449,699,975 total letters
>
>
>
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value
>
> gb|BQ206993.1|BQ206993 UI-R-DZ1-cnm-h-16-0-UI.s1 UI-R-DZ1 Rattus...   200   3e-050
> gb|BM386877.1|BM386877 UI-R-CN1-cjh-d-20-0-UI.s1 UI-R-CN1 Rattus...   198   1e-049
> gb|BI301905.1|BI301905 UI-R-DL0-cio-k-03-0-UI.s1 UI-R-DL0 Rattus...   198   1e-049
> gb|BI301460.1|BI301460 UI-R-DN0-cit-e-07-0-UI.s1 UI-R-DN0 Rattus...   198   1e-049
> gb|BG371847.1|BG371847 UI-R-CV0-brj-a-09-0-UI.s1 UI-R-CV0 Rattus...   198   1e-049
> gb|BE115424.1|BE115424 UI-R-BS1-axu-f-02-0-UI.s1 UI-R-BS1 Rattus...   198   1e-049
> gb|AA819696.1|AA819696 UI-R-A0-bh-d-10-0-UI.s1 UI-R-A0 Rattus no...   192   6e-048
> gb|BM383271.1|BM383271 UI-R-DS0-cje-i-16-0-UI.s1 UI-R-DS0 Rattus...   190   2e-047
> gb|BI292210.1|BI292210 UI-R-DN0-civ-m-09-0-UI.s1 UI-R-DN0 Rattus...   190   2e-047
> gb|BI284655.1|BI284655 UI-R-DE0-cac-f-05-0-UI.s1 UI-R-DE0 Rattus...   190   2e-047
>
>   Subset of the database(s) listed below
>      Number of letters searched: 123,827,604
>      Number of sequences searched:  285,629
>
>   Database: est_others
>     Posted date:  Aug 15, 2002 12:08 PM
>   Number of letters in database: 333,332,922
>   Number of sequences in database:  0
>
>   Database: c:\docume~1\paul\blast\data\est_others.01
>     Posted date:  Aug 15, 2002 12:21 PM
>   Number of letters in database: 333,333,126
>   Number of sequences in database:  734,123
>
>   Database: c:\docume~1\paul\blast\data\est_others.02
>     Posted date:  Aug 15, 2002 12:33 PM
>   Number of letters in database: 333,332,951
>   Number of sequences in database:  710,185
>
>   Database: c:\docume~1\paul\blast\data\est_others.03
>     Posted date:  Aug 15, 2002 12:45 PM
>   Number of letters in database: 333,332,998
>   Number of sequences in database:  651,575
>
>   Database: c:\docume~1\paul\blast\data\est_others.04
>     Posted date:  Aug 15, 2002 12:56 PM
>   Number of letters in database: 333,332,826
>   Number of sequences in database:  637,159
>
>   Database: c:\docume~1\paul\blast\data\est_others.05
>     Posted date:  Aug 15, 2002  1:07 PM
>   Number of letters in database: 333,333,104
>   Number of sequences in database:  630,795
>
>   Database: c:\docume~1\paul\blast\data\est_others.06
>     Posted date:  Aug 15, 2002  1:19 PM
>   Number of letters in database: 333,332,943
>   Number of sequences in database:  650,535
>
>   Database: c:\docume~1\paul\blast\data\est_others.07
>     Posted date:  Aug 15, 2002  1:28 PM
>   Number of letters in database: 116,369,105
>   Number of sequences in database:  227,351
>
> Lambda     K      H
>     1.37    0.711     1.31
>
> Gapped
> Lambda     K      H
>     1.37    0.711     1.31
>
>
> Matrix: blastn matrix:1 -3
> Gap Penalties: Existence: 5, Extension: 2
> Number of Hits to DB: 69,708
> Number of Sequences: 4241723
> Number of extensions: 69708
> Number of successful extensions: 25280
> Number of sequences better than  0.3: 2163
> length of query: 589
> length of database: 123,827,604
> effective HSP length: 18
> effective length of query: 571
> effective length of database: 118,686,282
> effective search space: 67769867022
> effective search space used: 67769867022
> T: 0
> A: 40
> X1: 6 (11.9 bits)
> X2: 15 (29.7 bits)
> S1: 12 (24.3 bits)
> S2: 19 (38.2 bits)
> BLASTN 2.2.3 [Apr-24-2002]
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu