[Bioperl-l] BLAST Parsing Bug?

Andreas Matern andreas.matern@lbri.lionbioscience.com
Tue, 20 Aug 2002 12:39:26 -0400


This code works fine for me on Win2k.
I didn't use the ActiveState Perl build, however; perhaps that build is buggy.

I just downloaded the Bioperl 1.0.2 tarball and uncompressed it into
C:\Perl\site\lib
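To confirm which Perl and which copy of Bioperl are actually in use (for instance, that Bio::SearchIO is loaded from the tarball unpacked under C:\Perl\site\lib and not from some other installation), a small script can print the interpreter version, the platform, and the file Perl loaded each module from, using Perl's standard %INC hash. This is a minimal sketch: the `module_path` helper and the script name are hypothetical and not part of Bioperl.

```perl
#!/usr/bin/perl
# Sketch: report the Perl version, the platform, and where each named
# module was loaded from. Helper and script name are hypothetical.
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file path
# Perl loaded it from (taken from %INC), or undef if it cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO
    $file .= '.pm';
    return unless eval { require $file; 1 };
    return $INC{$file};
}

# Only run the report when module names are given on the command line.
if (@ARGV) {
    print "Perl version: $]\n";
    print "Platform:     $^O\n";
    for my $module (@ARGV) {
        my $path = module_path($module);
        print "$module: ", (defined $path ? $path : 'not found'), "\n";
    }
}
```

For example, saving it under the hypothetical name whichmod.pl and running `perl whichmod.pl Bio::SearchIO` prints the path of the Bio/SearchIO.pm that was loaded; if that is not the copy you expected, another installation is taking precedence.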

I normally don't use Win2k, however; I generally use the CVS version of
Bioperl, which tends to work fine...

And Bioperl is EXTREMELY useful for me; great job, Bioperlers....

-Andreas

Paul Boutros wrote:
> Hello,
> 
> I am just starting with Bioperl, trying to evaluate how useful it will be
> for our group.  I'm struggling to get it working on my first few
> steps, though.  I would like to use the SearchIO system to parse a
> BLAST results file, and I get strange results.
> 
> System: Win2k Pro (sp3)
> Perl: 5.6.1 ActiveState build 631 (all packages are updated)
> BioPerl: 1.00.2
> 
> The basic problem is that the parser isn't finding any of the hits.  At
> all.  So the code below comes back with $count=0 for every record in the
> BLAST output file.  Any ideas what I'm doing wrong?
> 
> Paul
> 
> 
> Code:
> use strict;
> use warnings;
> use Bio::SearchIO;
> 
> # Open the BLAST report for parsing
> my $searchio = Bio::SearchIO->new(
>     -format => 'blast',
>     -file   => '15k5prime.out',
> );
> 
> # Iterate over the result for each query in the report
> while (my $result = $searchio->next_result()) {
>     my $count = 0;
> 
>     print "Name: ", $result->query_name(), "\n";
> 
>     # Count the hits reported for this query
>     while (my $hit = $result->next_hit()) {
>         $count++;
>     }
> 
>     print "Count: $count\n";
> }
> 
> Blast File Fragment:
> 
> BLASTN 2.2.3 [Apr-24-2002]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= H3001A01-5
>          (589 letters)
> 
> Database: est_others 
>            5,032,538 sequences; 2,449,699,975 total letters
> 
> 
> 
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value
> 
> gb|BQ206993.1|BQ206993 UI-R-DZ1-cnm-h-16-0-UI.s1 UI-R-DZ1 Rattus...   200   3e-050
> gb|BM386877.1|BM386877 UI-R-CN1-cjh-d-20-0-UI.s1 UI-R-CN1 Rattus...   198   1e-049
> gb|BI301905.1|BI301905 UI-R-DL0-cio-k-03-0-UI.s1 UI-R-DL0 Rattus...   198   1e-049
> gb|BI301460.1|BI301460 UI-R-DN0-cit-e-07-0-UI.s1 UI-R-DN0 Rattus...   198   1e-049
> gb|BG371847.1|BG371847 UI-R-CV0-brj-a-09-0-UI.s1 UI-R-CV0 Rattus...   198   1e-049
> gb|BE115424.1|BE115424 UI-R-BS1-axu-f-02-0-UI.s1 UI-R-BS1 Rattus...   198   1e-049
> gb|AA819696.1|AA819696 UI-R-A0-bh-d-10-0-UI.s1 UI-R-A0 Rattus no...   192   6e-048
> gb|BM383271.1|BM383271 UI-R-DS0-cje-i-16-0-UI.s1 UI-R-DS0 Rattus...   190   2e-047
> gb|BI292210.1|BI292210 UI-R-DN0-civ-m-09-0-UI.s1 UI-R-DN0 Rattus...   190   2e-047
> gb|BI284655.1|BI284655 UI-R-DE0-cac-f-05-0-UI.s1 UI-R-DE0 Rattus...   190   2e-047
> 
>   Subset of the database(s) listed below
>      Number of letters searched: 123,827,604
>      Number of sequences searched:  285,629
>   
>   Database: est_others
>     Posted date:  Aug 15, 2002 12:08 PM
>   Number of letters in database: 333,332,922
>   Number of sequences in database:  0
>   
>   Database: c:\docume~1\paul\blast\data\est_others.01
>     Posted date:  Aug 15, 2002 12:21 PM
>   Number of letters in database: 333,333,126
>   Number of sequences in database:  734,123
>   
>   Database: c:\docume~1\paul\blast\data\est_others.02
>     Posted date:  Aug 15, 2002 12:33 PM
>   Number of letters in database: 333,332,951
>   Number of sequences in database:  710,185
>   
>   Database: c:\docume~1\paul\blast\data\est_others.03
>     Posted date:  Aug 15, 2002 12:45 PM
>   Number of letters in database: 333,332,998
>   Number of sequences in database:  651,575
>   
>   Database: c:\docume~1\paul\blast\data\est_others.04
>     Posted date:  Aug 15, 2002 12:56 PM
>   Number of letters in database: 333,332,826
>   Number of sequences in database:  637,159
>   
>   Database: c:\docume~1\paul\blast\data\est_others.05
>     Posted date:  Aug 15, 2002  1:07 PM
>   Number of letters in database: 333,333,104
>   Number of sequences in database:  630,795
>   
>   Database: c:\docume~1\paul\blast\data\est_others.06
>     Posted date:  Aug 15, 2002  1:19 PM
>   Number of letters in database: 333,332,943
>   Number of sequences in database:  650,535
>   
>   Database: c:\docume~1\paul\blast\data\est_others.07
>     Posted date:  Aug 15, 2002  1:28 PM
>   Number of letters in database: 116,369,105
>   Number of sequences in database:  227,351
>   
> Lambda     K      H
>     1.37    0.711     1.31 
> 
> Gapped
> Lambda     K      H
>     1.37    0.711     1.31 
> 
> 
> Matrix: blastn matrix:1 -3
> Gap Penalties: Existence: 5, Extension: 2
> Number of Hits to DB: 69,708
> Number of Sequences: 4241723
> Number of extensions: 69708
> Number of successful extensions: 25280
> Number of sequences better than  0.3: 2163
> length of query: 589
> length of database: 123,827,604
> effective HSP length: 18
> effective length of query: 571
> effective length of database: 118,686,282
> effective search space: 67769867022
> effective search space used: 67769867022
> T: 0
> A: 40
> X1: 6 (11.9 bits)
> X2: 15 (29.7 bits)
> S1: 12 (24.3 bits)
> S2: 19 (38.2 bits)
> BLASTN 2.2.3 [Apr-24-2002]
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
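To tell a parser problem apart from a report that simply gives the parser nothing to pick up, the report text can be cross-checked directly. The sketch below is a hypothetical helper, not part of Bioperl. It assumes the plain-text BLASTN 2.2.x layout seen in the fragment above (a "Query=" line, a "Sequences producing significant alignments:" table, and alignment blocks starting with ">") and counts, per query, the one-line hit summaries and the alignment blocks. The quoted fragment shows summary lines but no ">" alignment blocks, though it may simply have been trimmed for the post; comparing both counts with the Count values printed by the Bio::SearchIO loop shows where the two disagree.

```perl
#!/usr/bin/perl
# Sketch: per query, count hit summary lines and ">" alignment blocks in a
# plain-text BLAST report. Assumes the BLASTN 2.2.x text layout; the helper
# and script name are hypothetical.
use strict;
use warnings;

# Hypothetical helper: takes the report's lines, returns a list of hashrefs
# { name, summaries, alignments }, one per "Query=" line.
sub summarize_blast_report {
    my (@lines) = @_;
    my (@queries, $q, $in_summary);
    for my $line (@lines) {
        if ($line =~ /^Query=\s*(\S+)/) {
            $q = { name => $1, summaries => 0, alignments => 0 };
            push @queries, $q;
            $in_summary = 0;
        }
        next unless $q;    # skip the header before the first query
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_summary = 1;    # the summary table follows this header
        }
        elsif ($in_summary) {
            if ($line =~ /^\s*$/) {
                # A blank line ends the table once hit lines have been seen
                $in_summary = 0 if $q->{summaries};
            }
            else {
                $q->{summaries}++;
            }
        }
        elsif ($line =~ /^>/) {
            $q->{alignments}++;    # start of one alignment block
        }
    }
    return @queries;
}

# Only run when report files are named on the command line.
if (@ARGV) {
    my @lines = <>;
    for my $q (summarize_blast_report(@lines)) {
        printf "%s\tsummary lines: %d\talignments: %d\n",
            $q->{name}, $q->{summaries}, $q->{alignments};
    }
}
```

For example, `perl blastcheck.pl 15k5prime.out` (hypothetical script name) prints one line per query.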


-- 
--------------
Andreas Matern
Bioinformatician
Bioinformatics - Research and Development
Lion Bioscience Research Inc.
141 Portland Street, 10th floor
Cambridge, MA 02139  USA
Phone: 617-245-5483
Fax: 617-245-5499
amatern@lbri.lionbioscience.com
www.lionbioscience.com