[Bioperl-l] Parsing BLASTP or TBLASTN reveals subtle query_length = 0 bug

Matthew Vaughn vaughn at cshl.org
Fri Jun 6 12:55:44 EDT 2003


I've got some large BLASTP and TBLASTN reports to extract data from and 
I've run into an issue that I think is coming from the 
Bio::SearchIO::psiblast parser.

Essentially, instead of $result->query_length returning the length of 
the query sequence, it is returning zero. The reports are coming from 
the most recent BLAST release, but I've run into this same problem 
parsing reports from a couple of point releases back. I took a look at 
the raw BLAST files and have uncovered a pattern that is illustrated in 
the following four test cases. In each of the cases labeled 'FAILURE 
CASE' there is a blank line between the Query description and the line 
giving the query length, and these two results return a query_length 
of 0. Contrast this with the test cases labeled 'SUCCESS CASE', where 
the proper length is returned. Presumably, the extra blank line is 
confusing the BLAST parser.
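
To make the suspected failure mode concrete, here is a small standalone 
sketch. It is written in Python purely for illustration and is NOT the 
BioPerl parser code; the function name query_length, the stop_at_blank 
flag, and the regular expression are all hypothetical. It assumes the 
parser treats the first blank line after "Query=" as the end of the 
query header, which is only my guess at what is happening.

```python
import re

# Matches the "(104 letters)" line, allowing leading whitespace and
# thousands separators such as "(1,462 letters)".
LETTERS_RE = re.compile(r"^\s*\(([\d,]+) letters\)\s*$")


def query_length(header_lines, stop_at_blank):
    """Return the query length found after 'Query=', or 0 if none is found.

    stop_at_blank=True mimics the ASSUMED faulty behaviour: the header is
    considered finished at the first blank line.
    stop_at_blank=False skips blank lines and keeps looking until the
    'Database:' line.
    """
    in_query = False
    for line in header_lines:
        if line.startswith("Query="):
            in_query = True
            continue
        if not in_query:
            continue
        if line.startswith("Database:"):
            break  # past the query header, give up
        if not line.strip():
            if stop_at_blank:
                break  # hypothetical strict parser stops here
            continue  # tolerant parser ignores the blank line
        match = LETTERS_RE.match(line)
        if match:
            return int(match.group(1).replace(",", ""))
    return 0


# Header lines copied from FAILURE CASE 1 and SUCCESS CASE 2 of this report.
failure_case = [
    "Query= At2g02830.1 68409.m00200 retroelement pol polyprotein -related",
    "",
    "          (104 letters)",
    "",
    "Database: athrep.ref",
]
success_case = [
    "Query= At2g03080.1 68409.m00227 reverse transcriptase -related",
    "          (137 letters)",
    "",
    "Database: athrep.ref",
]

for name, case in (("failure", failure_case), ("success", success_case)):
    print(name, query_length(case, True), query_length(case, False))
```

With stop_at_blank set, the failure-case header yields 0 while the 
success-case header yields its real length, which matches the symptom 
described above. Skipping blank lines until the "Database:" line is 
reached recovers the length in both layouts.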

-FAILURE CASE 1-

Query= At2g02830.1 68409.m00200 retroelement pol polyprotein -related

          (104 letters)

Database: athrep.ref
            457 sequences; 1,462,624 total letters

Searching.done

                                                                    
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ATCOPIA62_I                                                        157  6e-41
ATCOPIA11I                                                         102  4e-24
..

-FAILURE CASE 2-

Query= At2g04140.1 68409.m00353 retroelement pol polyprotein -related

          (88 letters)

Database: athrep.ref
            457 sequences; 1,462,624 total letters

Searching.done

                                                                    
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

META1_I                                                            179  1e-47
ATCOPIA28_I                                                        177  6e-47
..

-SUCCESS CASE 1-

Query= At2g01022.1 68409.m00001 polyprotein, putative similar to
polyprotein [Ananas comosus] GI:2995405; contains Pfam profile
PF00078: Reverse transcriptase (RNA-dependent DNA polymerase)
          (660 letters)

Database: athrep.ref
            457 sequences; 1,462,624 total letters

Searching.done

                                                                    
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ATGP1I                                                            1203  0.0
ATGP2I                                                             870  0.0
..

-SUCCESS CASE 2-

Query= At2g03080.1 68409.m00227 reverse transcriptase -related
          (137 letters)

Database: athrep.ref
            457 sequences; 1,462,624 total letters

Searching.done

                                                                    
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

META1_I                                                            231  5e-63
ATCOPIA28_I                                                        229  3e-62
..

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 422-4128


