[Bioperl-l] PSI-BLAST uncommon result

Luis M Rodriguez-R me at miguel.weapps.com
Thu Mar 11 00:48:17 EST 2010


Hello all,

I'm getting an unusual (but valid) PSI-BLAST result that BioPerl can't parse: a single hit in the first round (or several hits that are identical in the aligned regions) and no hits in the second round.  BioPerl seems to treat the '***** No hits found ******' line as part of the alignment and dies with this exception:
MSG: no data for midline  ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792
My workaround was to use the XML output, but I think this is still a bug.  I've appended the example PSI-BLAST output at the end of this mail.
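
For anyone hitting the same exception, a pre-check along these lines could flag such reports before they reach the text parser, so the caller can switch to the XML output instead. This is only a minimal sketch in Python, not BioPerl code: the function name rounds_without_hits is hypothetical, and it assumes that every round in the legacy text report starts with a "Searching...done" progress line, as in the report at the end of this mail.

```python
import re

# Assumption: each PSI-BLAST round in the legacy text report begins with a
# "Searching....done" progress line (true for the report in this message).
SEARCH_LINE = re.compile(r"^Searching\.*done[ \t]*$", re.MULTILINE)

# The marker line that the text parser mistakes for alignment data.
NO_HITS = re.compile(r"^[ \t]*\*+ No hits found \*+[ \t]*$", re.MULTILINE)


def rounds_without_hits(report_text):
    """Return the 1-based numbers of the rounds that report no hits."""
    # The piece before the first progress line is the report header, so drop it.
    chunks = SEARCH_LINE.split(report_text)[1:]
    return [n for n, chunk in enumerate(chunks, start=1) if NO_HITS.search(chunk)]
```

If the returned list is non-empty, the report contains at least one empty round, and parsing the XML output is the safer route.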

Best regards,

Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com


BLASTP 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.


Reference for compositional score matrix adjustment: Altschul, Stephen F., 
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.


Reference for composition-based statistics starting in round 2:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,  
Eugene V. Koonin, and Stephen F. Altschul (2001), 
"Improving the accuracy of PSI-BLAST protein database searches with 
composition-based statistics and other refinements",  Nucleic Acids Res. 29:2994-3005.

Query= eff254
         (67 letters)

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects 
           10,383,435 sequences; 3,542,477,638 total letters

Searching..................................................done


Results from round 1


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...   127   5e-28

>ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
          pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
 sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin
          hrpA
 gb|ABA39805.1| HrpA [Erwinia pyrifoliae]
 emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
          pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
 emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM
          12163]
          Length = 67

 Score =  127 bits (318), Expect = 5e-28,   Method: Compositional matrix adjust.
 Identities = 67/67 (100%), Positives = 67/67 (100%)

Query: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
          MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN
Sbjct: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60

Query: 61 AAKAIQF 67
          AAKAIQF
Sbjct: 61 AAKAIQF 67


Searching..................................................done



 ***** No hits found ******

  Database: All non-redundant GenBank CDS
  translations+PDB+SwissProt+PIR+PRF excluding environmental samples
  from WGS projects
    Posted date:  Jan 24, 2010  4:41 AM
  Number of letters in database: 863,709,833
  Number of sequences in database:  2,562,282
  
  Database: /storage1/databases/ncbi-blast/nr.01
    Posted date:  Jan 24, 2010  4:41 AM
  Number of letters in database: 936,189,781
  Number of sequences in database:  2,674,439
  
  Database: /storage1/databases/ncbi-blast/nr.02
    Posted date:  Jan 24, 2010  4:41 AM
  Number of letters in database: 974,890,473
  Number of sequences in database:  2,826,395
  
  Database: /storage1/databases/ncbi-blast/nr.03
    Posted date:  Jan 24, 2010  4:41 AM
  Number of letters in database: 767,687,551
  Number of sequences in database:  2,320,319
  
Lambda     K      H
   0.297    0.107    0.256 

Lambda     K      H
   0.267   0.0344    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 480,706,425
Number of Sequences: 10383435
Number of extensions: 8598061
Number of successful extensions: 47335
Number of sequences better than 1.0e-25: 1
Number of HSP's better than  0.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 47333
Number of HSP's gapped (non-prelim): 2
length of query: 67
length of database: 3,542,477,638
effective HSP length: 39
effective length of query: 28
effective length of database: 3,137,523,673
effective search space: 87850662844
effective search space used: 87850662844
T: 11
A: 40
X1: 16 ( 6.9 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 43 (21.7 bits)
S2: 298 (119.7 bits)



