[Bioperl-l] PSI-BLAST uncommon result

Chris Fields cjfields at illinois.edu
Thu Mar 11 09:27:33 EST 2010


Luis,

The best way to handle this is to file a bug report on Bugzilla and attach the problematic report as a file, rather than pasting it into the message.  That way we can be sure we aren't chasing artifacts introduced by an email client (line wrapping, quoting, and so on).
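
For anyone who runs into a similar report in the meantime, here is a minimal sketch (plain Python, not BioPerl) that scans a text report and lists the PSI-BLAST rounds that ended in "No hits found", so such files can be spotted before they are handed to a parser.  The function name find_empty_rounds is made up for illustration.  It assumes every round is introduced by a "Searching...done" line, as in the report quoted below; I have not checked that against other BLAST versions.

```python
import re

# Hypothetical helper for illustration; not part of BioPerl or NCBI BLAST.
# Assumption: each PSI-BLAST round starts with a "Searching.....done" line,
# as in the BLASTP 2.2.18 report quoted in this thread.
SEARCH_LINE = re.compile(r"^Searching\.*done$")
NO_HITS = "No hits found"


def find_empty_rounds(report_text):
    """Return the 1-based round numbers that reported no hits."""
    empty = []
    round_no = 0
    for line in report_text.splitlines():
        # Drop mail-quoting prefixes ("> ") and surrounding whitespace.
        stripped = line.lstrip("> ").strip()
        if SEARCH_LINE.match(stripped):
            round_no += 1
        elif NO_HITS in stripped and round_no > 0:
            # A "No hits found" line seen before any "Searching" line is
            # ignored, because no round can be assigned to it.
            if round_no not in empty:
                empty.append(round_no)
    return empty


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(find_empty_rounds(handle.read()))
```

Run it with the path to the report as the only argument.  It only detects the situation; it does not repair the report or change how Bio::SearchIO reads it.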

chris

On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote:

> Hello all,
> 
> I'm getting an unusual (but valid) PSI-BLAST result that BioPerl can't parse: a single hit in the first round (or identical results in the aligned regions) and no hits in the second round.  BioPerl treats '***** No hits found ******' as part of the alignment and dies with the exception:
> MSG: no data for midline  ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792
> My workaround was to use the XML output, but I think it's still a bug.  I've appended the example PSI-BLAST output at the end of this mail.
> 
> Best regards,
> 
> Luis M. Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
> 
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
> 
> 
> BLASTP 2.2.18 [Mar-02-2008]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> 
> Reference for compositional score matrix adjustment: Altschul, Stephen F., 
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
> 
> 
> Reference for composition-based statistics starting in round 2:
> Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
> Sergei Shavirin, John L. Spouge, Yuri I. Wolf,  
> Eugene V. Koonin, and Stephen F. Altschul (2001), 
> "Improving the accuracy of PSI-BLAST protein database searches with 
> composition-based statistics and other refinements",  Nucleic Acids Res. 29:2994-3005.
> 
> Query= eff254
>       (67 letters)
> 
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects 
>         10,383,435 sequences; 3,542,477,638 total letters
> 
> Searching..................................................done
> 
> 
> Results from round 1
> 
> 
>                                                               Score    E
> Sequences producing significant alignments:                      (bits) Value
> 
> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...   127   5e-28
> 
>> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
>        pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
> sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin
>        hrpA
> gb|ABA39805.1| HrpA [Erwinia pyrifoliae]
> emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
>        pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
> emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM
>        12163]
>        Length = 67
> 
> Score =  127 bits (318), Expect = 5e-28,   Method: Compositional matrix adjust.
> Identities = 67/67 (100%), Positives = 67/67 (100%)
> 
> Query: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
>        MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN
> Sbjct: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
> 
> Query: 61 AAKAIQF 67
>        AAKAIQF
> Sbjct: 61 AAKAIQF 67
> 
> 
> Searching..................................................done
> 
> 
> 
> ***** No hits found ******
> 
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
>  Posted date:  Jan 24, 2010  4:41 AM
> Number of letters in database: 863,709,833
> Number of sequences in database:  2,562,282
> 
> Database: /storage1/databases/ncbi-blast/nr.01
>  Posted date:  Jan 24, 2010  4:41 AM
> Number of letters in database: 936,189,781
> Number of sequences in database:  2,674,439
> 
> Database: /storage1/databases/ncbi-blast/nr.02
>  Posted date:  Jan 24, 2010  4:41 AM
> Number of letters in database: 974,890,473
> Number of sequences in database:  2,826,395
> 
> Database: /storage1/databases/ncbi-blast/nr.03
>  Posted date:  Jan 24, 2010  4:41 AM
> Number of letters in database: 767,687,551
> Number of sequences in database:  2,320,319
> 
> Lambda     K      H
> 0.297    0.107    0.256 
> 
> Lambda     K      H
> 0.267   0.0344    0.140 
> 
> 
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 480,706,425
> Number of Sequences: 10383435
> Number of extensions: 8598061
> Number of successful extensions: 47335
> Number of sequences better than 1.0e-25: 1
> Number of HSP's better than  0.0 without gapping: 2
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test: 47333
> Number of HSP's gapped (non-prelim): 2
> length of query: 67
> length of database: 3,542,477,638
> effective HSP length: 39
> effective length of query: 28
> effective length of database: 3,137,523,673
> effective search space: 87850662844
> effective search space used: 87850662844
> T: 11
> A: 40
> X1: 16 ( 6.9 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 43 (21.7 bits)
> S2: 298 (119.7 bits)
> 
> 




