[Bioperl-l] PSI-BLAST uncommon result
Chris Fields
cjfields at illinois.edu
Thu Mar 11 09:27:33 EST 2010
Luis,
The best way to handle this is to attach the problematic report to a bug report on Bugzilla rather than appending it to an email. That way we can be sure we aren't running into artifacts introduced by the email client, etc.
chris
On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote:
> Hello all,
>
> I'm getting an unusual (but valid) PSI-BLAST result that BioPerl can't parse: a single hit in the first round (or hits that are identical in the aligned regions) and no hits in the second round. BioPerl treats '*** No hits found ***' as part of the alignment and dies with the exception:
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792
> My workaround was to use the XML output, but I think this is still a bug. I've appended the example PSI-BLAST output at the end of this mail.
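
Until the text parser copes with this case, one option is to check a report for empty rounds before handing it to Bio::SearchIO. What follows is a minimal standalone sketch in Python. It is not BioPerl code and not part of any BioPerl API. It assumes the legacy blastpgp text layout seen in the report in this message, where each search pass begins with a "Searching...done" line. The function name rounds_without_hits is made up for illustration.

```python
import re

# Assumption: each PSI-BLAST pass in the legacy text output starts with a
# line such as "Searching.......done".
SEARCHING = re.compile(r"^Searching\.*done$")
# Matches the banner printed when a pass finds nothing, regardless of how
# many asterisks surround it.
NO_HITS = re.compile(r"^\*+\s*No hits found\s*\*+$")


def rounds_without_hits(report_text):
    """Return the 1-based numbers of search passes that report no hits."""
    empty = []
    current = 0
    for line in report_text.splitlines():
        stripped = line.strip()
        if SEARCHING.match(stripped):
            current += 1
        elif NO_HITS.match(stripped):
            # If no "Searching" line was seen yet, treat this as pass 1.
            number = max(current, 1)
            if number not in empty:
                empty.append(number)
    return empty
```

For a report like the one in this message, the function should flag the second pass. Files flagged this way could be rerun with XML output, as in the workaround above, instead of going through the text parser.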
>
> Best regards,
>
> Luis M. Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>
> BLASTP 2.2.18 [Mar-02-2008]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
>
> Reference for compositional score matrix adjustment: Altschul, Stephen F.,
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
>
>
> Reference for composition-based statistics starting in round 2:
> Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
> Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
> Eugene V. Koonin, and Stephen F. Altschul (2001),
> "Improving the accuracy of PSI-BLAST protein database searches with
> composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
> Query= eff254
> (67 letters)
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
> 10,383,435 sequences; 3,542,477,638 total letters
>
> Searching..................................................done
>
>
> Results from round 1
>
>
> Score E
> Sequences producing significant alignments: (bits) Value
>
> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se... 127 5e-28
>
> >ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
> pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
> sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin
> hrpA
> gb|ABA39805.1| HrpA [Erwinia pyrifoliae]
> emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
> pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
> emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM
> 12163]
> Length = 67
>
> Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust.
> Identities = 67/67 (100%), Positives = 67/67 (100%)
>
> Query: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
>           MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN
> Sbjct: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
>
> Query: 61 AAKAIQF 67
>           AAKAIQF
> Sbjct: 61 AAKAIQF 67
>
>
> Searching..................................................done
>
>
>
> ***** No hits found ******
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
> Posted date: Jan 24, 2010 4:41 AM
> Number of letters in database: 863,709,833
> Number of sequences in database: 2,562,282
>
> Database: /storage1/databases/ncbi-blast/nr.01
> Posted date: Jan 24, 2010 4:41 AM
> Number of letters in database: 936,189,781
> Number of sequences in database: 2,674,439
>
> Database: /storage1/databases/ncbi-blast/nr.02
> Posted date: Jan 24, 2010 4:41 AM
> Number of letters in database: 974,890,473
> Number of sequences in database: 2,826,395
>
> Database: /storage1/databases/ncbi-blast/nr.03
> Posted date: Jan 24, 2010 4:41 AM
> Number of letters in database: 767,687,551
> Number of sequences in database: 2,320,319
>
> Lambda K H
> 0.297 0.107 0.256
>
> Lambda K H
> 0.267 0.0344 0.140
>
>
> Matrix: BLOSUM62
> Gap Penalties: Existence: 11, Extension: 1
> Number of Hits to DB: 480,706,425
> Number of Sequences: 10383435
> Number of extensions: 8598061
> Number of successful extensions: 47335
> Number of sequences better than 1.0e-25: 1
> Number of HSP's better than 0.0 without gapping: 2
> Number of HSP's successfully gapped in prelim test: 0
> Number of HSP's that attempted gapping in prelim test: 47333
> Number of HSP's gapped (non-prelim): 2
> length of query: 67
> length of database: 3,542,477,638
> effective HSP length: 39
> effective length of query: 28
> effective length of database: 3,137,523,673
> effective search space: 87850662844
> effective search space used: 87850662844
> T: 11
> A: 40
> X1: 16 ( 6.9 bits)
> X2: 38 (14.6 bits)
> X3: 64 (24.7 bits)
> S1: 43 (21.7 bits)
> S2: 298 (119.7 bits)
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l