[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output

Chris Fields cjfields at uiuc.edu
Fri Feb 10 01:52:34 UTC 2006


 From 'perldoc Bio::SearchIO::blast':

DESCRIPTION
        This object encapsulated the necessary methods for generating events
        suitable for building Bio::Search objects from a BLAST report file.
        Read the Bio::SearchIO for more information about how to use this.

        This driver can parse:

        o   NCBI produced plain text BLAST reports from blastall, this also
            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
            XML BLAST output is parsed with the blastxml SearchIO driver

        o   WU-BLAST all reports

        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)

        o   BLAST-like output from Paracel BTK output

So, it should.  Let us know if it doesn't.
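
For the record, a minimal sketch of what parsing a WU-BLAST report might
look like.  It assumes BioPerl is installed and that the same 'blast' format
name is passed for WU-BLAST text reports as for NCBI ones, which is how I
read the perldoc above.  summarize_hsps is a made-up helper name, and the
report path comes from the command line.  Treat it as an illustration, not
as tested code from the distribution:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect one tab-separated line (query, hit, e-value) per HSP from any
# object that offers the Bio::SearchIO iterator methods.
sub summarize_hsps {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                push @rows,
                    join( "\t", $result->query_name, $hit->name, $hsp->evalue );
            }
        }
    }
    return @rows;
}

# Only load BioPerl when a report path is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;

    # Assumption: WU-BLAST text reports use the same 'blast' driver name.
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    print "$_\n" for summarize_hsps($in);
}
```

Run it on one report at a time, e.g. 'perl wu_summary.pl report.wublast'
(both names are placeholders).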

On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:

> Hi Chris,
> I'm incredibly sorry for causing so much inconvenience. Yes, you are
> right, I only had to change the blast.pm file; it is working very
> well now, thank you very much. And you are right, you mentioned
> earlier that I should change the file... ;)
>
> but I have another question: does it work with the WU-Blast output  
> too?
> regards
> Hubert
>
>
> Chris Fields wrote:
>
>> Ha!  I come back from a meeting and there are a billion emails!  What have
>> we started? ;p  Sorry about this, Jason; I know you're busy.
>>
>> Hubert, if you're out there, I sent you an email with an attachment.  You
>> said the output looks like what you were expecting.  So I think we have
>> two problems:
>>
>> 1)  I haven't delved into the file scanning, but the fact that it takes so
>> long should tell you something's seriously wrong there.  Strip that part
>> out and start with a simple script, say, like the one Jason or I sent you;
>> the script I used to generate that output works fine (on two OSes, WinXP
>> and Mac OS X).  Use it on one file at a time.  Do everything on the command
>> line (not through Eclipse).  IDEs can be notoriously flaky about running
>> scripts, esp. when they run debugging.
>> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will
>> still not work whenever the text blast output has the following header,
>> which comes from the new web version of BLAST:
>>
>> -----------------------------------------------------
>> BLASTP 2.2.13 [Nov-27-2005]
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> RID: 1139501210-857-165793005128.BLASTQ1
>>
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           3,292,813 sequences; 1,128,164,434 total letters
>> Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>> Length=193
>> .......
>> -----------------------------------------------------
>>
>> It will work if the text output has the following header (or is from an
>> older version of BLAST):
>>
>> -----------------------------------------------------
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search
>> programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>>         (193 letters)
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           2,895,325 sequences; 997,103,285 total letters
>> -----------------------------------------------------
>> You have the former (2.2.13) version.  I know because I have your BLAST
>> files.  Therefore, even bioperl-1.5.1 will not work!
>>
>> If you want the really gory details on why this is a problem, look here:
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> So, any text output with the newer (2.2.13) header will not work; it will
>> either hang or end abruptly (depending on OS, perl version, memory,
>> patience).  If you look at the bug report above, I have added a
>> preliminary fix for this.  I'll reiterate for the billionth time, it
>> hasn't been committed yet, so don't kill me if it blows your computer up ;>
>> Here's the direct link:
>> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>> This is a modified version of Bio::SearchIO::blast.pm (it says it's
>> version 1.90, but it's lying; I didn't change the version, only the regex;
>> sorry Jason).  From what you've been posting it doesn't sound like you've
>> tried this, and I believe I've suggested this fix before.
>>
>> Replace the one in your Bio/SearchIO directory (which looks like
>> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your previous
>> message) with this file.  Make sure the filename stays the same (blast.pm).
>>
>> Run everything again, one file at a time.  Make sure you use Jason's
>> script as well as the one I sent you.  Do NOT rely on running through
>> multiple files yet.  Fix one bug at a time.  And heed Joel's words about
>> file checks.
>>
>>
>> Here's a small chunk of output from one of your blast files using the
>> modified script I sent you:
>>
>> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>> Query:   1  RWKWKRKK  8
>> Seq:     542  RWAWRRKK  549
>>
>> Look familiar?
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>> -----Original Message-----
>>> From: Roger Hall [mailto:rahall2 at ualr.edu]
>>> Sent: Thursday, February 09, 2006 3:24 PM
>>> To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>
>>> In other words, yes, I'm on the wrong trail. :}
>>>
>>> Sorry - I'll look at the output issue this evening (or realize  
>>> that Chris already solved the issue).  ;}
>>>
>>> Thanks!
>>>
>>> Roger
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>> Sent: Thursday, February 09, 2006 2:14 PM
>>> To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason Stajich
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>
>>> Dear Roger,
>>> I got this error message when I tried to parse Blast output (version
>>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>>> of Blast output files with version 2.2.13, and for those I don't get any
>>> error message... it just doesn't work.
>>>
>>> Hubert
>>>
>>>
>>>
>>> Roger Hall wrote:
>>>
>>>
>>>> Guys - I'm looking at the error message:
>>>>
>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>> STACK Bio::SearchIO::blast::next_result
>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>> STACK toplevel
>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>> This is my line of thought:
>>>> 1. "no data for midline $_" is a unique message generated by blast.pm in
>>>> one location only, at the point of a. reading three lines, b. dropping
>>>> lines with spaces only, c. identifying the Query, Midline, and Match
>>>> lines (0 <= $i < 3)
>>>> 2. There is a regexp match that fails in order to reach that error message
>>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>> 4. It does anyway
>>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>> reports
>>>>
>>>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>> anywhere has me thoroughly confused - I asked Hubert for the additional
>>>> file, assuming that I didn't have it.
>>>>
>>>> My next thought is to write a quick script to test perl behavior
>>>> on "Fedora Core 4".
>>>>
>>>> Thoughts?
>>>>
>>>> Did I misread the issue entirely? :}
>>>>
>>>> Roger
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>> Sent: Thursday, February 09, 2006 10:16 AM
>>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>>> Cc: bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>>> To: Hubert Prielinger
>>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>>
>>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>> hi chris,
>>>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working.
>>>>>> Do you have any other idea? The problem I have is that I have to parse
>>>>>> a lot of text files....
>>>>>> Or shall I look for another option to parse those files...
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>> The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports,
>>>>> but unless you post your blast report we can't really determine the
>>>>> problem.
>>>>>
>>>>> If you are still getting the same error like this, I am not convinced
>>>>> you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
>>>>> changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>> prefixes.  We fixed this as soon as it was apparent, sometime in
>>>>> September.
>>>>>
>>>>>
>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>> If you are just getting no results but also no warnings wrt parsing,
>>>>> are you sure your logic is correct?
>>>>>
>>>>> If you remove your filters do you see all the HSPS?
>>>>>
>>>>>
>>>>> use Bio::SearchIO;
>>>>>
>>>>> # 'report.bls' is a placeholder; substitute the path to your BLAST report
>>>>> my $search = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');
>>>>>
>>>>> while (my $result = $search->next_result) {
>>>>>     print $result->query_name, "\n";
>>>>>     # iterate over each hit on the query sequence
>>>>>     while (my $hit = $result->next_hit) {
>>>>>         print $hit->name, "\n";
>>>>>         # iterate over each HSP in the hit
>>>>>         while (my $hsp = $hit->next_hsp) {
>>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>                 $hsp->hit_string, "\n";
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>>> script similar to the above.  I removed the file parsing logic and it
>>>> seemed to work just fine.  It may very well be a logic issue or that he
>>>> hasn't installed the latest fix.
>>>>
>>>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>> even though the returned output was from nr, the top of the blast output
>>>> showed that it was v2.2.12:
>>>>
>>>> BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>> I double-checked my local version and it's definitely v.2.2.13:
>>>> -------------------------------------
>>>> C:\Perl\Scripts>blastcl3 -
>>>>
>>>> blastcl3 2.2.13   arguments:...
>>>> -------------------------------------
>>>>
>>>> If you use RemoteBlast using the same settings, the version in the header
>>>> looks like this:
>>>>
>>>> BLASTP 2.2.13 [Nov-27-2005]
>>>>
>>>> I'm wondering if all the blast executables (blast and netblast) from NCBI
>>>> have text output like v.2.2.12, while the wwwblast outputs a new format
>>>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>
>>>>
>>>>
>>>>> To clarify some stuff -
>>>>> Chris, I don't necessarily think XML is the best way forward for BLAST
>>>>> reports generated locally: it isn't as detailed as the text format, and
>>>>> text is what most people expect to be able to scroll through and parse
>>>>> -- it is also harder for the format to change dramatically if you have
>>>>> a static binary on your machine =).  I think for remoteblast the XML
>>>>> format should be the way forward, but I expect Bioperl to maintain
>>>>> support of any plain text BLAST report format that people use on a
>>>>> regular basis.
>>>>>
>>>>>
>>>>>
>>>> Does XML lack some specific info that text output has?  Didn't know that.
>>>> I believe that XML should be the default in RemoteBlast since it will
>>>> not break, but I agree with you about text output.  I also agree that it
>>>> will need somebody to maintain it constantly, much like RemoteBlast.
>>>>
>>>>
>>>>
>>>>> -jason
>>>>>
>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My guess is you're running into text parsing problems in
>>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>> (1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>
>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>
>>>>>>> I think the first problem you ran into is solved in bioperl 1.5.1;
>>>>>>> the last problem (more recent, not related to the first) has been
>>>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>> been committed yet and may change.
>>>>>>>
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>> is that a bug......
>>>>>>>>
>>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>> get anything.....
>>>>>>>> I'm using bioperl 1.4
>>>>>>>>
>>>>>>>> Before I installed bioperl 1.4, parsing Blast Output (version 2.2.12)
>>>>>>>> worked fine, but I don't remember which bioperl version I had
>>>>>>>> installed.
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Jason Stajich
>>>>> Duke University
>>>>> http://www.duke.edu/~jes12
>>>>>
>>>>>
>>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>
>>
>>
>
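
A rough way to tell which of the two header layouts quoted above a report
has, before handing it to the parser.  This is only a sketch in core Perl:
header_layout is a hypothetical helper, and it assumes the only
distinguishing marks are the ones visible in the two samples, namely
"(193 letters)" under the query in the older layout and "Length=193" in the
newer one.  Other BLAST versions may differ:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify the header of a plain-text NCBI BLAST report by how the query
# length is written: "(193 letters)" in the older layout, "Length=193"
# (no spaces) in the newer web layout.  Returns (version, layout).
sub header_layout {
    my ($text) = @_;
    my ($version) = $text =~ /^\s*T?BLAST[NPX]\s+(\d+\.\d+\.\d+)/m;
    my $layout =
        $text =~ /^\s*\(\s*[\d,]+\s+letters/m ? 'old'
      : $text =~ /^Length=\d+/m               ? 'new'
      :                                         'unknown';
    return ( $version, $layout );
}

# Read only the first lines of each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $head = '';
    while ( my $line = <$fh> ) {
        $head .= $line;
        last if $. >= 40;    # assumption: the header fits in 40 lines
    }
    close $fh;
    my ( $version, $layout ) = header_layout($head);
    printf "%s\t%s\t%s\n", $file,
        ( defined $version ? $version : 'unknown' ), $layout;
}
```

Files reported as 'new' are the ones that would need the patched blast.pm
described in the quoted message.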

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






