[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output

Chris Fields cjfields at uiuc.edu
Sun Feb 12 22:30:07 UTC 2006


RemoteBlast converts sequences to FASTA format using Bio::SeqIO, which
I think accepts the IUPAC nucleotide and amino acid ambiguity codes you
mention, so my guess is that any errors (such as odd non-IUPAC letters
in nucleotide or amino acid queries) would be caught there.  As long as
a sequence passes Bio::SeqIO it shouldn't be a problem.  I haven't
tried this myself, though, so I can't say that with absolute certainty.
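
For a quick pre-check outside BioPerl, something like the sketch below could flag non-IUPAC letters before a query is submitted. It is a standalone illustration and not what Bio::SeqIO or RemoteBlast do internally; the sub name non_iupac_dna is made up for the example, and the letter set is the standard IUPAC nucleotide alphabet plus gap characters.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes: the four bases, U, the ambiguity
# codes (R Y S W K M B D H V N), plus '-' and '.' for gaps.
my $IUPAC_DNA = 'ACGTURYSWKMBDHVN.-';

# Hypothetical helper: returns the sorted list of distinct characters
# in $seq that are not IUPAC nucleotide codes (empty list if clean).
sub non_iupac_dna {
    my ($seq) = @_;
    my %bad;
    for my $char (split //, uc $seq) {
        $bad{$char} = 1 if index($IUPAC_DNA, $char) < 0;
    }
    return sort keys %bad;
}

for my $seq (qw(WWWKWRW ACGTNNRY ACGTXZ)) {
    my @bad = non_iupac_dna($seq);
    print "$seq: ", (@bad ? "non-IUPAC letters: @bad" : "ok"), "\n";
}
```

Read as DNA, W, K and R are all valid ambiguity codes, so a string such as WWWKWRW passes this particular check.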

Chris



On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote:

> Roger,
> Just a data point, but in case you were not already aware of it, the
> characters W, K and R may be included in some DNA sequences. 'W' means
> 'A' or 'T' [AT], 'K' means [TG] and 'R' means [AG], if I remember
> correctly. These are ambiguous bases, where a basecaller isn't sure,
> for example, whether a particular peak is an A or a T. Although I see
> these ambiguous bases less frequently these days, even common modern
> basecallers (such as Applied Biosystems basecallers) can generally be
> configured so they will generate them. Downstream applications may not
> like them, however.
>     I may be just stating the obvious, or this might be irrelevant to
> the issue at hand. If so, my apologies.
>
> Phillip
> Roger Hall wrote:
>> Guys - I'm looking at the error message:
>>
>> MSG: no data for midline Query  1   WWWKWRW  7
>> STACK Bio::SearchIO::blast::next_result
>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>> STACK toplevel
>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> This is my line of thought:
>> 1. "no data for midline $_" is a unique message generated by blast.pm
>>    in one location only, at the point of
>>    a. reading three lines,
>>    b. dropping lines with spaces only,
>>    c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>> 2. There is a regexp match that fails in order to reach that error
>>    message
>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>> 4. It does anyway
>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
>>    blast reports
>>
>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>> anywhere has me thoroughly confused - I asked Hubert for the additional
>> file, assuming that I didn't have it.
>>
>> My next thought is to write a quick script to test perl behavior on
>> "Fedora Core 4".
>>
>> Thoughts?
>>
>> Did I misread the issue entirely? :}
>>
>> Roger
>>
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 09, 2006 10:16 AM
>> To: 'Jason Stajich'; 'Hubert Prielinger'
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>> Blast output
>>
>>
>>
>>> -----Original Message-----
>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>> Sent: Thursday, February 09, 2006 9:13 AM
>>> To: Hubert Prielinger
>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>> do you have any other idea, the problem I have is that I have to parse
>>>> a lot of textfiles....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>> The code from Bioperl 1.5.1 works fine for me for blast
>>> 2.2.13 reports but unless you post your blast report we can't
>>> really determine the problem.
>>>
>>> If you are still getting the same error as this, I am not
>>> convinced you have upgraded to 1.5.1, which includes a fix for
>>> NCBI's change to the HSP result format that removed the ':'
>>> from the Query/Sbjct prefixes.  We fixed this as soon as it
>>> became apparent, sometime in September.
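
To illustrate the format change described above: a pattern that requires the ':' after Query/Sbjct fails on a colon-less line, while one that treats the colon as optional accepts both. The two patterns and the helper name below are made up for this illustration and are not the ones in Bio::SearchIO::blast; the colon-style sample line is likewise constructed by adding ':' to the line from the error message.

```perl
use strict;
use warnings;

# Illustrative patterns only (assumptions, not copied from blast.pm).
my $colon_required = qr/^(Query|Sbjct):\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
my $colon_optional = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;

# Hypothetical helper: returns [label, start, sequence, end] on a match,
# or undef when the line does not look like an alignment line.
sub parse_alignment_line {
    my ($line, $pattern) = @_;
    $line =~ s/[\r\n]+$//;    # tolerate DOS or Unix line endings
    return $line =~ $pattern ? [$1, $2, $3, $4] : undef;
}

my @lines = (
    "Query: 1   WWWKWRW  7\n",    # older style, with colon
    "Query  1   WWWKWRW  7\n",    # line from the error message, no colon
);

for my $line (@lines) {
    for my $case (['required', $colon_required], ['optional', $colon_optional]) {
        my ($name, $pattern) = @$case;
        my $fields = parse_alignment_line($line, $pattern);
        (my $shown = $line) =~ s/\s+$//;
        printf "%-24s colon %-8s => %s\n", "'$shown'", $name,
            $fields ? "seq=$fields->[2]" : "no match";
    }
}
```

Stripping the line ending before matching also rules out the newline/chomp explanation when testing a suspect line by hand.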
>>>
>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> If you are just getting no results but also no warnings wrt
>>> parsing, are you sure your logic is correct?
>>>
>>> If you remove your filters do you see all the HSPS?
>>>
>>>
>>> while (my $result = $search->next_result) {
>>>     print $result->query_name, "\n";
>>>     # iterate over each hit on the query sequence
>>>     while (my $hit = $result->next_hit) {
>>>         print $hit->name, "\n";
>>>         # iterate over each HSP in the hit
>>>         while (my $hsp = $hit->next_hsp) {
>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>                 $hsp->hit_string, "\n";
>>>         }
>>>     }
>>> }
>>>
>>
>> I tested some of the BLAST results that Hubert sent Roger and me with a
>> script similar to the above.  I removed the file parsing logic and it
>> seemed to work just fine.  It may very well be a logic issue or that he
>> hasn't installed the latest fix.
>>
>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>> even though the returned output was from nr, the top of the blast
>> output showed that it was v2.2.12:
>>
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>> I double-checked my local version and it's definitely v.2.2.13:
>> -------------------------------------
>> C:\Perl\Scripts>blastcl3 -
>>
>> blastcl3 2.2.13   arguments:...
>> -------------------------------------
>>
>> If you use RemoteBlast with the same settings, the version in the
>> header looks like this:
>>
>> BLASTP 2.2.13 [Nov-27-2005]
>>
>> I'm wondering if all the blast executables (blast and netblast) from
>> NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>> format (2.2.13).  I'll ask blast-help at NCBI about this.
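
One way to check which BLAST version produced a batch of reports is to read the header line of each file. The sketch below only assumes the header looks like the two examples quoted above (program name, version, bracketed date); the sub name blast_header_info is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull program, version and build date out of a
# report's header line, e.g. "BLASTP 2.2.12 [Aug-07-2005]".
# Returns a hash reference, or undef if the line is not such a header.
sub blast_header_info {
    my ($line) = @_;
    return unless defined $line;
    if ($line =~ /^(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)\s+\[([^\]]+)\]/) {
        return { program => $1, version => $2, date => $3 };
    }
    return;    # not a recognised header line
}

for my $header ('BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]') {
    my $info = blast_header_info($header);
    print "$info->{program} version $info->{version} built $info->{date}\n";
}
```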
>>
>>
>>> To clarify some stuff -
>>> Chris I don't necessarily think the XML is best way forward
>>> for BLAST reports generated locally, it isn't as detailed as
>>> the Text format and it is what most people expect to be able
>>> to scroll through and parse -- it is also harder for the
>>> format to change dramatically if you have a static binary on
>>> your machine =).  I think for remoteblast the XML format
>>> should be the way forward but I expect Bioperl to maintain
>>> support of any plain text BLAST report format that people use
>>> on a regular basis.
>>>
>>>
>>
>> Does XML lack some specific info that text output has?  Didn't know
>> that.  I believe that XML should be the default in RemoteBlast since it
>> will not break, but I agree with you about text output.  I also agree
>> that it will need somebody to maintain it constantly, much like
>> RemoteBlast.
>>
>>
>>> -jason
>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>> (1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl 1.5.1;
>>>>> the last problem (more recent, not related to the first) has been
>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>> SearchIO::blast is available in the link above, but realize it
>>>>> hasn't been committed yet and may change.
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>> Blast output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>>>> Is that a bug?
>>>>>>
>>>>>> If I want to parse Blast output (version 2.2.13), I don't get
>>>>>> anything.
>>>>>> I'm using bioperl 1.4.
>>>>>>
>>>>>> Before I installed bioperl 1.4, parsing Blast output (version
>>>>>> 2.2.12) worked fine, but I don't remember which bioperl version
>>>>>> I had installed.
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





