[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output

Joel Steele injunjoel at hotmail.com
Thu Feb 9 21:33:45 UTC 2006


Greetings again,
It's the colon...
Observe.

-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query  1   WWWKWRW  7';

#your string with a colon thrown in for testing.
my $string2 = 'Query:  1   WWWKWRW  7';

foreach ($string1, $string2){
	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
		print "Match Found in $_\n";
		print $1."\n";
		print $2."\n";
		print $3."\n";
		print $4."\n";
		print $5."\n";
	}else{
		print "no Match for $_\n";
	}
}

-=End Code=-

The Output

-=Code Snippet=-
no Match for Query  1   WWWKWRW  7
Match Found in Query:  1   WWWKWRW  7
Query:  1
Query
1
WWWKWRW
7

-=End Code=-


Now I would suggest changing the regexp

From:
/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

in Bio::SearchIO::blast.
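
As a quick sanity check of the optional-colon pattern, here is a minimal sketch that wraps it in a small helper and runs it against both line styles. The helper name parse_alignment_line and the hash keys are hypothetical, made up for illustration only; this is not how Bio::SearchIO::blast is actually structured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one Query/Sbjct
# alignment line, accepting the prefix with or without a colon.
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/) {
        return { label => $2, start => $3, seq => $4, end => $5 };
    }
    return;    # no match
}

# The two strings from the test above: new style (no colon) and old style.
for my $line ('Query  1   WWWKWRW  7', 'Query:  1   WWWKWRW  7') {
    my $parsed = parse_alignment_line($line);
    print $parsed
        ? "Match: $parsed->{label} $parsed->{start} $parsed->{seq} $parsed->{end}\n"
        : "no Match for $line\n";
}
```

Both strings should now report a match, since the ":?" makes the colon optional without changing which groups capture what.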

General suggestion:
Again, I would like to suggest that everyone get used to using the strict 
pragma. Though it may not be applicable to this particular problem, it becomes 
essential if you wish to progress in your use of Perl.
It is a core module, so there is nothing to download from CPAN. It helps with 
development, and once your code can run without warnings and errors you can 
remove it. This is not a targeted attack, as some may interpret it, but rather a 
general FYI for those out there new to Perl or programming in general. 
Better to start learning the rules early, before bad habits creep in.
One more thing. There is a wonderfully supportive Perl community available 
to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may 
even catch a glimpse of Larry Wall while you're there.
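
To make the strict suggestion concrete, here is a small sketch of the kind of mistake it catches: a misspelled variable name. The helper compile_error and the variable names are hypothetical and exist only for this illustration; the snippet is compiled through a string eval so that the compile-time error can be printed instead of stopping the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet of Perl, returning the
# error text, or an empty string if it compiled and ran cleanly.
sub compile_error {
    my ($code) = @_;
    local $@;
    my $ok = eval "$code; 1";
    return $ok ? '' : $@;
}

# $cuont is a typo for $count; without strict it would silently
# become a new global variable.
my $snippet = 'my $count = 1; $cuont = $count + 1';

print "with strict:    ", compile_error("use strict; $snippet") || "compiles\n";
print "without strict: ", compile_error("no strict; $snippet")  || "compiles\n";
```

With strict enabled the typo is reported as an undeclared global symbol; with strict turned off the same snippet compiles without complaint, which is how such bugs slip through.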

-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger <hubert.prielinger at gmx.at>
>To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields 
><cjfields at uiuc.edu>,        Jason Stajich <jason.stajich at duke.edu>
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast 
>output
>Date: Thu, 09 Feb 2006 14:13:31 -0600
>
>Dear Roger,
>I got this error message when I tried to parse Blast output (version
>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>of Blast output files with version 2.2.13, and for those I don't get any
>error message... it just doesn't work.
>
>Hubert
>
>
>
>Roger Hall wrote:
>
> >Guys - I'm looking at the error message:
> >
> >MSG: no data for midline Query  1   WWWKWRW  7
> >STACK Bio::SearchIO::blast::next_result
> >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >STACK toplevel
> >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >
> >This is my line of thought:
> >1. "no data for midline $_" is a unique message generated by blast.pm in
> >one location only, at the point of a. reading three lines b. dropping
> >lines with spaces only c. identifying the Query, Midline, and Match
> >lines (0 <= $i < 3)
> >2. There is a regexp match that fails in order to reach that error
> >message
> >3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >4. It does anyway
> >5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> >reports
> >
> >I suspect a newline/chomp/metacharacter issue. Not finding the string
> >anywhere has me thoroughly confused - I asked Hubert for the additional
> >file, assuming that I didn't have it.
> >
> >My next thought is to write a quick script to test perl behavior on
> >"Fedora Core 4".
> >
> >Thoughts?
> >
> >Did I misread the issue entirely? :}
> >
> >Roger
> >
> >
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >Sent: Thursday, February 09, 2006 10:16 AM
> >To: 'Jason Stajich'; 'Hubert Prielinger'
> >Cc: bioperl-l at bioperl.org
> >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >output
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>Sent: Thursday, February 09, 2006 9:13 AM
> >>To: Hubert Prielinger
> >>Cc: Chris Fields; bioperl-l at bioperl.org
> >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>parsing Blast output
> >>
> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>
> >>
> >>>hi chris,
> >>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
> >>>do you have any other idea, the problem I have is that I have to parse
> >>>a lot of textfiles....
> >>>or shall I look for another option to parse those files...
> >>>
> >>>regards
> >>>Hubert
> >>>
> >>>
> >>The code from Bioperl 1.5.1 works fine for me for blast
> >>2.2.13 reports but unless you post your blast report we can't
> >>really determine the problem.
> >>
> >>If you are still getting the same error like this I am not
> >>convinced you have upgraded to 1.5.1, which includes a fix for
> >>the fact that NCBI changed the HSP result format to remove
> >>the ':' from the Query/Sbjct prefixes.  We fixed this as soon
> >>as it was apparent sometime in September.
> >>
> >>
> >>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>If you are just getting no results but also no warnings wrt
> >>parsing, are you sure your logic is correct?
> >>
> >>If you remove your filters do you see all the HSPS?
> >>
> >>
> >>while (my $result = $search->next_result) {
> >>    print $result->query_name, "\n";
> >>    # iterate over each hit on the query sequence
> >>    while (my $hit = $result->next_hit) {
> >>        print $hit->name, "\n";
> >>        # iterate over each HSP in the hit
> >>        while (my $hsp = $hit->next_hsp) {
> >>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
> >>                $hsp->hit_string, "\n";
> >>        }
> >>    }
> >>}
> >>
> >>
> >
> >I tested some of the BLAST results that Hubert sent Roger and me with a
> >similar script to the above.  I removed the file parsing logic and it
> >seemed to work just fine.  It may very well be a logic issue or that he
> >hasn't installed the latest fix.
> >
> >It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
> >even though the returned output was from nr, the top of the blast output
> >showed that it was v2.2.12:
> >
> >BLASTP 2.2.12 [Aug-07-2005]
> >
> >I double-checked my local version and it's definitely v.2.2.13:
> >-------------------------------------
> >C:\Perl\Scripts>blastcl3 -
> >
> >blastcl3 2.2.13   arguments:...
> >-------------------------------------
> >
> >If you use RemoteBlast using the same settings, the version in the header
> >looks like this:
> >
> >BLASTP 2.2.13 [Nov-27-2005]
> >
> >I'm wondering if all the blast executables (blast and netblast) from NCBI
> >have text output like v.2.2.12, while the wwwblast outputs a new format
> >(2.2.13).  I'll ask blast-help at NCBI about this.
> >
> >
> >
> >>To clarify some stuff -
> >>Chris I don't necessarily think the XML is best way forward
> >>for BLAST reports generated locally, it isn't as detailed as
> >>the Text format and it is what most people expect to be able
> >>to scroll through and parse -- it is also harder for the
> >>format to change dramatically if you have a static binary on
> >>your machine =).  I think for remoteblast the XML format
> >>should be the way forward but I expect Bioperl to maintain
> >>support of any plain text BLAST report format that people use
> >>on a regular basis.
> >>
> >>
> >>
> >
> >Does XML lack some specific info that text output has?  Didn't know that.
> >I believe that XML should be default in RemoteBlast since it will not
> >break, but I agree with you about text output.  I also agree that it will
> >need somebody to maintain it constantly, much like RemoteBlast.
> >
> >
> >
> >>-jason
> >>
> >>
> >>>Chris Fields wrote:
> >>>
> >>>
> >>>
> >>>>My guess is you're running into text parsing problems in
> >>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>(1.5.1) or
> >>>>bioperl-live (CVS), then see the bug below.
> >>>>
> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>
> >>>>I think the first problem you ran into is solved in bioperl 1.5.1,
> >>>>the last problem (more recent, not related to the first) has been
> >>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
> >>>>SearchIO::blast is available in the link above, but realize it hasn't
> >>>>been committed yet and may change.
> >>>>
> >>>>Christopher Fields
> >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
> >>>>University of Illinois Urbana-Champaign
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>Prielinger
> >>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>To: bioperl-l at bioperl.org
> >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >>>>>output
> >>>>>
> >>>>>Hi,
> >>>>>If I want to parse a Blast Output (Version 2.2.12) with
> >>>>>Bio::SearchIO, I get the following error message:
> >>>>>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>
> >>>>>is that a bug......
> >>>>>
> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>anything.....
> >>>>>I'm using bioperl 1.4
> >>>>>
> >>>>>Before I installed bioperl 1.4, it worked fine parsing Blast
> >>>>>Output (version 2.2.12), but I don't remember which bioperl version
> >>>>>I had installed
> >>>>>
> >>>>>thanks in advance
> >>>>>
> >>>>>Hubert
> >>>>>
> >>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Bioperl-l mailing list
> >>>>>Bioperl-l at lists.open-bio.org
> >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>--
> >>Jason Stajich
> >>Duke University
> >>http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >
> >
> >
> >
>




