[Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput

Chris Fields cjfields at uiuc.edu
Thu Feb 9 05:07:15 UTC 2006


On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:

> Greetings,
> I'm not well versed in Bio::SearchIO, but there are a few comments about your
> code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict;   #save yourself the headaches and force yourself to write clean code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>

Tread very carefully here.  Just about every book on Perl recommends
'use strict' and enabling warnings during development (e.g. the Camel,
the Llama, and others); in fact, these are the very books most
beginners start from.  Some would consider NOT using -w or 'use
strict' a bad habit; everybody has an opinion (I would repeat an
oft-heard Texas saying, but I'll refrain).  Just remember: try to be a
little more constructive in your critique and insert a little less
of your personal coding style.  If you hit the wrong person, you
might get flamed.

Here's a link that may help a bit here:

http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_

> next thing:
> when you are reading the files from the directory you are not doing any sort
> of filtering as to what is returned. If you are on a Unix flavored system
> you may be getting the '.' and '..' entries from your readdir(DIR) call. I
> would suggest placing a grep in there somewhere to get only blast files.
> something like:
>

I agree here.  You could probably also use something like File::Find  
here to make things a bit easier with the file names as well; works  
wonderfully, esp. when traversing a directory tree.
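
For what it's worth, here is a minimal sketch of the File::Find approach.  The
helper name find_blast_reports and the '.bls' extension are assumptions for
illustration only (the extension is borrowed from the snippet quoted below),
and the demo builds a throwaway directory tree so it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: collect every regular file under $dir whose
# name ends in $ext, descending into subdirectories.
sub find_blast_reports {
    my ($dir, $ext) = @_;
    my @found;
    find(
        sub {
            # Inside the callback $_ is the bare file name, while
            # $File::Find::name is the path including the directory.
            push @found, $File::Find::name if -f $_ && /\Q$ext\E\z/;
        },
        $dir
    );
    @found = sort @found;
    return @found;
}

# Self-contained demo on a temporary directory tree.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create $root/sub: $!";
for my $name ('a.bls', 'notes.txt', 'sub/b.bls') {
    open my $fh, '>', "$root/$name" or die "Cannot create $root/$name: $!";
    close $fh;
}

my @reports = find_blast_reports($root, '.bls');
print "$_\n" for @reports;
```

Each path that comes back already includes the directory, so it could be
handed to the parser's -file argument as is.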

> =-=-=-=-=code snippet=-=-=-=-=
>
> #assuming the file extension for blast files is .bls
> #the -e and -f are filetests; you could probably get away with just
> #-f. Here is a link for reference on the filetests available in Perl.
> #
> # http://www.perlmonks.org/?node_id=370
>
> my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR);
> closedir(DIR);
>
> #then proceed with your foreach but over @files_to_parse
>
> foreach my $file(@files_to_parse){
>      #do cool stuff here...
> }
>

Again, agreed.  But, does it really solve the main problem, which is  
an issue with SearchIO::blast?  It seemed to try parsing a blast file...
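
Independent of the parser issue, one thing worth ruling out: readdir returns
bare file names, not paths, so a -f test or a -file argument only works if the
script happens to run from inside that directory.  A rough sketch of the same
grep with the directory prepended follows; report_paths and the '.bls'
extension are hypothetical, and the temp directory is only there to make the
example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return full paths of the regular files in $dir
# whose names end in $ext.
sub report_paths {
    my ($dir, $ext) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = readdir($dh);    # bare names, including '.' and '..'
    closedir($dh);

    my @paths =
        grep { -f $_ }                           # regular files only
        map  { File::Spec->catfile($dir, $_) }   # prepend the directory
        grep { /\Q$ext\E\z/ } @names;            # assumed extension
    @paths = sort @paths;
    return @paths;
}

# Self-contained demo: one report file and one unrelated file.
my $dir = tempdir(CLEANUP => 1);
for my $name ('x.bls', 'y.txt') {
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my @paths = report_paths($dir, '.bls');
# Each element is a full path that could be passed as -file to the parser.
print "$_\n" for @paths;
```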

> =-=-=-=-=code snippet=-=-=-=-=
>
> Hope that helps.
> -Joel Steele
>
>
> "The surest way to corrupt a youth is to instruct him to hold in higher
> regard those who think alike than those who think differently." - Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." - Galileo
>
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>
>> To: Chris Fields <cjfields at uiuc.edu>, bioperl-l at bioperl.org,
>> rahall2 at ualr.edu
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>> Blastoutput
>> Date: Wed, 08 Feb 2006 16:22:44 -0600
>>
>> hi,
>> I have installed from the following page:
>> http://news.open-bio.org/archives/2005_10.html, the Core, Run and Ext.
>> I'm using only the SearchIO without remoteblast module, because I
>> already have all my Blast output files.
>> My operating system is Fedora Core 4.
>>
>> Code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SearchIO;
>>
>> print "start program\n";
>> my $directory =
>> "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>> opendir(DIR, $directory) || die("Cannot open directory");
>> print "opened directory\n";
>>
>> foreach my $file (readdir(DIR))  {
>> print "read file\n";
>>
>> my $search = new Bio::SearchIO (-format => 'blast',
>>                                 -file => $file);
>>
>> my $cutoff_len = 10;
>>
>>
>>
>> #iterate over each query sequence
>> while (my $result = $search->next_result) {
>> print "entered 1st while loop\n";
>>
>>     #iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>
>>         #iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>
>>             if ($hsp->length('sbjct') <= $cutoff_len) {
>>                 #print $hsp->hit_string, "\n";
>>                 for ($hsp->hit_string) {
>>
>>
>>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>> tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>>
>>                         # Print some tab-delimited data about this HSP
>>
>>                         open (bigShot, ">>BlastOutputTrial.txt") ||
>>                             die ("Could not open file. $!");
>>                         #print $result->query_name, "\t";
>>
>> #                        print $hit->significance, "\t";
>>                          print bigShot $hit->name, "-->";
>>                          print bigShot $hit->description, "\n";
>>                          #print bigShot "Query:   ", $hsp->start('query'), "  ", $hsp->query_string, "  ", $hsp->end('query'), "\n";
>>                          print bigShot "Seq:     ", $hsp->start('hit'),
>>                              "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
>>
>> #                        print $hsp->rank, "\t";
>> #                        print $hsp->percent_identity, "\t";
>> #                        print $hsp->evalue, "\t";
>> #                        print $hsp->hsp_length, "\n";
>>
>>                         close (bigShot);
>>
>>                     };
>>
>>
>>             }
>>         }
>>         }
>>     }
>> }
>>
>> }
>>
>> closedir(DIR);
>>
>>
>> Chris Fields wrote:
>>
>>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
>>> just the modules you want; mixing bioperl versions might work, but you might
>>> run into interoperability problems).  Then replace the Bio::SearchIO::blast
>>> with the one in Bugzilla.  The 'other option' you mentioned might be trying
>>> XML instead of text, which is more stable in the long run.  You will still
>>> need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
>>> this:
>>>
>>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>>>
>>> If you're using SearchIO directly instead of Remoteblast, you should be able
>>> to set the '-readmethod' flag to 'blastxml'.
>>>
>>> It also wouldn't hurt to know what OS you're using or see some code.  Roger
>>> is out there somewhere (I think) and may also have some input.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at]
>>>> Sent: Wednesday, February 08, 2006 3:41 PM
>>>> To: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it still isn't
>>>> working. Do you have any other idea? The problem I have is
>>>> that I have to parse a lot of text files....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1)
>>>>> or bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl 1.5.1, the
>>>>> last problem (more recent, not related to the first) has been fixed but
>>>>> hasn't been committed to bioperl-live yet.  The fixed SearchIO::blast
>>>>> is available in the link above, but realize it hasn't been
>>>>> committed yet and may change.
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab
>>>>> Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
>>>>>> I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>> is that a bug......
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.....
>>>>>> I'm using bioperl 1.4
>>>>>>
>>>>>> Before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>> Output (version 2.2.12), but I don't remember which bioperl version I
>>>>>> had installed
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





