[Bioperl-l] RemoteBlast.pm problem resolved!!!!!

Chris Fields cjfields at uiuc.edu
Thu Jan 19 11:26:13 EST 2006


This resolves the problem only if you use bioperl 1.5.1.  RemoteBlast.pm was
changed ~fall 2005 and removed the $size variable (as reported here:
http://bugzilla.bioperl.org/show_bug.cgi?id=1864).
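
The size test discussed in the quoted message below (a fixed "$size > 1000"
check that the roughly 2014-2017 byte NCBI status page also passes) suggests
testing the page content rather than its size.  Here is a minimal sketch,
assuming the waiting page still contains the Status/Searching table row that
is quoted later in this thread.  looks_like_waiting_page is a hypothetical
helper, not part of RemoteBlast.pm, and NCBI may change this markup at any
time:

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::Tools::Run::RemoteBlast).
# Returns 1 if the text looks like the NCBI "still searching" status page,
# 0 otherwise.  The pattern is taken from the status HTML quoted in this
# thread and is only an assumption about what the server returns.
sub looks_like_waiting_page {
    my ($text) = @_;
    return 0 unless defined $text;
    return $text =~ m{<td>\s*Status\s*</td>\s*<td>\s*Searching\s*</td>}i ? 1 : 0;
}

# Sample input copied from the status page quoted in this thread.
my $status_page = <<'HTML';
<tr><td>Status</td><td>Searching</td></tr>
<tr><td>Time since submission</td>
<td>00:10:01</td>
HTML

print looks_like_waiting_page($status_page) ? "waiting\n" : "ready\n";
```

A content check like this does not depend on how many bytes of page
furniture the server happens to send, which is what defeated the hard-coded
threshold.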

The text output will save if you use SearchIO.  However, parsing text
output with SearchIO seems to be broken at the moment, likely because
changes to the BLAST output format broke SearchIO::blast.

Jason addresses this in the last few emails in this thread.  If you plan on
parsing out data (like accessions or HSPs) from BLAST output, then you may
have to switch to XML, as text or HTML parsing can break at any time.
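
The polling logic discussed in the quoted thread (retrieve_blast giving 0
when no report is ready, -1 on error, or an object once the report exists)
can be tried out without BioPerl or network access.  The sketch below is only
an illustration: poll_until_ready, $max_tries and $delay are hypothetical
names, the stub stands in for $factory->retrieve_blast($rid), and the cap on
attempts is an addition that is not in the original script:

```perl
use strict;
use warnings;

# Poll a retrieval callback until it returns a reference (a report object),
# an error code (< 0), or the attempt limit is reached.
# $retrieve stands in for something like sub { $factory->retrieve_blast($rid) }.
# All names here are hypothetical, not part of RemoteBlast.pm.
sub poll_until_ready {
    my ($retrieve, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $retrieve->();
        return $rc if ref $rc;      # got an object: the report is ready
        return if $rc < 0;          # -1: error, give up on this request
        sleep $delay if $delay;     # 0: not ready yet, wait before retrying
    }
    return;                         # attempt limit reached without a report
}

# Stub that says "not ready" twice, then hands back a fake report object.
my $calls = 0;
my $stub  = sub {
    $calls++;
    return 0 if $calls < 3;
    return { report => 'fake' };
};

my $result = poll_until_ready($stub, 10, 0);
print "ready after $calls calls\n" if ref $result;
```

With a real factory the delay would be several seconds, so that the NCBI
server is not polled more often than necessary.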

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l-
> bounces at portal.open-bio.org] On Behalf Of Nagesh
> Sent: Thursday, January 19, 2006 1:22 AM
> To: Barry Moore; bioperl-l at bioperl.org
> Cc: ganesh.b.chakka at jpmorgan.com
> Subject: Re: [Bioperl-l] RemoteBlast.pm problem resolved!!!!!
> 
> Hi Barry,
> Thanks once again for an elaborate mail and explanation. I am using the
> latest version of BioPerl 1.5. I also tested this problem on 1.4 with no
> difference. The problem is with "$rc = $factory->retrieve_blast($rid);",
> where $rc always received an object from retrieve_blast, so the script
> never entered the sleep 5 branch (the condition "if( !ref($rc) )" was
> never satisfied).
> 
> I thought I would have a look at the RemoteBlast.pm code before
> trying anything more. I looked at the method retrieve_blast, which was
> the main culprit, and found a possible answer to my problem. I
> looked at the condition that decides whether 0, -1 or an object is
> returned, which is below
> 
> Code from Bio/Tools/Run/RemoteBlast.pm version 1.5 line 569-560
> #########################################################
> 		my $size = -s $tempfile;
> 		if( $size > 1000 ) {
> #########################################################
> 
> So I made it print the file size and ran my perl script several
> times
> 
> #########################################################
> 		my $size = -s $tempfile;
> 		print "Size of temporary file from RemoteBlast.pm $size\n";
> 		if( $size > 1000 ) {
> #########################################################
> 
> Each time I did so, I was getting the file size value of 2014 to 2017
> and no wonder it satisfies the condition ($size > 1000) even when the
> results were not ready.
> 
> So I modified the condition to the following
> #########################################################
> 		my $size = -s $tempfile;
> 		if( $size > 2017 ) {
> #########################################################
> 
> and there it goes, the code behaved itself and waited until the results
> were ready to proceed further with saving the output.
> This may be the result of changes the NCBI admins made to the results
> status page, which increased the file size so that it satisfies the
> condition and returns an object, something that should happen only
> when the results are ready.
> I am not sure whether this is the right answer to the problem but it
> does definitely work.
> Any comments from people having a similar problem will be useful. I will
> see how long this solution works and knock on your doors again
> if I need further help.
> Thanks for your help.
> Regards
> Nagesh
> 
> 
> On Wed, 2006-01-18 at 22:15 -0700, Barry Moore wrote:
> > Nagesh,
> >
> > That does sound odd.  What version of bioperl are you using?  I'm
> > guessing 1.4?  If the answer is anything but 1.5 something, then I
> > suggest you upgrade before going any further.  You will also
> > want to follow the current thread about parsing XML formatted
> > blast reports.  I don't think this is your problem right now, but
> > eventually you'll have a problem if you aren't parsing XML format as
> > discussed in that post.  I've added some more detail below so that,
> > if you are having the problem with 1.5, you can try some debugging.
> >
> > Here's what's going on (or should be going on) in your script, and
> > some suggestions for using the debugger.
> >
> > #This next line hits the NCBI server, and if it gets a blast report
> > in return parses it, and returns a Bio::Tools::Blast object.  If
> > there was no report you get 0, and if there was an error you get -1.
> >
> >      my $rc = $factory->retrieve_blast($rid);
> >
> >      print "RC $rc\n";
> >
> > #This if statement is checking to see if the server has NOT returned
> > a report yet.  If it did then $rc should be an object and ref $rc
> > will return 'Bio::SearchIO::blast'.  If $rc is not an object (i.e.
> > you got no report) then ref $rc returns an empty string (false).
> >                  if( !ref($rc) )
> >                  {
> > #If you got here then you got no report from NCBI server yet, and so
> > the next if checks whether you got -1, meaning there was an error.  On
> > error delete this RID because it's no good.
> >                          if( $rc < 0 )
> >                          {
> >      				$factory->remove_rid($rid);
> >                          }
> > #Print a dot on the screen in lieu of music to keep the user
> > entertained while they wait.
> >                          print STDERR "." if ( $v > 0 );
> > #Take a nap so you don't piss off NCBI sys admin!
> >                          sleep 5;
> >      }
> > #Getting here means that $rc was an object, so we've got a report.
> > Go ahead and save it.
> >                  else
> >                  {
> >      sleep 600;
> > #Obviously writing your output file.
> >      $factory->save_output('temp.out');
> >      my $checkinput = $factory->file;
> >                      open(my $fh,"<$checkinput") or die $!;
> >                      while(<$fh>)
> > {
> >                               print;
> >                          }
> >                           close $fh;
> >      $factory->remove_rid($rid);
> >
> >
> > run your script in the debugger like this:
> >
> > perl -d your_script.pl
> >
> > Step forward one line at a time by typing 'n'.
> > When you get just past my $rc = $factory->retrieve_blast($rid); type
> > 'x $rc'
> > You should get 0, -1 or 'Bio::SearchIO::blast'
> > Keep stepping forward with 'n'.
> > If you get 0 you should loop back to retrieve_blast after a sleep.
> > If you get -1 you should end your script - you got an error (What was
> > it?)
> > If you get a Bio::SearchIO::blast object then you should be writing
> > a temp.out
> >
> > Barry
> >
> >
> > On Jan 18, 2006, at 6:37 PM, Nagesh wrote:
> >
> > > > Thanks very much to all, especially to Barry and Hubert, for their
> > > > time in answering my query. Some updates on my problem.
> > > >
> > > > I have performed some diagnostic tests and am writing my
> > > > observations below.
> > >
> > > > First of all, the problem in the code was that it was not waiting for
> > > > the results to be ready before writing them to the output file. So I
> > > > wanted to check whether the condition "if( !ref($rc) )" is ever
> > > > satisfied, and I printed out the $rc value, which was something like
> > > > "Bio::SearchIO::blast=HASH(0x9010370)". When I looked at the Bioperl
> > > > documentation for RemoteBlast.pm, "$rc = $factory->retrieve_blast($rid);"
> > > > should return either 0 or 1. I am not able to tell whether what I am
> > > > getting is right.
> > >
> > > > Secondly, I manually forced the script to wait between submit_blast,
> > > > retrieve_blast and save_output by using sleep with values ranging from
> > > > 30 to 600. None of them were successful in saving the output.
> > >
> > > > When sleep (600) is between submit_blast and retrieve_blast, the
> > > > following is printed to standard output (shown below is part of the
> > > > output), with the output file still empty.
> > >
> > > <P><table>
> > > > <tr><td>Request ID</td><td> <b>1137626804-16566-100302560340.BLASTQ4</b></td></tr>
> > > <tr><td>Status</td><td>Searching</td></tr>
> > > <tr><td>Submitted at</td><td>Wed Jan 18 18:26:44 2006</td></tr>
> > > <tr><td>Current time</td><td>Wed Jan 18 18:36:46 2006</td></tr>
> > > <tr><td>Time since submission</td>
> > > <td>00:10:01</td>
> > > </tr><P></table>
> > > <p><hr>This page will be automatically updated in <b>10</b> seconds
> > > until search is done<BR>
> > >
> > > When sleep (600) is between retrieve_blast and save_output, the
> > > following is printed with nothing written to output file.
> > >
> > > <P><table>
> > > > <tr><td>Request ID</td><td> <b>1137632221-28820-85178967709.BLASTQ1</b></td></tr>
> > > <tr><td>Status</td><td>Searching</td></tr>
> > > <tr><td>Submitted at</td><td>Wed Jan 18 19:57:01 2006</td></tr>
> > > <tr><td>Current time</td><td>Wed Jan 18 19:57:03 2006</td></tr>
> > > <tr><td>Time since submission</td>
> > > <td>00:00:01</td>
> > > </tr><P></table>
> > > <p><hr>This page will be automatically updated in <b>10</b> seconds
> > > until search is done<BR>
> > >
> > > Please note the difference in time since submission.
> > >
> > > > Lastly, I printed out the request ID and manually paused the script
> > > > by using <STDIN> between submit_blast and retrieve_blast. The idea was
> > > > to check the status of the job online through the NCBI website. When
> > > > the results were ready, I let the script proceed and was able
> > > > to save the desired results to the file. I am puzzled by this
> > > > observation, as I do not understand why manually formatting the
> > > > results online helps in getting the results.
> > > > I am basically a molecular biologist and trying hard to solve this
> > > > computational stuff, so there might be some issues that are trivial
> > > > to you computer wizards :)
> > > >
> > > > Barry suggested that I use the perl debugger, which I will try.
> > >
> > > Thanks for your attention.
> > >
> > > Below is the code which was being tested.
> > >
> > > ######################################################################
> > > ##
> > >
> > > use strict;
> > > use warnings;
> > > use Bio::Tools::Run::RemoteBlast;
> > >
> > > print "$Bio::Root::Version::VERSION\n";
> > > my $prog = 'blastp';
> > > my $db   = 'swissprot';
> > > my $e_val= '1e-10';
> > >
> > > my @params = ( '-prog' => $prog,
> > >        '-data' => $db,
> > >        '-expect' => $e_val,
> > >        '-readmethod' => 'SearchIO' );
> > >
> > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > > #change a parameter
> > > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
> > >
> > > #remove a parameter
> > > delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
> > >
> > > my $v = 1;
> > > #$v is just to turn on and off the messages
> > >
> > > my $r = $factory->submit_blast('blastInput.txt');
> > >
> > > print STDERR "waiting..." if( $v > 0 );
> > > > while ( my @rids = $factory->each_rid )
> > > > {
> > > >     foreach my $rid ( @rids )
> > > >     {
> > > >         print "RID $rid\n";
> > > >
> > > >         #<STDIN>;
> > > >         #sleep 600;
> > > >         my $rc = $factory->retrieve_blast($rid);
> > > >
> > > >         print "RC $rc\n";
> > > >         if( !ref($rc) )
> > > >         {
> > > >             if( $rc < 0 )
> > > >             {
> > > >                 $factory->remove_rid($rid);
> > > >             }
> > > >             print STDERR "." if ( $v > 0 );
> > > >             sleep 5;
> > > >         }
> > > >         else
> > > >         {
> > > >             sleep 600;
> > > >             $factory->save_output('temp.out');
> > > >             my $checkinput = $factory->file;
> > > >             open(my $fh,"<$checkinput") or die $!;
> > > >             while(<$fh>)
> > > >             {
> > > >                 print;
> > > >             }
> > > >             close $fh;
> > > >             $factory->remove_rid($rid);
> > > >         }
> > > >     }
> > > > }
> > >
> > > ######################################################################
> > > ##
> > >
> > >
> > > On Tue, 2006-01-17 at 16:03 -0700, Barry Moore wrote:
> > >> Nagesh,
> > >>
> > > >> Attached is an input file, script and output.  These work for me,
> > > >> and I think they are the same that you are using.  Have a look and
> > > >> see if you can find any differences that might be causing your
> > > >> problem.  Other than that I don't know what to tell you.  If you are
> > > >> familiar with the perl debugger (and if you're not, now's probably a
> > > >> good time to become familiar with it) you should step through your
> > > >> script and be sure that all of your objects are getting defined when
> > > >> they are supposed to be.  That can often help narrow down the problem.
> > >>
> > >> Barry
> > >>
> > >>> -----Original Message-----
> > >>> From: Nagesh Chakka [mailto:nagesh.chakka at anu.edu.au]
> > >>> Sent: Tuesday, January 17, 2006 1:57 PM
> > >>> To: Barry Moore
> > >>> Cc: Hubert Prielinger; bioperl-l at bioperl.org
> > >>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> > >>>
> > > >>> Hi Barry,
> > > >>> With the help of Hubert, I further modified the script but still have
> > > >>> the same problem. The problem is that after submitting the blast
> > > >>> query, the script does not wait until the blast results are ready for
> > > >>> retrieval; submission is immediately followed by retrieving and
> > > >>> saving the output. Since the results will not be ready this fast
> > > >>> (about a sec), the output created is blank. I am able to retrieve the
> > > >>> results online using the RID, which I am making the script print.
> > > >>> So my main problem is making the program wait after submitting the
> > > >>> query.
> > > >>> My input file has a single fasta sequence which I have pasted below.
> > > >>> It's interesting to note that the script works on your system. Is it
> > > >>> creating an output file with the blast report?
> > > >>> Thanks very much for your attention.
> > > >>> Regards
> > > >>> Nagesh
> > >>>
> > > >>> blastInput.txt
> > > >>> >MusDpl
> > > >>> MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFGAEGNRYYA
> > > >>> ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAAL
> > > >>> RVAVDQPAMVCLLGFVWFIVK
> > >>>
> > >>> On Wednesday 18 January 2006 05:34, Barry Moore wrote:
> > >>>> Nagesh-
> > >>>>
> > > >>>> Did you get this figured out?  Your script works as is on my
> > > >>>> system.  You say temp.out is empty?  What does your input sequence
> > > >>>> (blastInput.txt) look like?
> > >>>>
> > >>>> Barry
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l-
> > >>>>> bounces at portal.open-bio.org] On Behalf Of Hubert Prielinger
> > >>>>> Sent: Monday, January 16, 2006 2:54 PM
> > >>>>> To: Nagesh Chakka; bioperl-l at portal.open-bio.org
> > >>>>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> > >>>>>
> > >>>>> Nagesh Chakka wrote:
> > > >>>>>> Hi All,
> > > >>>>>> I was trying to set up a system to perform a remote blast on a
> > > >>>>>> regular basis. I thought this could be best achieved by using a
> > > >>>>>> BioPerl module and came across RemoteBlast.pm.
> > > >>>>>> I modified the sample script "bp_remote_blast.pl", which takes a
> > > >>>>>> file containing a single FASTA sequence as input. I also wanted the
> > > >>>>>> blast report to be saved in a file for later use and modified the
> > > >>>>>> code as follows.
> > > >>>>>> I am using the latest version of Bioperl (1.5) on a Fedora platform.
> > > >>>>>>
> > > >>>>>> ########################################################################
> > > >>>>>>
> > >>>>>> print "$Bio::Root::Version::VERSION\n";
> > >>>>>> use Bio::Tools::Run::RemoteBlast;
> > >>>>>> use strict;
> > >>>>>> my $prog = 'blastp';
> > >>>>>> my $db   = 'swissprot';
> > >>>>>> my $e_val= '1e-10';
> > >>>>>>
> > >>>>>> my @params = ( '-prog' => $prog,
> > >>>>>>       '-data' => $db,
> > >>>>>>       '-expect' => $e_val,
> > >>>>>>       '-readmethod' => 'SearchIO' );
> > >>>>>>
> > >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>>
> > > >>>>>> #change a parameter
> > > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
> > >>>>>>
> > >>>>>> #remove a parameter
> > >>>>>> delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
> > >>>>>>
> > >>>>>> my $v = 1;
> > >>>>>> #$v is just to turn on and off the messages
> > >>>>>>
> > >>>>>> my $r = $factory->submit_blast('blastInput.txt');
> > >>>>>>
> > >>>>>> print STDERR "waiting..." if( $v > 0 );
> > >>>>>> while ( my @rids = $factory->each_rid )
> > >>>>>> {
> > >>>>>>        foreach my $rid ( @rids )
> > >>>>>>        {
> > >>>>>>                my $rc = $factory->retrieve_blast($rid);
> > >>>>>>                if( !ref($rc) )
> > >>>>>>                {
> > >>>>>>                        if( $rc < 0 )
> > >>>>>>                        {
> > >>>>>>                                $factory->remove_rid($rid);
> > >>>>>>                        }
> > >>>>>>                        print STDERR "." if ( $v > 0 );
> > >>>>>>                        sleep 5;
> > >>>>>>                }
> > >>>>>>                else
> > >>>>>>                {
> > >>>>>>                        print "RID $rid\n";
> > >>>>>>                        $factory->save_output('temp.out');
> > >>>>>>                        $factory->remove_rid($rid);
> > >>>>>>                }
> > >>>>>>        }
> > >>>>>> }
> > >>>>>
> > >>>>
> > >>> ####################################################################
> > >>> ###
> > >>>>
> > >>>> ##
> > >>>>
> > >>>>> ########
> > >>>>>
> > >>>>>> This script prints the RID and terminates immediately. Obviously
> > >> the
> > >>>>>> output file created is empty as the program did not wait for
> > >> getting
> > >>>>
> > >>>> the
> > >>>>
> > >>>>>> blast results from the RID.
> > >>>>>> Is there something I am doing wrong and what can I do for the
> > >> program
> > >>>>
> > >>>> to
> > >>>>
> > >>>>> wait
> > >>>>>
> > >>>>>> until the results are ready to be printed to the output file. I
> > >> could
> > >>>>
> > >>>> not
> > >>>>
> > >>>>> get
> > >>>>>
> > >>>>>> much information from the documentation and have no prior
> > >> experience
> > >>>>
> > >>>> with
> > >>>>
> > >>>>>> Bioperl.
> > >>>>>> Thanks very much for  your attention.
> > >>>>>> Regards
> > >>>>>> Nageshbi
> > >>>>>> _______________________________________________
> > >>>>>> Bioperl-l mailing list
> > >>>>>> Bioperl-l at portal.open-bio.org
> > >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>
> > >>>>> hi nagesh,
> > >>>>> try this, should work, I had the same problem:
> > >>>>>
> > >>>>> .......................
> > >>>>> .......................
> > >>>>>
> > >>>>> else
> > >>>>>                 {
> > >>>>>                         print "RID $rid\n";
> > >>>>>                         $factory->save_output('temp.out');
> > >>>>>
> > >>>>> 			my $checkinput = $factory->file;
> > >>>>>               		open(my $fh,"<$checkinput") or die
$!;
> > >>>>>               		while(<$fh>){
> > >>>>>                 		print;
> > >>>>>               		}
> > >>>>>               		close $fh;
> > >>>>>
> > >>>>>
> > >>>>> 			$factory->remove_rid($rid);
> > >>>>>                 }
> > >>>>>         }
> > >>>>> }
> > >>>>>
> > >>>>> regards
> > >>>>> Hubert
> > >>>>>
> > >>>>> PS: are you using the composition based statistics parameter with
> > >> your
> > >>>>> blast search?
> > >>>>> if yes, is it working?
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> Bioperl-l mailing list
> > >>>>> Bioperl-l at portal.open-bio.org
> > >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


