[Bioperl-l] Easy switching from wwwBlast to QBlast

Madeleine Lemieux mlemieux at bioinfo.ca
Wed Nov 24 04:22:26 EST 2004


I've just recently started exploring BioPerl (v.1.4). So far it's been 
fun if a little daunting.

As an exercise, I decided to try changing the blast_sequence subroutine 
in Perl.pm so that it would let me send the query either to my local 
wwwBlast server or out over my slow, flaky internet connection to the 
QBlast server. I did this by adding a parameter, LOCALSERVER, which, if 
set to a URL, redirects the query to that server (e.g. LOCALSERVER => 
'http://localhost/blast/blast.cgi'); otherwise, the query goes to the 
server at the NCBI.
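
The server selection described above might look roughly like the 
following. This is only a minimal sketch, not the actual patch: the 
helper name pick_blast_url and the default NCBI URL are assumptions 
made for illustration; only the LOCALSERVER parameter name comes from 
the description above.

```perl
use strict;
use warnings;

# Assumed default; the real module may use a different QBlast URL.
my $NCBI_QBLAST_URL = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi';

# Hypothetical helper: return the URL the query should be sent to.
# LOCALSERVER, if set to a non-empty URL, overrides the NCBI default.
sub pick_blast_url {
    my (%args) = @_;
    my $local = $args{LOCALSERVER};
    return ( defined $local && length $local ) ? $local : $NCBI_QBLAST_URL;
}

print pick_blast_url( LOCALSERVER => 'http://localhost/blast/blast.cgi' ), "\n";
print pick_blast_url(), "\n";
```

Keeping the choice in one place means the rest of the submission code 
does not need to know which kind of server it is talking to.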

I've also added support for query by accession or GI number (QBlast 
only, since wwwBlast doesn't support such queries), submission of 
multiple sequences (in a file, a string, or a string variable), and 
passing any of the QBlast Put and Get options as parameters. Unlike the 
original, my blast_sequence returns an array of results rather than a 
single result, so existing code that calls it in a scalar context would 
get the size of the array instead of a result.
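
The scalar-context problem can be seen in a few lines of Perl. The 
sketch below uses made-up sub names and placeholder strings instead of 
real BLAST results; the wantarray variant is just one possible way to 
keep old callers working, not something the current code does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a blast_sequence that returns an array.
sub blast_many {
    my @results = ( 'report_1', 'report_2', 'report_3' );
    return @results;
}

# Variant that hands back the first result when the caller
# expects a single value (scalar context).
sub blast_compat {
    my @results = ( 'report_1', 'report_2', 'report_3' );
    return wantarray ? @results : $results[0];
}

my $count   = blast_many();     # scalar context: number of elements
my ($first) = blast_many();     # list context: first element
my $single  = blast_compat();   # scalar context: first element

print "$count $first $single\n";   # prints "3 report_1 report_1"
```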

Apart from Perl.pm, the only other file that I had to change was 
Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release 
candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed in 
ways that overlap with my changes. The new version maintains backwards 
compatibility, which mine does not, since I was only working for myself 
at the time.

So my question is: is anyone interested in getting the code I've 
developed? If so, a corollary question is: how do I go about 
contributing the code? I can pretty easily forward-port my changes to 
RemoteBlast.pm to the 1.5.RC1 version in order to use the nice 
"validate by regexp" trick introduced there and to provide backwards 
compatibility. I'm not sure what to do about the Perl.pm module, 
though. I guess the easiest approach would be to rename my 
blast_sequence subroutine and add it to Perl.pm, since no object 
interface is being altered.

As I was working on this, I noticed that the HTML stripping that gets 
done on the response from the QBlast server fails on wwwBlast output, 
since the format of the HTML is a little different (this shows up as a 
"can't find mid-line data" error when processing the alignments). So I 
wrote a generic stripper which removes all HTML tags except those that 
contain an end-of-line within the tag itself or an internal, un-escaped 
closing angle bracket (>), which wouldn't be valid HTML anyway, I think. 
It doesn't touch lone angle brackets (>) such as those found at the 
beginning of descriptions (>gi ...):
	# html stripper
	# remove simple and closing tags first and then leftover tags
	$str =~ s/<(\/)?\w+>//g;
	$str =~ s/<\D+([^>]*\n*)*>//g;
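
To see what the two substitutions do, they can be wrapped in a helper 
and run on a small sample. Everything here apart from the two regexes is 
an assumption: the sub names are hypothetical, the sample strings are 
made up rather than real wwwBlast output, and strip_html_strict is only 
a guess at a tighter pattern for the stated intent (no newline inside a 
tag, stop at the first closing bracket).

```perl
use strict;
use warnings;

# The two substitutions from the message, unchanged.
sub strip_html {
    my ($str) = @_;
    $str =~ s/<(\/)?\w+>//g;          # simple and closing tags
    $str =~ s/<\D+([^>]*\n*)*>//g;    # leftover tags
    return $str;
}

# Assumed stricter alternative: a tag must start with a letter,
# may not contain a newline, and ends at the first '>'.
sub strip_html_strict {
    my ($str) = @_;
    $str =~ s{</?[A-Za-z][^>\n]*>}{}g;
    return $str;
}

# Made-up sample, not real wwwBlast output.
my $html = qq{<b>Score</b> = 100\n>gi|123 sample\n<a href="x">link</a>\n};
print strip_html($html);
print strip_html_strict($html);
```

On this sample both helpers leave the ">gi|123 sample" line alone and 
reduce the rest to plain text. One caveat worth checking against real 
output: \D also matches ">" and newlines, so when a leftover tag is 
followed later by a lone ">", the second original pattern can run past 
the end of the tag and remove the text in between.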

Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test 
for completion relies on the size of the file containing the reply. 
This has failed at least once for me. Since there is a status line near 
the top of the file in the response, it seems to me that something 
along the lines of the following might be more robust:
	# read the reply up to QBlastInfoEnd and pull out the status
	my $status = '';
	open(my $tmp, '<', $tempfile)
	    or $self->throw("cannot open $tempfile: $!");
	while ( defined( my $line = <$tmp> ) ) {
	    last if $line =~ /QBlastInfoEnd/;
	    # keep whatever follows the '=' on the status line
	    (undef, $status) = split /=/, $line, 2
	        if $line =~ /waiting|ready/i;
	}
	close $tmp;

	if ( $response->is_success ) {
	    if ( $status =~ /waiting/i ) {
	        return 0;
	    } elsif ( $status =~ /ready/i ) {
	        ...
	    } else { # failed
	        ...
	    }
	} ...
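
The status check can also be tried on its own, without a temp file or 
an HTTP response. The sketch below is a standalone illustration, not 
code from RemoteBlast.pm: the helper name qblast_status is hypothetical, 
and the sample reply is an assumed layout for the QBlastInfo block near 
the top of the response.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the Status value from the first
# QBlastInfo block of a reply, or return '' if none is found.
sub qblast_status {
    my ($text) = @_;
    my $in_info = 0;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /QBlastInfoBegin/ ) { $in_info = 1; next; }
        last if $line =~ /QBlastInfoEnd/;
        next unless $in_info;
        return uc($1) if $line =~ /^\s*Status\s*=\s*(\w+)/i;
    }
    return '';
}

# Assumed sample of the block at the top of a reply.
my $reply = <<'END';
<!--
QBlastInfoBegin
    Status=WAITING
QBlastInfoEnd
-->
END

print qblast_status($reply), "\n";   # prints "WAITING"
```

An empty return value can then be treated like the "failed" branch 
above, since it means no recognisable status line was found.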

Finally, let me end by thanking all the BioPerl contributors for their 
fine work.

Regards,
Madeleine


