[Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method?

Chris Fields cjfields at uiuc.edu
Mon Feb 6 17:27:56 UTC 2006


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson
> Sent: Friday, February 03, 2006 2:54 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method?
> 
> I have been working with the RemoteBlast.pm module and have found that it
> is a bit clunky to use loops to keep checking whether your RID has
> finished.
>
> For example, every time you write a script, you need to add a code block
> (see example in the documentation) in order to keep checking whether @rid
> is finished.
>
> Would it be better to write this as a method in the RemoteBlast module?
> It seems like it would be better for RemoteBlast to have a method we could
> call, say retrieve_when_done, that would return the BLAST report when the
> value of retrieve_blast is no longer 0.

Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as
an enhancement?
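
To make the request concrete, here is a minimal sketch of what such a helper
could look like.  It is not part of RemoteBlast: the name retrieve_when_done
comes from the suggestion above, and the 'interval' and 'max_tries' options
are assumptions made up for illustration.  It relies only on the documented
retrieve_blast() convention (0 while the job is running, a negative value on
error, a report object when finished) and on remove_rid().

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::Run::RemoteBlast.
# $factory is any object that provides retrieve_blast($rid) and
# remove_rid($rid) with the return convention described above.
sub retrieve_when_done {
    my ($factory, $rid, %opt) = @_;

    # Both options are assumptions for this sketch, not RemoteBlast parameters.
    my $interval  = defined $opt{interval}  ? $opt{interval}  : 5;    # seconds between checks
    my $max_tries = defined $opt{max_tries} ? $opt{max_tries} : 120;  # give up eventually

    for my $try (1 .. $max_tries) {
        my $rc = $factory->retrieve_blast($rid);

        if (ref $rc) {                       # finished: a report object came back
            $factory->remove_rid($rid);
            return $rc;
        }
        if ($rc < 0) {                       # the server reported an error
            $factory->remove_rid($rid);
            die "BLAST request $rid failed\n";
        }
        sleep $interval if $interval > 0;    # still running: wait, then check again
    }
    die "BLAST request $rid not finished after $max_tries checks\n";
}
```

With a real factory this would presumably be called once per RID returned by
each_rid(), and next_result() would then be called on the returned object as
usual.  Dying on error or timeout is just one possible design; returning
undef instead would work equally well.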

> The only issue may be report parsing, but I wonder if it might be better
> to separate out submittal/retrieval of BLAST requests from the parsing
> step and make these more discrete processes?  Since NCBI seems not to be
> supporting text results as a standard, maybe the module should work
> exclusively with XML, and we could move report handling away from the
> headaches of text processing and just let Bio::SeqIO or blastxml handle
> the task of turning BLAST reports into different forms (such as HTML,
> text, etc.).

They are separated.  RemoteBlast executes BLAST remotely (via HTTP).
Results are parsed via various Bio::SearchIO modules depending on what you
set '-readmethod' to.  This is from perldoc:

From Bio::Tools::Run::RemoteBlast
________________________________________________________

DESCRIPTION
    Class for remote execution of the NCBI Blast via HTTP.

    For a description of the many CGI parameters see:
    http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

    Various additional options and input formats are available.

________________________________________________________

From Bio::SearchIO
________________________________________________________

DESCRIPTION
    This is a driver for instantiating a parser for report files from
    sequence database searches. This object serves as a wrapper for the
    format parsers in Bio::SearchIO::* - you should not need to ever use
    those format parsers directly. (For people used to the SeqIO system,
    we are deliberately using the same pattern).

    Once you get a SearchIO object, calling next_result() gives you back a
    Bio::Search::Result::ResultI compliant object, which is an object that
    represents one Blast/Fasta/HMMER whatever report.

    A list of module names and formats is below:

      blast      BLAST (WUBLAST, NCBIBLAST,bl2seq)
      fasta      FASTA -m9 and -m0
      blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular)
      megablast  MEGABLAST
      psl        UCSC PSL format
      waba       WABA output
      axt        AXT format
      sim4       Sim4
      hmmer      HMMER hmmpfam and hmmsearch
      exonerate  Exonerate CIGAR and VULGAR format
      blastxml   NCBI BLAST XML
      wise       Genewise -genesf format

    See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/

________________________________________________________

This is also in the wiki online now:

http://www.bioperl.org/wiki/Module:Bio::SearchIO 
http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

I think the current line of thought is to make XML the default, but I also
know you would irritate a LOT of people out there by cutting off text output
parsing completely.  Roger Hall or Jason pointed out that doing so will
break many scripts out there.  

Furthermore, the problems with text output parsing are usually minimal.  For
instance, the last one was a small change in the text output that broke a
regex, causing an infinite loop; the actual bug was in Bio::SearchIO::blast
and not in RemoteBlast.  A simple addition to the regex fixed it.  The only
change to RemoteBlast was to implement the option of saving XML-formatted
BLAST output.

I do like the idea of using XML output to build custom (bioperl-specific)
BLAST reports, but that also requires more work, likely a lot more work.
Again, maybe add that as an enhancement in Bugzilla or, better yet, submit
some sample code as an example.

> This would definitely simplify coding with the RemoteBlast.pm module, as
> you could then treat the report retrieval process as an object and just
> wait for the object to return its value, instead of coding a bunch of test
> loops to see if it is done.  This may also help keep bugs out of the
> module, make the module longer lasting, and spare module users from
> rewriting their code every time NCBI makes changes.

I think the most stable way of submitting jobs is by using the netblast
client (blastcl3) and parsing the results from that.  No CGI, no HTML, just
saving to a temp file and parsing through SearchIO.

RemoteBlast was designed, I believe, with the idea of letting researchers
with some basic knowledge of Perl use an interface familiar to them (i.e.
the BLAST interface at NCBI) and retrieve results on a regular basis.  The
results are parsed via SearchIO::blast/blastxml/blasttable.  The problem is
that, though convenient, RemoteBlast also relies on the powers that be at
NCBI not changing anything dramatically.  It is possible that NCBI could
modify the HTML code from the BLAST retrieval process, thus breaking
RemoteBlast.  Text output could change again, even more dramatically, thus
severely breaking Bio::SearchIO::blast.  When that happens, we adapt by
modifying the broken modules.  It's evolution at its finest.  It's also a
fact of life that code breaks and needs to be fixed every once in a while to
stay current.

Okay, I'm waxing philosophical now so I know I've definitely had too much
coffee.  Must get back to work...

>
> Any thoughts or ideas?
>
> Is anyone working on this?
>
> Thanks
>
> Brad Olson


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign



