[Bioperl-l] Remote BLASTing (was BLAST parameters)

Jason Stajich jason@cgt.mc.duke.edu
Fri, 9 Aug 2002 15:42:11 -0400 (EDT)


oh yeah - and it should be renamed WebBlast. If someone writes a server
which does not have a cgi interface but allows blast
submissions/retrievals (CORBA, decode the NCBI blastcl3 code, etc...)
then we should be able to talk to it as well.

So refactoring is in order if people really want to expand the generality
of this object (which I think is a good idea, and I will help advise on it).
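
For anyone picking this up, here is a rough sketch of the shape such a
refactoring could take: a front-end constructor that picks an engine
subclass, with defaults that each object can override instead of a shared
global hash. Every name in it (Sketch::WebBlast, the -engine argument, the
default values) is made up for illustration; none of it is existing BioPerl
code.

```perl
use strict;
use warnings;

# Hypothetical front-end class; it doubles as the base class for engines.
package Sketch::WebBlast;

# Built-in defaults (illustrative values). Engines override this method.
sub defaults { return ( PROGRAM => 'blastn', DATABASE => 'nr' ) }

# Factory constructor: -engine selects the subclass; every other
# argument overrides a default on this object only.
sub new {
    my ($class, %args) = @_;
    my $engine = delete $args{-engine};
    $class = "Sketch::WebBlast::$engine" if defined $engine;
    die "Unknown engine class $class\n" unless $class->can('defaults');
    my $self = bless { params => { $class->defaults } }, $class;
    $self->param($_, $args{$_}) for keys %args;
    return $self;
}

# Get a parameter, or set it when a value is supplied.
sub param {
    my ($self, $name, $value) = @_;
    $self->{params}{$name} = $value if @_ > 2;
    return $self->{params}{$name};
}

# Hypothetical NCBI engine: inherits everything, adds its own defaults.
package Sketch::WebBlast::NCBI;
our @ISA = ('Sketch::WebBlast');
sub defaults { return ( Sketch::WebBlast->defaults, MATRIX_NAME => 'BLOSUM62' ) }

package main;

# Two objects of the same engine: one overrides the matrix, one does not.
my $custom = Sketch::WebBlast->new(-engine => 'NCBI', MATRIX_NAME => 'BLOSUM25');
my $plain  = Sketch::WebBlast->new(-engine => 'NCBI');
print $custom->param('MATRIX_NAME'), " ", $plain->param('MATRIX_NAME'), "\n";
```

Because the parameters live in each object, changing one job's matrix does
not affect any other job, which is the main thing the global hash cannot
offer.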

-jason

On Fri, 9 Aug 2002, Jason Stajich wrote:

> Yes, I wrote RemoteBlast with a HARD CODED global hash.
> It was more of an exercise than a truly flexible remote blasting module,
> because most systems are different.  We should really set up RemoteBlast as
> a front-end factory and subclass it for different types of RemoteBlast
> engines (NCBI, EBI, your-favorite-system, local-blast stuff).  The hard
> coding should be removed so that the object picks up defaults but allows
> everything to be set dynamically.
>
> Mat Wiepert @mayo is looking at this.  If someone else wants to help, this
> should be reasonably easy to do in a more general way if you are happy
> to debug CGI parameters.
>
> -jason
>
> On Fri, 9 Aug 2002, P B wrote:
>
> > Hi Brian,
> >
> > Thanks!  It will be great to know how to change parameters like this.  My
> > one question is: what is that line of code $Bio::...HEADER{'MATRIX_NAME'} =
> > 'BLOSUM25' actually doing?  Is HEADER a hash in the RemoteBlast name-space?
> > It's not a crucial point, but I like knowing what's actually going on as
> > much as possible.
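
On the question above: in Perl, a name written as $Package::NAME{key} refers
to one element of the package variable %NAME that lives in that package's
namespace. The sketch below shows the mechanism with a stand-in package
(Demo::RemoteBlast is hypothetical, and its default value is illustrative);
it is not the real module's code.

```perl
use strict;
use warnings;

# Stand-in package (hypothetical) that keeps its defaults in a package hash.
package Demo::RemoteBlast;

our %HEADER = ( MATRIX_NAME => 'BLOSUM62' );   # illustrative default only

# Code inside the package reads the shared hash by its short name.
sub matrix { return $HEADER{MATRIX_NAME} }

package main;

# From outside, the fully qualified name reaches the very same hash.
$Demo::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';
print Demo::RemoteBlast::matrix(), "\n";
```

Since the hash belongs to the package rather than to any one object, an
assignment like this changes the value for all code that reads it afterwards.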
> >
> > Thanks again,
> > Tats
> >
> > >From: "Brian Osborne" <brian_osborne@cognia.com>
> > >To: "P B" <itatsumaki@hotmail.com>, <bioperl-l@bioperl.org>
> > >Subject: RE: [Bioperl-l] BLAST parameters
> > >Date: Fri, 9 Aug 2002 13:35:18 -0400
> > >
> > >Tats,
> > >
> > >I just added this to bptutorial.pl, you might find it useful:
> > >
> > >You may want to change some parameter of the remote job and this example
> > >shows how to change the matrix:
> > >
> > >$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';
> > >
> > >For a description of the many CGI parameters see:
> > >
> > >http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
> > >
> > >
> > >Brian O.
> > >
> > >
> > >-----Original Message-----
> > >From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
> > >Behalf Of P B
> > >Sent: Friday, August 09, 2002 1:14 PM
> > >To: bioperl-l@bioperl.org
> > >Subject: [Bioperl-l] BLAST parameters
> > >
> > >Hi all, a newbie question I think.
> > >
> > >I haven't used bioperl before, so some of these questions might be a little
> > >dumb, so flame away where needed.  Let me first give the goal, in case I'm
> > >missing something conceptual here:
> > >
> > >Goal:
> > >I have a long list of sequences (15,000) that I would like to identify.  In
> > >particular, I want to find out what (rat) cluster they most likely
> > >represent.
> > >
> > >Approach:
> > >- submit genes one by one to remote BLAST (it's a lot of BLASTing so I'm
> > >waiting 60 seconds between submissions; I do realize this will take 10
> > >days, btw, and I don't have access to a local BLAST)
> > >- retrieve the BLAST results and parse out the top ten hits by e-value or
> > >bit-score (undecided if there is a reason to prefer expectation values to
> > >the normalized bit-scores?)
> > >- for each of the top 10 hits, parse out the GenBank accession
> > >- use this accession to determine the corresponding cluster (I expect I
> > >will have to download the UniGene .dat file to do this)
> > >- if I can assign a conclusive identity to the sequence, great; if not,
> > >store the results for future analysis
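
The ranking and lookup steps of the approach above can be sketched in plain
Perl once the hits have been parsed. The hit records here are simple hashes
with assumed keys (accession, evalue, bits), the accessions and cluster names
are placeholders, and the lookup table is hand-made; a real one would be
built from the UniGene data file.

```perl
use strict;
use warnings;

# Return the accessions of the best $n hits: lowest e-value first,
# ties broken by higher bit score. The input list is not modified.
sub top_accessions {
    my ($hits, $n) = @_;
    my @sorted = sort {
        $a->{evalue} <=> $b->{evalue} or $b->{bits} <=> $a->{bits}
    } @$hits;
    $#sorted = $n - 1 if @sorted > $n;    # keep only the first $n
    return map { $_->{accession} } @sorted;
}

# Translate accessions to cluster names via a lookup hash;
# accessions missing from the table come back as 'unknown'.
sub clusters_for {
    my ($acc_to_cluster, @accessions) = @_;
    return map { $acc_to_cluster->{$_} // 'unknown' } @accessions;
}

# Placeholder data, for illustration only.
my @hits = (
    { accession => 'ACC1', evalue => 1e-20, bits => 100 },
    { accession => 'ACC2', evalue => 1e-50, bits => 200 },
    { accession => 'ACC3', evalue => 1e-50, bits => 250 },
);
my %lookup = ( ACC2 => 'cluster_A', ACC3 => 'cluster_A' );

my @best     = top_accessions(\@hits, 2);
my @clusters = clusters_for(\%lookup, @best);
print join(',', @best), " -> ", join(',', @clusters), "\n";
```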
> > >
> > >I hope to be able to automatically identify 70-80% of the sequences using
> > >selection criteria like:
> > >2 top hits for same cluster
> > >3 of the top 5 hits for same cluster
> > >6 of the top 10 hits for same cluster
> > >or something similar.  The assignations don't have to be perfect, just
> > >reasonably close.
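
The selection criteria above amount to a small voting rule over the ranked
hits. A minimal sketch, assuming the clusters of the hits are already known
and ordered best-first; the function name and the cluster labels are
placeholders.

```perl
use strict;
use warnings;

# Rules from the list above: [window size, votes needed in that window].
my @RULES = ( [2, 2], [5, 3], [10, 6] );

# Given cluster names ordered best hit first, return the cluster that
# satisfies the first rule that matches, or undef if no rule matches.
sub assign_cluster {
    my @clusters = @_;
    for my $rule (@RULES) {
        my ($window, $needed) = @$rule;
        my $last = $window < @clusters ? $window - 1 : $#clusters;
        my %count;
        $count{$_}++ for @clusters[0 .. $last];
        for my $c (sort keys %count) {
            return $c if $count{$c} >= $needed;
        }
    }
    return;    # nothing conclusive: store the hits for later analysis
}

my $call = assign_cluster(qw(A B A C A));
print defined $call ? "assigned to $call\n" : "unassigned\n";
```

The thresholds are kept in one table so they can be tuned without touching
the logic.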
> > >
> > >Now, my (first) two problems involve submitting the BLAST to NCBI.  I'm
> > >doing a test case with a 3-sequence FASTA file, btw.  What I would like is
> > >to restrict my BLAST searches to "Rattus norvegicus" as you can on the NCBI
> > >web-site under advanced options.
> > >
> > >In addition, I would like to be able to submit customized nucleotide
> > >substitution matrices to use with the BLAST.
> > >
> > >That latter point isn't as critical, but I really would like to avoid
> > >having to get back a pile of BLAST hits and filter through non-rat hits
> > >if possible.
> > >
> > >The RemoteBlast module accepts an @params array to its ->new() method,
> > >but I don't know what to call these parameters that I would like to use.
> > >
> > >Any comments, suggestions, ideas are very much welcome.
> > >Thanks in advance!
> > >Tats
> >
> >
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu