[Bioperl-l] BLAST parameters

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Fri, 9 Aug 2002 23:10:35 -0500


 Hi,

 I suppose I should reply to this, since I am indeed trying to figure out how to get a local remote blast set up.  I mentioned it at BOSC, and of course it was suggested that someone should add the possibility to change the URL that a remote blast points to.  Sounds easy, but isn't quite.

The parameters for the pages that NCBI provides for local remote blast (this sounds like an oxymoron) are of course different from the NCBI page parameters.  Most notably, the DATABASE parm and QUERY parm are changed to DATALIB and QUERY for the local pages, though there are other changes as well.  Also, there are no RIDs for a local setup, so once a blast is fired, the current program will wait till the blast returns, and then find no RIDs in the output.  Not really a problem, but different behavior.  It does make it harder to put up progress notices though (the sleep loop, which puts up a . every 5 seconds).  Long term, it is unclear if NCBI will be changing these as well, leading to a maintenance headache regarding these parameters (both on their site, and in what they release for a local setup).  No word on that yet.
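
One way to cope with the differing names is a small translation table applied just before the request is built.  The following is only a sketch: the sub name translate_params and the %LOCAL_NAME table are made up for illustration and are not part of RemoteBlast, and the only mapping taken from the message above is DATABASE to DATALIB.  Any other entries would have to come from inspecting the local form.

```perl
use strict;
use warnings;

# Hypothetical map from NCBI URL-API parameter names to the names a local
# page expects.  Only DATABASE => DATALIB comes from the message above;
# extend the table after inspecting the local form.
my %LOCAL_NAME = (
    DATABASE => 'DATALIB',
);

# Return a copy of the parameters with every known name translated and
# every unknown name passed through unchanged.
sub translate_params {
    my (%ncbi) = @_;
    my %local;
    for my $name (keys %ncbi) {
        my $target = exists $LOCAL_NAME{$name} ? $LOCAL_NAME{$name} : $name;
        $local{$target} = $ncbi{$name};
    }
    return %local;
}

# Example values are placeholders, not recommended settings.
my %local = translate_params(
    PROGRAM  => 'blastn',
    DATABASE => 'nr',
    QUERY    => 'ACGTACGT',
);
print "$_=$local{$_}\n" for sort keys %local;
```

Keeping the mapping in one table means a rename on either side touches a single line, which may ease the maintenance worry mentioned above.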

So, I am not sure what the best way is to make it work with a local setup.  I have something that works, but I need some design help if it is to be more generalized than what is there.  I can post more later, if anyone is interested.  Any use cases and criticisms are appreciated.

-Mat

-----Original Message-----
From: Jason Stajich
To: P B
Cc: brian_osborne@cognia.com; bioperl-l@bioperl.org
Sent: 8/9/02 2:35 PM
Subject: RE: [Bioperl-l] BLAST parameters

Yes, I wrote RemoteBlast with a HARD CODED global hash.
It was more of an exercise than a truly flexible remote blasting module,
because most systems are different.  We should really set up RemoteBlast
as a front-end factory and subclass it for different types of RemoteBlast
engines (NCBI, EBI, your-favorite-system, local-blast stuff).  The hard
coding should be removed in favor of picking up defaults while allowing
everything to be set dynamically in the object.
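
A rough sketch of that idea, with defaults held at package level, copied into each object, and overridden per subclass or per instance.  The package names My::RemoteBlast and My::RemoteBlast::Local, the param method, and the database name 'rat_est' are all invented for this illustration; none of this is the actual RemoteBlast code.

```perl
use strict;
use warnings;

# Hypothetical base class: package-level defaults that each object copies,
# so changing one object never touches the shared hash.
package My::RemoteBlast;

our %DEFAULTS = (
    PROGRAM     => 'blastp',
    DATABASE    => 'nr',
    MATRIX_NAME => 'BLOSUM62',
);

sub defaults { my ($class) = @_; return %DEFAULTS }

sub new {
    my ($class, %args) = @_;
    # Defaults first; caller-supplied values win.
    my $self = bless { params => { $class->defaults, %args } }, $class;
    return $self;
}

# Getter/setter for a single CGI parameter on this object only.
sub param {
    my ($self, $name, $value) = @_;
    $self->{params}{$name} = $value if @_ > 2;
    return $self->{params}{$name};
}

# Hypothetical subclass for a local installation with its own defaults.
package My::RemoteBlast::Local;
our @ISA = ('My::RemoteBlast');

sub defaults {
    my ($class) = @_;
    return ($class->SUPER::defaults, DATABASE => 'rat_est');
}

package main;

my $ncbi  = My::RemoteBlast->new(MATRIX_NAME => 'BLOSUM45');
my $local = My::RemoteBlast::Local->new;
$local->param(PROGRAM => 'blastn');

printf "%s %s %s\n", map { $ncbi->param($_) }  qw(PROGRAM DATABASE MATRIX_NAME);
printf "%s %s %s\n", map { $local->param($_) } qw(PROGRAM DATABASE MATRIX_NAME);
```

Compared with assigning into a global hash, a change made through param affects only the one object, so two searches with different settings can coexist in the same script.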

Mat Wiepert @mayo is looking at this; if someone else wants to help, this
should be reasonably easy to do in a more general way if you are happy
to debug CGI parameters.

-jason

On Fri, 9 Aug 2002, P B wrote:

> Hi Brian,
>
> Thanks!  It will be great to know how to change parameters like this.  My
> one question is: what is that line of code
> $Bio::...HEADER{'MATRIX_NAME'} = 'BLOSUM25' actually doing?  Is HEADER a
> hash in the RemoteBlast name-space?  It's not a crucial point, but I like
> knowing what's actually going on as much as possible.
>
> Thanks again,
> Tats
>
> >From: "Brian Osborne" <brian_osborne@cognia.com>
> >To: "P B" <itatsumaki@hotmail.com>, <bioperl-l@bioperl.org>
> >Subject: RE: [Bioperl-l] BLAST parameters
> >Date: Fri, 9 Aug 2002 13:35:18 -0400
> >
> >Tats,
> >
> >I just added this to bptutorial.pl, you might find it useful:
> >
> >You may want to change some parameter of the remote job and this example
> >shows how to change the matrix:
> >
> >$Bio::Tools::Run::RemoteBlast::HEADER{'MATRIX_NAME'} = 'BLOSUM25';
> >
> >For a description of the many CGI parameters see:
> >
> >http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html
> >
> >
> >Brian O.
> >
> >
> >-----Original Message-----
> >From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
> >Behalf Of P B
> >Sent: Friday, August 09, 2002 1:14 PM
> >To: bioperl-l@bioperl.org
> >Subject: [Bioperl-l] BLAST parameters
> >
> >Hi all, a newbie question I think.
> >
> >I haven't used bioperl before, so some of these questions might be a little
> >dumb, so flame away where needed.  Let me first give the goal, in case I'm
> >missing something conceptual here:
> >
> >Goal:
> >I have a long list of sequences (15,000) that I would like to identify.  In
> >particular, I want to find out what (rat) cluster they most likely
> >represent.
> >
> >Approach:
> >- submit genes one by one to remote BLAST (it's a lot of BLASTing, so I'm
> >  waiting 60 seconds between submissions; I do realize this will take 10
> >  days, btw, and I don't have access to a local BLAST)
> >- retrieve the BLAST results and parse out the top ten hits by e-value or
> >  bit-score (undecided if there is a reason to prefer expectation values to
> >  the normalized bit-scores?)
> >- for each of the top 10 hits, parse out the genbank accession
> >- use this accession to determine the corresponding cluster (I expect I
> >  will have to download the unigene .dat file to do this)
> >- if I can assign a conclusive identity to the sequence, great; if not,
> >  store the results for future analysis
> >
> >I hope to be able to automatically identify 70-80% of the sequences using
> >selection criteria like:
> >2 top hits for same cluster
> >3 of the top 5 hits for same cluster
> >6 of the top 10 hits for same cluster
> >or something similar.  The assignments don't have to be perfect, just
> >reasonably close.
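
The selection criteria above could be expressed as a short voting routine over the cluster IDs of the ranked hits.  This is a minimal sketch under assumptions: the sub name assign_cluster and the @RULES table are invented here, the rules are tried in the order listed, and the cluster IDs in the example are made-up placeholders.  Mapping each accession to its cluster is left out.

```perl
use strict;
use warnings;

# Selection criteria from the message above: [top N hits, votes needed].
my @RULES = ([2, 2], [5, 3], [10, 6]);

# Given the cluster IDs of the hits in rank order (best first), return the
# cluster that satisfies the first matching rule, or nothing if none does.
sub assign_cluster {
    my (@clusters) = @_;
    for my $rule (@RULES) {
        my ($top, $needed) = @$rule;
        # Look at no more hits than are actually available.
        my $last = $top < @clusters ? $top - 1 : $#clusters;
        my %votes;
        $votes{$_}++ for @clusters[0 .. $last];
        for my $id (sort keys %votes) {
            return $id if $votes{$id} >= $needed;
        }
    }
    return;
}

# Made-up cluster IDs for illustration only.
my $hit = assign_cluster(qw(Rn.1 Rn.7 Rn.1 Rn.1 Rn.9));
print defined $hit ? "assigned to $hit\n" : "undecided\n";
```

Each rule asks for a strict majority of the hits it looks at, so at most one cluster can satisfy it and no tie-breaking is needed.  Sequences for which the routine returns nothing would be the ones stored for later analysis.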
> >
> >Now, my (first) two problems involve submitting the BLAST to NCBI.  I'm
> >doing a test case with a 3-sequence FASTA file, btw.  What I would like is
> >to restrict my BLAST searches to "Rattus norvegicus" as you can on the NCBI
> >web-site under advanced options.
> >
> >In addition, I would like to be able to submit customized nucleotide
> >substitution matrices to use with the BLAST.
> >
> >That latter point isn't as critical, but I really would like to avoid
> >having to get back a pile of BLAST hits and have to filter through non-rat
> >hits if possible.
> >
> >The RemoteBlast module accepts an @params array to its ->new() method,
> >but I don't know what to call these parameters that I would like to use.
> >
> >Any comments, suggestions, ideas are very much welcome.
> >Thanks in advance!
> >Tats
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

