[Bioperl-l] BLAST parameters

P B itatsumaki@hotmail.com
Sat, 10 Aug 2002 15:29:12 +0000


Hi,

Thanks for bringing up the point.  I am trying to compensate for the load 
by:
a) sleeping 60 seconds between submissions
b) sleeping 15 seconds between checks for an RID (request ID)
c) retrieving a minimal amount of information (i.e. no alignments, just 10 
hits)
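
To make (a) to (c) concrete, here is a rough sketch of the loop in Python
rather than in my actual Perl. The submit, is_ready and fetch callables
are hypothetical stand-ins for whatever client talks to the server; they
are not RemoteBlast's API, and the sleep argument exists only so the
delays can be swapped out when testing.

```python
import time

SUBMIT_DELAY_S = 60   # pause between submissions, as in (a)
POLL_DELAY_S = 15     # pause between status checks, as in (b)


def run_batch(sequences, submit, is_ready, fetch, sleep=time.sleep):
    """Submit sequences one at a time, polling each request ID until ready.

    submit, is_ready and fetch are hypothetical callables supplied by the
    caller; they stand in for whatever client talks to the server.
    """
    results = {}
    for index, (name, seq) in enumerate(sequences):
        if index > 0:
            sleep(SUBMIT_DELAY_S)      # (a) throttle submissions
        rid = submit(seq)
        while not is_ready(rid):
            sleep(POLL_DELAY_S)        # (b) throttle status checks
        results[name] = fetch(rid)     # (c) caller keeps the report small
    return results
```

Only one request is ever in flight, so the server never sees more than
one submission per minute from me.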

As for the possibility of malfunctions/loss of data... well, my hope is that 
there won't be any and that my code will be robust enough to handle it. If 
not, I can always restart from where I left off.
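
By "restart from where I left off" I mean something like the sketch
below (again Python, not my real code). It assumes a plain text
checkpoint file with one finished sequence name per line; the file
layout and the work callable are made up for illustration.

```python
import os


def load_done(path):
    """Return the set of sequence names already recorded in the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def process_all(names, work, path):
    """Run work(name) for every name not yet checkpointed, recording each success."""
    done = load_done(path)
    for name in names:
        if name in done:
            continue                   # finished in an earlier run
        work(name)                     # may raise; nothing is recorded on failure
        with open(path, "a") as handle:
            handle.write(name + "\n")  # record only after the work succeeded
        done.add(name)
```

A crash part-way through costs at most the one sequence that was in
progress; rerunning with the same file skips everything already done.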

My understanding is that local BLAST won't be feasible on my machine: a
PII-400 with 128 MB of RAM and a 4 GB hard disk.  The documentation from
NCBI indicates much more memory will be required, and I think we would be
tight on disk space as well.

Does anybody else feel the same way?  I could break this up into chunks of 
500/day and do it over a month if there are serious concerns....
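
For reference, the arithmetic behind the two schedules, as a quick
Python check. It counts only the 60-second submission delay and ignores
polling and retrieval time, so the real figures would be somewhat
higher.

```python
TOTAL_SEQUENCES = 15_000
SECONDS_PER_SUBMISSION = 60      # the delay from (a); polling time not counted
CHUNK_PER_DAY = 500

continuous_days = TOTAL_SEQUENCES * SECONDS_PER_SUBMISSION / 86_400
chunked_days = -(-TOTAL_SEQUENCES // CHUNK_PER_DAY)   # ceiling division
chunk_hours = CHUNK_PER_DAY * SECONDS_PER_SUBMISSION / 3_600

print(f"continuous run: {continuous_days:.1f} days")
print(f"chunked run: {chunked_days} days, about {chunk_hours:.1f} h per day")
```

That gives about 10.4 days for a continuous run, or 30 days at roughly
8.3 hours of submissions per day for the chunked one.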

Tats

>Hi,
>
>I do not know the throughput of the NCBI server and the network.
>Since Brian and Jason did not have a comment on this point, it may be
>completely all right, but I personally would feel it is a kind of abuse.
>Not to mention that I would worry whether a 10-day run can finish
>without problems such as malfunctions, loss of data and the like.
>Is it not possible (or simply more comfortable) to download, set up
>and use the database locally? It would surely finish in less than
>10 days, though of course a reasonable (not extreme) processor and
>some storage capacity are necessary for that. In return you'd be
>ready in a day altogether, whereas for the 15000 RemoteBlast runs
>you may need luck and a lot of patience.
>
>I really love the idea of the RemoteBlast, as it saves lots of people
>from tremendous agony, but this magnitude really seems to me a
>challenge.
>Or is this exactly what RemoteBlast was created for, rather than for
>a few (or a few dozen, or a few hundred) sequences? I really do not
>know. I am just thinking out loud.
>
>Anyway, have fun
>Peter
>
> > I haven't used bioperl before, so some of these questions might be a
> > little dumb, so flame away where needed.  Let me first give the goal,
> > in case I'm missing something conceptual here:
> >
> > Goal:
> > I have a long list of sequences (15,000) that I would like to
> > identify.  In particular, I want to find out what (rat) cluster they
> > most likely represent.
> >
> > Approach:
> > - submit genes one by one to remote BLAST (it's a lot of BLASTing, so
> >   I'm waiting 60 seconds between submissions; I do realize this will
> >   take 10 days, btw, and I don't have access to a local BLAST)
> > - retrieve the BLAST results and parse out the top ten hits by e-value
> >   or bit-score (undecided: is there a reason to prefer expectation
> >   values to the normalized bit-scores?)
> > - for each of the top 10 hits, parse out the GenBank accession
> > - use this accession to determine the corresponding cluster (I expect
> >   I will have to download the UniGene .dat file to do this)
> > - if I can assign a conclusive identity to the sequence, great; if
> >   not, store the results for future analysis
> >
> > I hope to be able to automatically identify 70-80% of the sequences
> > using selection criteria like:
> > 2 top hits for same cluster
> > 3 of the top 5 hits for same cluster
> > 6 of the top 10 hits for same cluster
> > or something similar.  The assignments don't have to be perfect, just
> > reasonably close.
> >
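
The selection criteria quoted above could be sketched roughly as
follows (Python, for illustration only). It assumes each hit has
already been mapped to a cluster ID, listed best hit first, with None
for an accession that maps to no cluster. The cluster IDs in the usage
line are invented.

```python
from collections import Counter

# (window size, minimum hits in that window) taken from the criteria above
RULES = [(2, 2), (5, 3), (10, 6)]


def assign_cluster(hit_clusters, rules=RULES):
    """Return the cluster satisfying the first matching rule, else None.

    hit_clusters lists the cluster ID of each hit, best hit first;
    None marks a hit whose accession maps to no cluster.
    """
    for window, needed in rules:
        counts = Counter(c for c in hit_clusters[:window] if c is not None)
        if counts:
            cluster, count = counts.most_common(1)[0]
            if count >= needed:
                return cluster
    return None


# Invented cluster IDs: 3 of the top 5 hits agree, so the second rule fires.
print(assign_cluster(["Rn.1", "Rn.2", "Rn.2", "Rn.2", "Rn.3"]))
```

Each threshold is more than half of its window, so at most one cluster
can satisfy a rule; with looser thresholds ties would need handling.
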
> > Now, my (first) two problems involve submitting the BLAST to NCBI.
> > I'm doing a test case with a 3-sequence FASTA file, btw.  What I would
> > like is to restrict my BLAST searches to "Rattus norvegicus", as you
> > can on the NCBI web-site under advanced options.
> >
> > In addition, I would like to be able to submit customized nucleotide
> > substitution matrices to use with the BLAST.
> >
> > That latter point isn't as critical, but I really would like to avoid
> > having to get back a pile of BLAST hits and filter through non-rat
> > hits if possible.
> >
> > The RemoteBlast module accepts an @params array to its ->new() method,
> > but I don't know what to call these parameters that I would like to
> > use.
> >
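
For what it is worth, the request I have in mind would look something
like the sketch below. The field names (CMD, PROGRAM, DATABASE, QUERY,
ENTREZ_QUERY, HITLIST_SIZE) are the ones NCBI's BLAST URL API uses as
far as I understand it; whether RemoteBlast accepts them under the same
names is exactly what I don't know. The blastn program, the nt
database and the helper function itself are assumptions for the
example.

```python
from urllib.parse import urlencode


def build_put_params(sequence, organism="Rattus norvegicus", max_hits=10):
    """Build form fields for a search restricted to one organism.

    Hypothetical helper: program and database are illustrative choices.
    """
    return {
        "CMD": "Put",
        "PROGRAM": "blastn",                       # assumed nucleotide search
        "DATABASE": "nt",                          # assumed database
        "QUERY": sequence,
        "ENTREZ_QUERY": f"{organism}[Organism]",   # organism restriction
        "HITLIST_SIZE": str(max_hits),             # keep only the top hits
    }


# Show the encoded form body without sending anything over the network.
print(urlencode(build_put_params("ACGT")))
```
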
> > Any comments, suggestions, ideas are very much welcome.
> > Thanks in advance!
> > Tats
> >
>
>
>
>............................................................................
>Peter B. Kos, Ph.D.
>Molecular Microbiology and Genetics Lab.
>Research Institute of Innovative Technology for the Earth (RITE)
>9-2 Kizugawadai, Kizu-cho, Soraku-gun,
>Kyoto 619-0292 JAPAN
>Phone: +81-774-75-2308
>Fax: +81-774-75-2321
>E-mail: kos@rite.or.jp
>



