[Biopython] Virus alert during qblast()

David Shin davidsshin at lbl.gov
Wed Apr 23 07:38:55 UTC 2014


For standalone BLAST, which yes, will run way, way faster, this is what I did
to make a few filtered databases. I have tried to give nucleotide examples,
in case that's what you are looking for...

Go to the nucleotide or protein (whichever you are working on) BLAST page, e.g.
Nucleotide BLAST: Search nucleotide databases using a nucleotide query
<http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome>
and start typing the organism or species name into the text field labelled
"Organism" (marked "optional").

Get the taxid from the suggestions that appear, i.e. if you typed in:
bacteria, you would get taxid:2
zea mays, you would get taxid:4577

Then go to the NCBI Nucleotide (or Protein) page
Home - Nucleotide - NCBI <http://www.ncbi.nlm.nih.gov/nuccore/>
and use the following syntax for your search (using the zea mays example):
txid4577[ORGN]
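
If you want several organisms in one download, Entrez search terms can be
joined with OR. A small Python sketch that builds such a term (the function
name organism_term is made up for illustration, it is not part of Biopython
or any NCBI tool):

```python
def organism_term(*taxids):
    """Build an Entrez organism term such as 'txid2[ORGN] OR txid4577[ORGN]'."""
    if not taxids:
        raise ValueError("need at least one taxid")
    # int() raises ValueError on non-numeric strings such as "zea mays"
    return " OR ".join("txid%d[ORGN]" % int(t) for t in taxids)


print(organism_term(4577))     # one organism
print(organism_term(2, 4577))  # bacteria or zea mays
```

Paste the printed term into the search box exactly as you would the single
taxid version.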

Then, from the "Send to:" pulldown on the results page:
click the "File" button
a dropdown will appear
under Format, select "GI List"

Save the file, but change the name to something you will remember, like
sequence.gi.4577.txt
in case you want different filters later.

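Then, in the database directory where you have downloaded the full nr
nucleotide database, run:
blastdb_aliastool -gilist sequence.gi.4577.txt -db nr -out nr_gi.4577 -title nr_gi.4577
to give a filtered database alias called nr_gi.4577.
(Note: blastdb_aliastool assumes a protein database unless told otherwise, so
for nucleotides add -dbtype nucl. Also, if you downloaded NCBI's preformatted
nucleotide collection, its database name is nt, so use -db nt.)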

Then when you blast from your script, it would look something like:
blastn -query mysequence.fs -num_threads 4 -db nr_gi.4577 -out test-4577.out
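
If your script is in Python, one way to make that call is through the
standard library's subprocess module. This is only a sketch: the helper names
blastn_command and run_blastn are invented here, and it assumes the BLAST+
blastn program is on your PATH and that the filtered database alias already
exists.

```python
import subprocess


def blastn_command(query, db, out, threads=4):
    """Return the argument list for a local blastn search."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-num_threads", str(threads),
        "-out", out,
    ]


def run_blastn(query, db, out, threads=4):
    """Run blastn; raises CalledProcessError if it exits with an error."""
    subprocess.run(blastn_command(query, db, out, threads), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_blastn(...) to actually search.
    print(" ".join(blastn_command("mysequence.fs", "nr_gi.4577", "test-4577.out")))
```

Passing the arguments as a list avoids shell quoting problems if a file name
contains spaces.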

In my case, I made a filter that had just "plants", using taxid 3193, but
also a subset that had ~15 selected species, by combining the "gi list"
output from separate searches, i.e. if I wanted a "bacteria + zea mays"
filter because I was psychotic, I would cat together the gi list files
from txid2 and txid4577.
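
Plain cat works fine. If you would rather do the combining from Python and
drop any GI that turns up in more than one list, here is a rough sketch (the
function name merge_gi_lists is made up, and it assumes one GI per line, as
in the files saved from the website):

```python
def merge_gi_lists(paths, out_path):
    """Write the union of several GI list files and return how many GIs were written."""
    seen = set()
    merged = []
    for path in paths:
        with open(path) as handle:
            for line in handle:
                gi = line.strip()
                if gi and gi not in seen:  # skip blank lines and repeats
                    seen.add(gi)
                    merged.append(gi)
    with open(out_path, "w") as handle:
        handle.writelines(gi + "\n" for gi in merged)
    return len(merged)
```

For the example above you would call something like
merge_gi_lists(["sequence.gi.2.txt", "sequence.gi.4577.txt"], "sequence.gi.2+4577.txt")
(hypothetical file names following the naming pattern above) and then give
the merged file to blastdb_aliastool -gilist.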

Anyway, that's how you can run everything locally after you have it set up,
and reduce time by a significant amount.

At least, that's how I did it. If anyone has a better way, let me know.

D



On Tue, Apr 22, 2014 at 10:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi again,
>
> Using standalone BLAST+ at the command line with -remote
> you can specify an Entrez filter option -entrez_query on the
> organism.
>
> Another option which may be better is to make a target
> database (e.g all fully sequenced bacteria).
>
> Peter
>
>
> On Wed, Apr 23, 2014 at 12:45 AM, Michael Fethe <mfethe1 at gmail.com> wrote:
> > Hi Peter,
> >
> > I am blasting unknowns, however, can I limit biopython to bacteria in my
> > qblast command?
> >
> > Michael Fethe
> >
> >> On Apr 22, 2014, at 6:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >>
> >> Hi Michael,
> >>
> >> That seems unlikely - but if you are doing hundreds of
> >> automated BLAST queries, the NCBI might not be very
> >> happy.
> >>
> >> For big BLAST jobs, I would always use standalone
> >> BLAST running locally (on your cluster if possible).
> >> This is generally faster as well :)
> >>
> >> Regards,
> >>
> >> Peter
> >>
> >>> On Tue, Apr 22, 2014 at 10:59 PM, Michael Fethe <mfethe1 at gmail.com> wrote:
> >>> Hi,
> >>>
> >>> I am submitting sequences to blast via biopython. My script
> >>> runs over multiple hours and can take quite some time
> >>> (working with hundreds of sequences). Is it possible for
> >>> my computer or someone to mistake this script running
> >>> as a virus since it writes my blast results to an output file
> >>> and then submits my next sequence?
> >>>
> >>> Thanks,
> >>>
> >>> Michael Fethe
> >>> _______________________________________________
> >>> Biopython mailing list  -  Biopython at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/biopython



-- 
David Shin, Ph.D
Lawrence Berkeley National Labs
1 Cyclotron Road
MS 83-R0101
Berkeley, CA 94720
USA


