[Biopython] getting multiple BLAST (NCBIWWW) queries to work
Peter Cock
p.j.a.cock at googlemail.com
Tue Mar 29 14:07:47 EDT 2011
On Tue, Mar 29, 2011 at 6:55 PM, James Wagner <jamesrwagner at gmail.com> wrote:
> Hello:
>
> I was trying just as a proof of concept to do an NCBI WWW BLAST query
> with a FASTA file containing more than one sequence (but still a small
> number of sequences).
>
> I tried with the opuntia.fasta file from the website, and set it up as follows:
>
> result_handle = NCBIWWW.qblast("blastn", "nr", open("opuntia.fasta","r"))
> blast_records = NCBIXML.parse(result_handle)
>
> then I try:
>
> for record in blast_records:
>     print record.alignments
>
> and I obtain:
> []
>
>
> Surely at the very least since there were 7 sequences in this file, I
> should get 7 empty lists, assuming of course none of the sequences
> gives a hit in nr, which I am sure is not the case either?
Not necessarily. The NCBI may have fixed this since, but for a long time
if you had, say, 7 queries and only 2 gave hits, standalone BLAST's XML
output would contain results for only those 2 queries. There would be
nothing at all for the 5 hitless queries. This was (or still is) very
annoying, and right now I'm not sure whether they have fixed it.
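
One way to see which queries dropped out of the XML is to compare the IDs
in the FASTA file against the query titles in the parsed records. A rough
sketch, assuming each parsed record has a `query` attribute holding the
query's title line and that the first word of that title is the sequence
ID. `find_missing_queries` and `FakeRecord` are made-up names for
illustration, not part of Biopython:

```python
from collections import namedtuple

# Stand-in for a parsed BLAST record; only the .query attribute is used.
# (Hypothetical type for illustration; real records come from NCBIXML.parse.)
FakeRecord = namedtuple("FakeRecord", ["query"])


def find_missing_queries(query_ids, blast_records):
    """Return the query IDs that have no record in the parsed BLAST output.

    query_ids: sequence IDs taken from the FASTA file, in file order.
    blast_records: iterable of records, each with a .query title string.
    Assumes the first whitespace-separated word of the title is the ID.
    """
    reported = set()
    for record in blast_records:
        words = record.query.split()
        if words:
            reported.add(words[0])
    # Preserve the FASTA order so the gaps are easy to spot.
    return [qid for qid in query_ids if qid not in reported]
```

With real data, `query_ids` could come from something like
`[rec.id for rec in SeqIO.parse("opuntia.fasta", "fasta")]` and
`blast_records` from `NCBIXML.parse(result_handle)`.
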
Try getting back the results as plain text and manually inspect them.
In the plain text output all the queries appear, and there is a clear
"no hits found" message.
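
A sketch of that check, under a few assumptions: that `qblast` is given
the FASTA text as a string (as in the Biopython tutorial) with
`format_type="Text"`, and that the text report marks each query with a
line starting `Query=` and each empty result with a line containing
"No hits found", as standalone BLAST's text output does. The two function
names are made up for this example, and the layout of the web service's
text output may differ in detail:

```python
def fetch_text_report(fasta_path, program="blastn", database="nr"):
    """Submit all sequences in fasta_path as one query; return plain text.

    Needs Biopython and network access, so nothing here calls it.
    """
    from Bio.Blast import NCBIWWW  # imported lazily on purpose

    with open(fasta_path) as handle:
        fasta_text = handle.read()  # send the FASTA text as a string
    result_handle = NCBIWWW.qblast(
        program, database, fasta_text, format_type="Text"
    )
    return result_handle.read()


def summarize_text_report(report):
    """Map each query title to True (has hits) or False (no hits found).

    Only the first line of a wrapped "Query=" title is kept (assumption).
    """
    summary = {}
    current = None
    for line in report.splitlines():
        if line.startswith("Query="):
            current = line[len("Query="):].strip()
            summary[current] = True
        elif current is not None and "No hits found" in line:
            summary[current] = False
    return summary
```

Every query in the file should then show up as a key in the summary, with
`False` for the ones that gave no hits.
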
> What is still missing? I realize I could use SeqIO.parse to obtain
> each sequence from the FASTA file and do a separate qblast, but surely
> doing this separately for each protein would create unnecessary
> overhead with the network traffic compared to somehow sending off all
> the protein queries at once?
Yes, in theory a single large query should have less overhead
than individual queries. Personally I'd just use standalone BLAST
and run it locally if I had more than a few queries.
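
For the local route, a minimal sketch that assumes the BLAST+ `blastn`
program is on the PATH and that a nucleotide database has already been
set up locally. The database name, file names and both helper functions
are placeholders for illustration:

```python
import subprocess


def build_blastn_command(query_fasta, database, out_xml):
    """Build a BLAST+ blastn command that writes XML (-outfmt 5)."""
    return [
        "blastn",
        "-query", query_fasta,  # multi-sequence FASTA file
        "-db", database,        # name of a local BLAST database (assumed)
        "-outfmt", "5",         # 5 = XML output
        "-out", out_xml,
    ]


def run_local_blast(query_fasta, database, out_xml):
    """Run blastn locally; raises CalledProcessError if blastn fails."""
    subprocess.run(build_blastn_command(query_fasta, database, out_xml), check=True)
    return out_xml
```

The resulting XML file can then be opened and passed to `NCBIXML.parse`
in the same way as the handle returned by `qblast`.
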
Peter