[Biopython] problem blasting on line

Peter biopython at maubp.freeserve.co.uk
Wed Nov 17 16:37:47 EST 2010


Hi Jessica,

On Wed, Nov 17, 2010 at 9:22 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hello,
>
> I am trying to use blast to extract contaminating sequences from a set of
> 454 sequence data.  My script uses NCBIWWW.qblast as follows:
>
> result_handle = NCBIWWW.qblast("blastx", "nr", record.format("fasta"),
> ncbi_gi=False, descriptions= "1", alignments="1", format_type="XML",
> hitlist_size="1", entrez_query='(Bacteria[ORGN])')
>
>
> It works...for a while but it stops, eventually, with the following error:
>
>
> traceback (most recent call last):
> ...
> urllib2.HTTPError: HTTP Error 404: Not Found
>
>
> I suppose that the problem is a communication problem with ncbi.

Probably some kind of network problem, yes.

> I have
> written a try except statement into my script, but I seem to be losing quite
> a few records as they get skipped over if the error occurs.
>
> I thought about downloading nr and using the standalone blast, but it seems
> the downloadable nr database comes in several parts, already formatted for
> blast.  Can I concatenate these?
>
> Any thoughts on the problem with the qblast or other ways to circumvent this
> problem would be greatly appreciated!
>
> Jessica

How many sequences are you trying to BLAST? If it is more than
a few dozen I would definitely recommend installing and running
BLAST locally.
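
If the set is small enough to stay with qblast, one way to avoid
losing records is to retry a failed request a few times before giving
up on it. Below is a minimal sketch of that idea, not a tested recipe:
the helper name call_with_retries, the attempt count and the delays
are all my own assumptions, and it is written for Python 3, where the
urllib HTTP errors are subclasses of OSError.

```python
import time


def call_with_retries(func, attempts=3, delay=5.0, sleep=time.sleep):
    """Call func(); on OSError wait and try again, up to `attempts` times.

    Hypothetical helper for illustration only. `sleep` is a parameter
    so the waiting can be replaced when testing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            # In Python 3, urllib's HTTPError and URLError are OSError subclasses.
            if attempt == attempts:
                raise  # out of attempts: let the caller log this record
            sleep(delay * attempt)  # wait a little longer after each failure


# Possible usage with the call from the original script (needs network access):
# from Bio.Blast import NCBIWWW
# result_handle = call_with_retries(
#     lambda: NCBIWWW.qblast("blastx", "nr", record.format("fasta"),
#                            format_type="XML", hitlist_size=1,
#                            entrez_query="(Bacteria[ORGN])"))
```

If the last attempt also fails, the exception is re-raised, so the
calling loop can write that record's ID to a file and resubmit it
later instead of silently skipping it.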

Regarding the NR database, yes, it comes in parts, but this is by
design, so there is no need to concatenate them. A main alias file
(nr.pal) tells the BLAST command line tools about all the subparts.
Just download all the nr.*.tar.gz files into your BLAST database
folder and uncompress them.
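
Once the database is unpacked, a local search could look roughly like
the sketch below. It assumes the BLAST+ command line tools are
installed with blastx on your PATH, and that the nr files are in a
folder BLAST can find (for example one named in the BLASTDB
environment variable). The function names and file names are made up
for illustration. Note that it does not restrict hits to bacteria the
way your entrez_query did.

```python
import subprocess


def build_blastx_command(query_fasta, out_xml, db="nr"):
    """Return the argument list for a local BLAST+ blastx run.

    Hypothetical helper; the file names are supplied by the caller.
    """
    return [
        "blastx",
        "-query", query_fasta,     # FASTA file of nucleotide reads
        "-db", db,                 # database name, resolved via the alias file
        "-outfmt", "5",            # 5 = XML output
        "-max_target_seqs", "1",   # keep only the top hit per query
        "-out", out_xml,
    ]


def run_local_blastx(query_fasta, out_xml, db="nr"):
    """Run blastx; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_blastx_command(query_fasta, out_xml, db), check=True)


# Possible usage (assumes reads.fasta exists and blastx is installed):
# run_local_blastx("reads.fasta", "hits.xml")
```

The XML output can then be read with Bio.Blast.NCBIXML.parse, much as
you would read the handle returned by qblast.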

Peter


