[Biopython] help with ncbiWWW

Pejvak Moghimi pejvak.moghimi at york.ac.uk
Sat Jul 29 00:07:59 UTC 2017


Thank you all for the great responses. I have another quick question.

Since hsp_seq sometimes gives only a fragment of the hit sequence, is there a
better way of getting the full-length hit sequence than using the seqID to
fetch it through "Efetch"?
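
For reference, a minimal sketch of the Efetch route, fetching the hit
sequences in batches rather than one request per hit. The function names
(batches, fetch_full_length), the batch size of 100 and the "protein"
database are assumptions for illustration, not something established in
this thread; Entrez.efetch and SeqIO.parse are the standard Biopython calls.

```python
def batches(ids, size=100):
    """Yield successive lists of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_full_length(accessions, email, db="protein", batch_size=100):
    """Return {record.id: SeqRecord} for the full-length hit sequences.

    Hypothetical helper: `accessions` is any iterable of NCBI accessions.
    """
    # Imported here so the batching helper above works without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    unique = list(dict.fromkeys(accessions))  # drop repeats, keep order
    records = {}
    for batch in batches(unique, batch_size):
        # One request per batch: efetch accepts comma-separated IDs.
        handle = Entrez.efetch(db=db, id=",".join(batch),
                               rettype="fasta", retmode="text")
        try:
            for record in SeqIO.parse(handle, "fasta"):
                records[record.id] = record
        finally:
            handle.close()
    return records
```

When parsing the BLAST XML with NCBIXML, the accessions could be collected
from alignment.accession for each alignment. Note that the returned
record.id values may carry a version suffix that the input accessions lack.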

Cheers,
Pej.

On 26 July 2017 at 21:50, Damian Menning <dmenning at mail.usf.edu> wrote:

> I ran into a similar problem downloading multiple FASTA files from NCBI
> where it would get 'hung up' on large sequences.  I found a function on
> StackOverflow that worked well. It's super simple, effective, and should
> work with your search with minor tweaking.  It's currently set to time out
> after 10 seconds.
>
>   Damian
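
The function referred to above did not survive in the archive. The sketch
below is not that function, only one possible way to stop waiting on a slow
call after a fixed time. It uses a daemon thread so that it also works on
Windows; the names call_with_timeout and CallTimedOut are hypothetical.

```python
import threading


class CallTimedOut(Exception):
    """Raised when the wrapped call does not finish in time."""


def call_with_timeout(func, args=(), kwargs=None, timeout=10.0):
    """Run func(*args, **kwargs), giving up after `timeout` seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **(kwargs or {}))
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # A daemon thread will not keep the interpreter alive at exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise CallTimedOut("call did not finish within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The abandoned call is not killed; it keeps running in the background until
it returns or the program exits, so this only stops the caller from waiting.
A BLAST search would likely need a much longer timeout than 10 seconds.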
>
> On Wed, Jul 26, 2017 at 6:24 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>
>> That does help, thank you.
>>
>> First of all that tells me you are using Windows and your Python is
>> from Anaconda (probably not important here).
>>
>> Now, I had been guessing the code was getting stuck while actually
>> connecting to the NCBI and waiting for an update - which is where that
>> socket timeout would come into play.
>>
>> I see now the problem is when Biopython checks for an update,
>> waits for a bit, checks for an update, waits for a bit, ... and never
>> gives up:
>>
>> https://github.com/biopython/biopython/blob/biopython-170/Bio/Blast/NCBIWWW.py#L164
>>
>> The code increases the wait interval to 120s (two minutes), but
>> currently has no (optional) maximum total waiting time. Adding
>> this as an option seems sensible (e.g. a maximum total waiting
>> time of say 5 or 10 mins).
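
A minimal sketch of the idea described above: a polling loop whose wait
interval grows up to a cap of 120 seconds, but which gives up once a maximum
total waiting time has passed. This is not Biopython's code; the function
name, the initial delay, the growth factor and the parameter names are all
assumptions for illustration.

```python
import time


def poll_until_ready(check, max_total_wait=600.0, initial_delay=2.0,
                     max_delay=120.0, sleep=time.sleep,
                     clock=time.monotonic):
    """Call check() until it returns something other than None.

    Raises TimeoutError once max_total_wait seconds have elapsed.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + max_total_wait
    delay = initial_delay
    while True:
        result = check()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                "no result after %.0f seconds" % max_total_wait)
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Back off gradually, capped at max_delay.
        delay = min(delay * 1.5, max_delay)
```

Here check would be whatever asks the server whether the results are ready
and returns None while they are not.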
>>
>> Also, it would be good to check if the NCBI is returning some
>> clue or error message which our code does not understand...
>>
>> From your initial description it sounds like you have not found
>> any single example which fails - so this is going to be hard to
>> test.
>>
>> Peter
>>
>> On Wed, Jul 26, 2017 at 3:04 PM, Pejvak Moghimi
>> <pejvak.moghimi at york.ac.uk> wrote:
>> > Hi Peter,
>> >
>> > Here it is:
>> >
>> > Traceback (most recent call last):
>> >
>> >   File "<ipython-input-107-561cd74d2097>", line 1, in <module>
>> >     runfile('D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs/blastScript(altered).py', wdir='D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs')
>> >
>> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
>> >     execfile(filename, namespace)
>> >
>> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
>> >     exec(compile(f.read(), filename, 'exec'), namespace)
>> >
>> >   File "D:/Dropbox/Pejvak Moghimi/DMT_project/blast_for_clav_seqs/blastScript(altered).py", line 116, in <module>
>> >     result_handle = NCBIWWW.qblast("blastp", "nr", sequence, hitlist_size=500, entrez_query = orgn_specified)
>> >
>> >   File "C:\Users\pezhv\Anaconda3\lib\site-packages\Bio\Blast\NCBIWWW.py", line 164, in qblast
>> >     time.sleep(wait)
>> >
>> >
>> > Cheers,
>> > Pej.
>> >
>> >
>> > On 26 July 2017 at 14:57, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> >>
>> >> Hi Pej.
>> >>
>> >> Hmm. Maybe setting the timeout is not going to solve your
>> >> problem. I was hoping that would be a neat solution.
>> >>
>> >> Can you show us the stack trace when you had to stop a job
>> >> please?
>> >>
>> >> I assume you are using control+c to do this, in which case
>> >> Python ought to stop with the exception KeyboardInterrupt.
>> >> What I am interested in here is where in the code Python
>> >> is getting stuck. That would be a good clue.
>> >>
>> >> Peter
>> >>
>> >> On Wed, Jul 26, 2017 at 2:47 PM, Pejvak Moghimi
>> >> <pejvak.moghimi at york.ac.uk> wrote:
>> >> > Hi Peter,
>> >> >
>> >> > That solution, so far, does not seem to have worked with either the 10
>> >> > or the 30 second option.
>> >> >
>> >> > Cheers,
>> >> > Pej.
>> >> >
>> >> > On 26 July 2017 at 13:29, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> >> >>
>> >> >> I am hoping that putting this near the start of your script will
>> >> >> apply the default timeout to all your BLAST calls (or other
>> >> >> network calls, e.g. NCBI Entrez):
>> >> >>
>> >> >> import socket
>> >> >> socket.setdefaulttimeout(30)  # timeout in seconds
>> >> >>
>> >> >> Peter
>> >
>> >
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>
>
>
> --
> Damian Menning, Ph.D.
>
> "There are two types of academics. Those who use the Oxford comma, those
> who don't and those who should."
>
> Standard comma - You know Bob, Sue and Greg? They came to my house.
> Oxford comma - You know Bob, Sue, and Greg? They came to my house.
> Walken Comma - You know, Bob, Sue, and Greg? They came, to my house.
> Shatner comma - You, know, Bob, Sue, and Greg? They, came, to my house.
>