[Biopython] 400 error, Did Entrez block me or is it something else?

Norbert Auer norbert.auer at boku.ac.at
Fri Dec 6 10:09:18 UTC 2013


Hi David,

The Entrez service seems to be very buggy. Sometimes I even got
different results for the same query, so I stopped experimenting
with the Entrez API.
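
If you do want to stay with Entrez, the retry loop Miguel describes
further down the thread could look roughly like the sketch below. It is
only an illustration: call_with_retries and its parameters are names made
up here, not part of Biopython, and it assumes the failure surfaces as an
IOError (urllib2.HTTPError, as in David's traceback, is a subclass of it).

```python
import time


def call_with_retries(fetch, attempts=5, delay=1.0, retry_on=(IOError,)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable (hypothetical helper; for Entrez
    it would wrap the Entrez.efetch call in a lambda).
    """
    for attempt in range(1, attempts + 1):
        try:
            # If this line raises one of the retry_on errors,
            # control jumps to the except block below.
            return fetch()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            print("attempt %d failed (%s), retrying" % (attempt, err))
            time.sleep(delay * attempt)  # wait a little longer each time
```

With Biopython this would be called as
handle = call_with_retries(lambda: Entrez.efetch(db="protein",
rettype="fasta", retmode="text", id=gi_numbers)), using the same arguments
as in David's traceback. Limiting the number of attempts avoids hammering
NCBI forever if the request itself is wrong.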

But you can use a faster and more robust approach: simply download the
whole data set at once from NCBI's FTP site. For example, the address for
the complete gene information for mouse is:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Mus_musculus.ags.gz

Then you can use the NCBI Toolkit utility gene2xml to convert this binary
ags file into XML. The format is the same as what you get from the
Entrez.efetch command. It is also described in the Biopython tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec128

You can use this simple shell script to download and extract the right
genome and convert it into xml.

curl -C - \
  ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/{All_Mammalia.ags.gz} \
  -o "Sources/ASN/#1"
gene2xml -b T -c T -i Sources/ASN/All_Mammalia.ags.gz -t 10029 \
  -o Output/10029.xml

These lines extract the complete gene information for Cricetulus
griseus (taxonomy ID 10029) in one go. I am also working on a parser to
extract RefSeq and GI IDs (genomic-mRNA-peptide) from this XML into CSV
and HTML files. If by any chance you work with mouse or CHO sequences,
you can take a look here:

http://ala.boku.ac.at:4080/nauer/tools/genomes
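
Once Output/10029.xml exists, it can be read record by record instead of
loading the whole file at once. The sketch below uses only the Python
standard library rather than the Biopython parser from the tutorial.
iter_genes is a hypothetical helper, the SAMPLE data is made up, and the
element names are the ones I would expect in gene2xml output, so please
check them against your own file.

```python
import io
import xml.etree.ElementTree as ET

# Made-up two-record sample; the IDs and the locus name are not real data.
SAMPLE = b"""<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>1</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
    <Entrezgene_gene><Gene-ref>
      <Gene-ref_locus>GeneA</Gene-ref_locus>
    </Gene-ref></Entrezgene_gene>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>2</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""


def iter_genes(xml_source):
    """Yield (gene_id, locus) for each Entrezgene record, one at a time.

    xml_source is a file name or a binary file object.
    locus is None when the record has no Gene-ref_locus element.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Entrezgene":
            continue  # only act once a whole record has been read
        gene_id = elem.findtext(
            "Entrezgene_track-info/Gene-track/Gene-track_geneid")
        locus = elem.findtext("Entrezgene_gene/Gene-ref/Gene-ref_locus")
        yield gene_id, locus
        elem.clear()  # discard the finished record's children


for gene_id, locus in iter_genes(io.BytesIO(SAMPLE)):
    print(gene_id, locus)
```

For the real file the call would be iter_genes("Output/10029.xml").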

Hope this helps.

Best regards,

Norbert


On 2013-12-06 08:20, David Shin wrote:
> Thanks guys -
> 
> For now it eventually worked its way out, but if you could show me a
> snippet of how try and except statements work, I would be thankful (I know
> I can probably google it too), I'm also new to python in general, so coding
> examples always help.
> 
> Thanks again,
> 
> Dave
> 
> 
> On Thu, Dec 5, 2013 at 7:13 AM, <phyrexian.kavu at gmail.com> wrote:
> 
>> Hi David!
>>
>> I had the same error while trying to download hundreds of sequences with
>> Entrez.efetch. I'm afraid this error cannot be resolved since, as Peter
>> says, it is an internet error. I avoided it by making a loop and adding
>> try - except statements, so whenever this error appeared, the script just
>> went back and tried again until the sequences were downloaded correctly.
>> I know it is a brute-force approach, but this was the best solution I
>> could think of. I hope it helps you too.
>>
>> Miguel
>>
>>
>> [Theropoda is my profession]
>>
>>> On 05/12/2013, at 06:31, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> Hi David,
>>>
>>> This is probably an intermittent failure, perhaps the NCBI
>>> is very busy or there could be a temporary network problem
>>> somewhere. The chances are it will work tomorrow...
>>>
>>> Peter
>>>
>>>
>>>> On Thu, Dec 5, 2013 at 12:05 PM, David Shin <davidsshin at lbl.gov> wrote:
>>>> Hi all -
>>>>
>>>> During off peak times, I was working on a script that takes a list of gi
>>>> numbers (a list of 8) for protein sequences to use as input to get fasta
>>>> sequences via Entrez.efetch. I passed along my email address during the
>>>> searches. I am still new to python, etc. so had been testing my program.
>>>>
>>>> At some point I started getting the error below, and I'm not sure if it is
>>>> my program, my web provider, or if ncbi got mad at me. It's been like this
>>>> for about an hour, so giving up for the night.
>>>>
>>>> Thanks
>>>>
>>>> error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "../gi-to-fasta5.py", line 11, in <module>
>>>>     handle = Entrez.efetch(db="protein", rettype="fasta", retmode="text", id=gi_numbers)
>>>>   File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 144, in efetch
>>>>     return _open(cgi, variables, post)
>>>>   File "/Users/daanac/onda/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 460, in _open
>>>>     raise exception
>>>> urllib2.HTTPError: HTTP Error 400: Bad Request
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Shin, Ph.D
>>>> Lawrence Berkeley National Labs
>>>> 1 Cyclotron Road
>>>> MS 83-R0101
>>>> Berkeley, CA 94720
>>>> USA
>>>> _______________________________________________
>>>> Biopython mailing list  -  Biopython at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
> 
> 
> 


