[Biopython] Skipping over blank/erroneous Entrez.esummary() results

Brad Chapman chapmanb at 50mail.com
Wed Oct 7 11:17:37 UTC 2009


Hi Austin;

> I'm using Biopython to generate a table of accession numbers and their
> corresponding TaxIDs.  The fastest way I can do this is 20 at a time
> (20 per 3 seconds rather than 1 per 3 seconds).
> 
> However, this results in a problem.
> 
> Whenever my script receives a blank result from NCBI, such as one with
> no value for TaxID, Biopython crashes with the error:
> 
>   File "taxcollector3.py", line 39, in getTaxID
>     record = Entrez.read(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/__init__.py", line 259, in read
>     record = handler.run(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 90, in run
>     self.parser.ParseFile(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
>     value = IntegerElement(value)
> ValueError: invalid literal for int() with base 10: ''

In addition to Michiel's workaround, I checked in a small change
which could at least circumvent the error you are reporting:

http://github.com/biopython/biopython/commit/4dca8a24f62a1c28556d4e58f34db66f4b099279

It affects only one file, so if you don't want to pull the latest
from GitHub, you can download just that file and replace it in your
Biopython library:

http://github.com/biopython/biopython/blob/master/Bio/Entrez/Parser.py
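If updating Biopython isn't an option, one way to keep a batched lookup running is to catch the `ValueError` for each batch and retry that batch one ID at a time, skipping the records that still fail. This is a hypothetical sketch, not Michiel's workaround or the change in the commit above: `collect_taxids` and `entrez_fetch_batch` are illustrative names, and it assumes the nucleotide ESummary records returned by `Entrez.read` expose `Id` and `TaxId` keys. Set `Entrez.email` before making real requests.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def entrez_fetch_batch(ids: List[str]) -> Dict[str, int]:
    """Hypothetical fetcher: {id: taxid} for one batch via ESummary.

    Requires Biopython. Assumes nucleotide DocSum records expose
    "Id" and "TaxId" keys; set Entrez.email before calling.
    """
    from Bio import Entrez  # imported lazily so collect_taxids runs without Biopython

    handle = Entrez.esummary(db="nucleotide", id=",".join(ids))
    try:
        records = Entrez.read(handle)  # may raise ValueError on a blank integer field
    finally:
        handle.close()
    return {str(rec["Id"]): int(rec["TaxId"]) for rec in records}


def collect_taxids(
    ids: Sequence[str],
    fetch_batch: Callable[[List[str]], Dict[str, int]],
    batch_size: int = 20,
    delay: float = 0.0,
) -> Tuple[Dict[str, int], List[str]]:
    """Look up IDs in batches; return (found, skipped)."""
    found: Dict[str, int] = {}
    skipped: List[str] = []
    for start in range(0, len(ids), batch_size):
        batch = list(ids[start:start + batch_size])
        try:
            found.update(fetch_batch(batch))
        except ValueError:
            # One malformed record spoils the whole batch, so retry each ID
            # on its own and remember the ones that still fail.
            for single in batch:
                if delay:
                    time.sleep(delay)  # space out the extra requests
                try:
                    found.update(fetch_batch([single]))
                except ValueError:
                    skipped.append(single)
        if delay:
            time.sleep(delay)  # pause between batch requests
    return found, skipped
```

Keeping the Entrez call behind the `fetch_batch` parameter lets the batching and skipping logic be exercised without network access; for real lookups you would pass `entrez_fetch_batch` and a nonzero `delay`. The returned `skipped` list is also exactly the set of GIs worth reporting back to the list.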

Ideally, we should have a test case to cover this. Could you let us
know the specific GIs that are causing the problem? The group of 20 is
fine if you haven't narrowed it down further. This will also help us
check whether these records have any other problems.
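To narrow a failing group of 20 down to the GIs that trigger the error, the batch can be split in half repeatedly, recursing only into halves that still fail. This is a hypothetical sketch: `find_bad_ids` and `entrez_batch_ok` are illustrative names, and it assumes each failure is caused by individual bad records rather than by a particular combination of IDs.

```python
from typing import Callable, List, Sequence


def entrez_batch_ok(ids: List[str]) -> bool:
    """Hypothetical check: True if ESummary output for `ids` parses cleanly.

    Requires Biopython; set Entrez.email before calling.
    """
    from Bio import Entrez  # imported lazily so find_bad_ids runs without Biopython

    handle = Entrez.esummary(db="nucleotide", id=",".join(ids))
    try:
        Entrez.read(handle)
        return True
    except ValueError:  # the blank-integer failure from the traceback
        return False
    finally:
        handle.close()


def find_bad_ids(ids: Sequence[str], batch_ok: Callable[[List[str]], bool]) -> List[str]:
    """Return the IDs that make `batch_ok` fail, by recursive halving."""
    ids = list(ids)
    if not ids or batch_ok(ids):
        return []  # nothing here triggers the error
    if len(ids) == 1:
        return ids  # a single failing ID is a culprit
    mid = len(ids) // 2
    return find_bad_ids(ids[:mid], batch_ok) + find_bad_ids(ids[mid:], batch_ok)
```

With one bad record among 20, this takes roughly ten requests instead of twenty; the saving grows with batch size but shrinks as the number of bad records rises.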

Thanks for reporting this,
Brad
