[Biopython] Skipping over blank/erroneous Entrez.esummary() results
Austin Davis-Richardson
harekrishna at gmail.com
Tue Oct 6 21:07:52 UTC 2009
Howdy,
I'm using Biopython to generate a table of accession numbers and their
corresponding TaxIDs. The fastest way I have found is to query 20 IDs at
a time (20 records per 3 seconds rather than 1 per 3 seconds).
However, this causes a problem: whenever my script receives a result from
NCBI with a blank field, such as no value for TaxID, Biopython crashes
with the error:
  File "taxcollector3.py", line 39, in getTaxID
    record = Entrez.read(handle)
  File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/__init__.py", line 259, in read
    record = handler.run(handle)
  File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
  File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
    value = IntegerElement(value)
ValueError: invalid literal for int() with base 10: ''
My code looks like this, where gids is a string of comma-separated GIDs
(I get the GIDs from the accession numbers using
Entrez.esearch(db="nucleotide", rettype="text", term=accessions)):

    handle = Entrez.esummary(db="nucleotide", id=gids)
    record = Entrez.read(handle)
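For reference, the batching step can be sketched roughly as below. The helper name batched_id_strings and the sample GIDs are made up for illustration and are not taken from my script; only the batch size of 20 and the comma-separated format match what I described above.

```python
def batched_id_strings(gids, batch_size=20):
    """Yield comma-separated ID strings holding at most batch_size IDs each."""
    for start in range(0, len(gids), batch_size):
        yield ",".join(str(g) for g in gids[start:start + batch_size])


# Hypothetical GIDs, for illustration only (not real NCBI identifiers).
example_gids = list(range(1000, 1045))
batches = list(batched_id_strings(example_gids))
```

Each string in batches is what would be passed as the id argument of Entrez.esummary.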
The only workaround I can come up with is querying one ID at a time, but
that is very slow (I have about 300,000 accession numbers).
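A variation on that one-at-a-time fallback would be to keep the batches of 20 and split a batch only when parsing it fails, so that most batches still cost a single request. This is an untested sketch, not a fix for the parser itself. The names summarize_with_fallback and fake_fetch are hypothetical, and fake_fetch is only a stand-in that raises the same ValueError for a made-up "blank" ID so the logic can be run without contacting NCBI.

```python
def summarize_with_fallback(gids, fetch):
    """Return (records, failed_ids).

    Tries the whole batch first. If fetch raises ValueError, the batch is
    split in half and each half is retried, down to single IDs.
    """
    if not gids:
        return [], []
    try:
        return list(fetch(gids)), []
    except ValueError:
        if len(gids) == 1:
            # A single ID that still fails is recorded and skipped.
            return [], list(gids)
        mid = len(gids) // 2
        left_records, left_failed = summarize_with_fallback(gids[:mid], fetch)
        right_records, right_failed = summarize_with_fallback(gids[mid:], fetch)
        return left_records + right_records, left_failed + right_failed


# Stand-in for the real NCBI call (assumption, for demonstration only).
FAKE_BLANK_IDS = {7}


def fake_fetch(gids):
    if any(g in FAKE_BLANK_IDS for g in gids):
        raise ValueError("invalid literal for int() with base 10: ''")
    return [{"id": g} for g in gids]


records, failed = summarize_with_fallback(list(range(1, 21)), fake_fetch)
```

In the real script, fetch would presumably wrap the two calls shown earlier, joining the IDs with commas before passing them to Entrez.esummary and handing the result to Entrez.read. A batch containing one bad ID then costs a handful of extra requests instead of 20 separate ones.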
Does anyone know of a patch or a workaround for this? Or maybe an
easier way to get a TaxID from an accession number?
Thanks,
Austin Davis-Richardson