[Biopython] Skipping over blank/erroneous Entrez.esummary() results
Michiel de Hoon
mjldehoon at yahoo.com
Tue Oct 6 22:11:36 EDT 2009
You could try the following (with biopython 1.52):
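from Bio import Entrez

handle = Entrez.esummary(db="nucleotide", id=gids)
records = Entrez.parse(handle)
while True:
    try:
        record = next(records)
    except StopIteration:
        # No more records in the result
        break
    except ValueError:
        # Raised when an "integer" field in the record is empty
        print("Skipping record")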
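
If a single bad record turns out to make the whole batch unreadable, another option is to retry only that batch one ID at a time, so you lose just the failing IDs and keep the speed of batching everywhere else. A rough sketch of the idea follows. Note that `chunks`, `summaries_with_fallback` and the `fetch` argument are names made up for this illustration; `fetch` stands in for whatever function sends a list of IDs to NCBI and returns the parsed records.

```python
def chunks(ids, size=20):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def summaries_with_fallback(ids, fetch, size=20):
    """Fetch summaries in batches; retry a failing batch one ID at a time.

    `fetch` is a hypothetical callable (an assumption of this sketch): it
    takes a list of IDs and returns a list of records, raising ValueError
    when the response cannot be parsed.
    """
    good, skipped = [], []
    for batch in chunks(ids, size):
        try:
            good.extend(fetch(batch))
        except ValueError:
            # The batch could not be parsed: retry each ID on its own
            # so that only the offending IDs are dropped.
            for single in batch:
                try:
                    good.extend(fetch([single]))
                except ValueError:
                    skipped.append(single)
    return good, skipped
```

With Biopython, `fetch` could presumably be a small wrapper that joins the batch with commas, calls Entrez.esummary(db="nucleotide", id=...) and passes the handle to Entrez.read(), as in the original script. The IDs returned in `skipped` can then be looked at by hand.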
We should probably modify Bio.Entrez so that empty "integer" values are treated correctly.
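
To illustrate what I mean by treating empty values correctly: the conversion could map an empty string to None instead of passing it to int(). This is only a sketch of the idea, not the actual Bio.Entrez code, and `to_int_or_none` is a made-up name.

```python
def to_int_or_none(text):
    """Convert XML text to an int, mapping empty text to None.

    Hypothetical helper for illustration only; not part of Bio.Entrez.
    """
    text = text.strip()
    if not text:
        # An empty field (e.g. a missing TaxID) has no integer value
        return None
    # Non-numeric text still raises ValueError, as before
    return int(text)
```

Code that reads the records would then need to check for None before using such a field.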
--Michiel.
--- On Tue, 10/6/09, Austin Davis-Richardson <harekrishna at gmail.com> wrote:
> From: Austin Davis-Richardson <harekrishna at gmail.com>
> Subject: [Biopython] Skipping over blank/erroneous Entrez.esummary() results
> To: biopython at lists.open-bio.org
> Date: Tuesday, October 6, 2009, 5:07 PM
> Howdy,
>
> I'm using BioPython to generate a table of accession numbers and their
> corresponding TaxIDs. The fastest way I can do this is 20 at a time
> (20 per 3 seconds rather than 1 per 3 seconds).
>
> However, this results in a problem.
>
> Whenever my script receives a result from NCBI with a blank field, such as
> a missing value for TaxID, BioPython crashes with the error:
>
>   File "taxcollector3.py", line 39, in getTaxID
>     record = Entrez.read(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/__init__.py", line 259, in read
>     record = handler.run(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 90, in run
>     self.parser.ParseFile(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
>     value = IntegerElement(value)
> ValueError: invalid literal for int() with base 10: ''
>
>
> My code looks like this, where gids is a string of comma-separated GIDs
> (I get the GIDs from the accession numbers using
> Entrez.esearch(db="nucleotide", rettype="text", term=accessions)):
>
> handle = Entrez.esummary(db="nucleotide", id=gids)
> record = Entrez.read(handle)
>
> The only solution I can come up with is searching one at a time, but
> this is very slow. (I have about 300,000 accession numbers.)
>
> Does anyone know of a patch or a solution for this? Or maybe an
> easier way to get a TaxID from an accession number?
>
> Thanks,
> Austin Davis-Richardson