[Biopython] Skipping over blank/erroneous Entrez.esummary() results

Tue Oct 6 22:11:36 EDT 2009

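You could try the following (with Biopython 1.52):

from Bio import Entrez

handle = Entrez.esummary(db="nucleotide", id=gids)
records = Entrez.parse(handle)
while True:
    try:
        record = records.next()
    except StopIteration:
        # No more records in this batch
        break
    except ValueError:
        # Raised for an empty integer field, e.g. a missing TaxID
        print "Skipping record"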
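
If `records` behaves like a standard Python generator, an exception raised inside it ends the iteration, so the records after the bad one in that batch may never be reached. One way to keep the speed of batches of 20 while limiting that loss is to retry only the failing batch one ID at a time. The following is a minimal sketch under stated assumptions, not part of Biopython: `chunked`, `summaries_with_fallback` and the `fetch_summaries` callable are hypothetical names, and `fetch_summaries` is assumed to raise ValueError when a response contains a blank integer field.

```python
def chunked(ids, size=20):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def summaries_with_fallback(ids, fetch_summaries, size=20):
    """Fetch summaries in batches; retry a failing batch one ID at a time.

    `fetch_summaries` is a hypothetical callable that takes a list of IDs
    and returns a list of records. It is assumed to raise ValueError when
    the response contains a blank integer field.

    Returns a tuple (records, skipped_ids).
    """
    records, skipped = [], []
    for batch in chunked(ids, size):
        try:
            records.extend(fetch_summaries(batch))
        except ValueError:
            # Only this batch pays the one-at-a-time cost.
            for single in batch:
                try:
                    records.extend(fetch_summaries([single]))
                except ValueError:
                    skipped.append(single)
    return records, skipped
```

With Biopython, `fetch_summaries` could wrap the calls already shown in this thread, for example Entrez.read(Entrez.esummary(db="nucleotide", id=",".join(batch))), so that only batches containing a bad record are fetched individually.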

We should probably modify Bio.Entrez so that empty "integer" values are treated correctly.
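
To illustrate what treating an empty value correctly could look like, here is a minimal sketch of a tolerant conversion. It is not the Bio.Entrez code: `to_int_or_none` is a hypothetical helper, and returning None for a blank value is only an assumption about the desired behavior.

```python
def to_int_or_none(value):
    """Convert a string to int, mapping blank strings to None."""
    text = value.strip()
    if not text:
        # Assumed placeholder for a missing value such as an empty TaxID
        return None
    return int(text)
```

Non-numeric text such as "abc" still raises ValueError, so only genuinely empty fields are tolerated.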

--Michiel.

--- On Tue, 10/6/09, Austin Davis-Richardson <harekrishna at gmail.com> wrote:

> From: Austin Davis-Richardson <harekrishna at gmail.com>
> Subject: [Biopython] Skipping over blank/erroneous Entrez.esummary() results
> To: biopython at lists.open-bio.org
> Date: Tuesday, October 6, 2009, 5:07 PM
> Howdy,
> 
> I'm using Biopython to generate a table of accession numbers and their
> corresponding TaxIDs.  The fastest way I can do this is 20 at a time
> (20 per 3 seconds rather than 1 per 3 seconds).
> 
> However, this results in a problem.
> 
> Whenever my script receives a result from NCBI that is blank, such as
> there being no value for TaxID, Biopython crashes with the error:
> 
>   File "taxcollector3.py", line 39, in getTaxID
>     record = Entrez.read(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/__init__.py", line 259, in read
>     record = handler.run(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 90, in run
>     self.parser.ParseFile(handle)
>   File "/Users/audy/Downloads/biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
>     value = IntegerElement(value)
> ValueError: invalid literal for int() with base 10: ''
> 
> 
> My code looks like this, where gids is a string of comma-separated GIDs
> (I get the GIDs from the accession numbers using
> Entrez.esearch(db="nucleotide", rettype="text", term=accessions)):
> 
>     handle = Entrez.esummary(db="nucleotide", id=gids)
>     record = Entrez.read(handle)
> 
> 
> The only solution I can come up with is searching one at a time, but
> this is very slow.  (I have about 300,000 accession numbers.)
> 
> Does anyone know of a patch or a solution for this?  Or maybe an
> easier way to get a TaxID from an accession number?
> 
> Thanks,
> Austin Davis-Richardson