[Biopython] Problem with Bio.Entrez...

Peter biopython at maubp.freeserve.co.uk
Fri Aug 27 08:32:57 UTC 2010


On Fri, Aug 27, 2010 at 5:39 AM, Nathan J. Edwards <nje5 at georgetown.edu> wrote:
> On 8/27/2010 12:13 AM, Nathan J. Edwards wrote:
>>
>> It would be nice (IWBN) if the parser threw an exception that indicated
>> that the returned XML didn't validate...at least then the (very
>> cryptic!) error message wouldn't look like a logic error in the parser.
>
> Actually, after wading through the Parser.py code some more, the bad element
> is clearly detected, and an attempt is made to ignore it (the empty string),
> which then subsequently leads to the TypeError exception in
> endElementHandler.

I'd suggest issuing a warning for the bad element, rather than silently
ignoring it.
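
Something along these lines, as a rough sketch only. This is not the actual
Parser.py code: it uses the standard library's expat bindings directly, and
the KNOWN_TAGS whitelist, the TolerantHandler class and the parse_tolerantly
function are all made-up names for illustration. The bad element is marked
with a None sentinel on the stack, a warning is issued, and the end handler
tests for the sentinel and returns early instead of tripping over it later.

```python
import warnings
import xml.parsers.expat

# Hypothetical whitelist for illustration; not how Bio.Entrez decides validity.
KNOWN_TAGS = {"DocSum", "Id", "Item"}


class TolerantHandler:
    """Collects (tag, text) pairs, warning about and skipping unknown tags."""

    def __init__(self):
        self.stack = []   # one entry per open element; None marks a bad one
        self.values = []  # (tag, text) pairs for recognised elements
        self.text = []    # character data seen since the last start tag

    def start_element(self, name, attrs):
        if name not in KNOWN_TAGS:
            # Warn rather than ignore silently, then push a sentinel.
            warnings.warn("Ignoring unexpected element <%s>" % name)
            self.stack.append(None)
        else:
            self.stack.append(name)
        self.text = []

    def character_data(self, data):
        self.text.append(data)

    def end_element(self, name):
        tag = self.stack.pop()
        text = "".join(self.text)
        self.text = []
        if tag is None:
            # Bad element: test and return, nothing is stored.
            return
        self.values.append((tag, text))


def parse_tolerantly(xml_text):
    """Parse xml_text and return the handler holding the collected values."""
    handler = TolerantHandler()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start_element
    parser.EndElementHandler = handler.end_element
    parser.CharacterDataHandler = handler.character_data
    parser.Parse(xml_text, True)
    return handler
```

With a warning, users who want strict behaviour can turn it into an error
via the warnings filter, while everyone else still gets the "ignore it"
behaviour.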

> Maybe the test and return of my first email is sufficient to deal with the
> empty strings inserted as part of the "ignore it" strategy.
>
> And, given the frequency with which NCBI seems to break these things,
> I _do_ prefer the "ignore it" strategy, if it works. :-)

Michiel - does Nathan's fix make sense? We should probably save an example
of this broken XML for a unit test...

Nathan - have you notified the NCBI about this? I assume you would get
an error putting the XML through a validator - if you haven't already done
so that would be worthwhile. Or would you rather one of us contact them?

Thanks,

Peter


