[Biopython-dev] Bio.Entrez catching more errors

Wed Mar 25 11:47:59 UTC 2009

> What about the fairly common situation (at least, it's something
> I've done fairly often) where Bio.Entrez.efetch() is used
> to fetch records which are saved directly to file without
> verification - e.g. to be parsed by another program?
> Unless the error is caught in Bio.Entrez.efetch()
> it may be out of our control.

That is easy: just run the output returned by NCBI through the appropriate parser. If the parser is happy, proceed to save the NCBI output in a file.
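
A minimal sketch of that check-then-save idea, using only the Python standard library so it runs without network access. The helper name save_if_parseable and the stand-in parser parse_xml are hypothetical; with Bio.Entrez you would pass in the text read from the efetch handle and a parser suited to the requested format. Note that some parsers may simply return no records on unexpected input rather than raise, so the parse callable should raise itself in that case.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def save_if_parseable(text, parse, path):
    """Write text to path only if parse() accepts it.

    parse is any callable taking a file-like handle and raising an
    exception on bad input. Nothing is written if it raises.
    """
    parse(io.StringIO(text))  # raises before we touch the file system
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)


def parse_xml(handle):
    # Stand-in parser for this sketch; raises ET.ParseError on non-XML.
    return ET.parse(handle)


if __name__ == "__main__":
    outdir = tempfile.mkdtemp()
    target = os.path.join(outdir, "record.xml")
    try:
        save_if_parseable("Error: server busy", parse_xml, target)
    except ET.ParseError:
        print("not saved: parser rejected the data")
    save_if_parseable("<Record><Id>1</Id></Record>", parse_xml, target)
    print("saved:", os.path.exists(target))
```

The data is held in memory and parsed before anything is written, so a rejected download never leaves a partial or bogus file behind.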

> The first half of the email (the main point) was based
> on a special case: HTML and XML are pretty easy to
> identify.  If you ask for HTML and don't get it, it is
> an error (and vice versa).  If you ask for XML and don't
> get it, it is an error (and vice versa).  The fact that
> the NCBI currently often return an HTML or XML error
> page when a plain text format was requested is then
> easily detected as an error (simply from the file type).
> This will still work even if the NCBI do change their
> error formats or wording - it should be pretty robust.

Have a look at serialset.xml in the Bio.Entrez test cases. This is the output obtained from NCBI using efetch on the journals database with retmode='xml'. The file looks like XML, but it doesn't start with "<?xml". However, Bio.Entrez.read parses it correctly, so while it's not pretty, to me this would not count as an error.
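
To illustrate the difference between sniffing the start of the data and actually parsing it, here is a small sketch using Python's standard xml.etree.ElementTree rather than Bio.Entrez. The sample string is made up for illustration and is not the content of serialset.xml; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample: well-formed XML with no leading XML declaration.
sample = "<SerialSet><Serial><Title>Example</Title></Serial></SerialSet>"


def starts_with_xml_declaration(text):
    """Strict sniff: True only if the data opens with an XML declaration."""
    return text.lstrip().startswith("<?xml")


def looks_like_markup(text):
    """Loose sniff: True if the data opens with '<' (HTML or XML)."""
    return text.lstrip().startswith("<")


def is_well_formed_xml(text):
    """Ask a real parser instead of guessing from the first bytes."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(starts_with_xml_declaration(sample))
    print(looks_like_markup(sample))
    print(is_well_formed_xml(sample))
```

The strict sniff would flag this sample as "not XML" even though a parser accepts it, which is the situation described above. The loose sniff is still useful for the case where plain text was requested and markup came back.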

--Michiel.