[Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error)

Wed Oct 28 10:57:42 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2938

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-10-28 06:57 EST -------
Good point - and hopefully the NCBI will make all their XML consistent.

In the meantime, instead of the white list, how about a blacklist?
i.e. if the data starts with "<html" (ignoring case), raise an error.
We could also spot other formats such as FASTA and GenBank files.
Since all we want to do is spot non-XML, this should be reliable.
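
For illustration, here is a minimal sketch of such a blacklist check.
The function names are hypothetical and are not part of Bio.Entrez; the
FASTA and GenBank tests rely only on the usual conventions that a FASTA
record starts with ">" and a GenBank flat file starts with "LOCUS".

```python
def looks_like_non_xml(data):
    """Return a label if data is obviously not XML, otherwise None.

    Hypothetical helper, not an existing Bio.Entrez function.
    """
    # Only the first few characters matter; skip leading whitespace.
    head = data.lstrip()[:100]
    # Blacklist HTML, ignoring case as suggested above.
    if head.lower().startswith("<html"):
        return "HTML"
    # FASTA records start with a ">" title line.
    if head.startswith(">"):
        return "FASTA"
    # GenBank flat files start with a LOCUS line.
    if head.startswith("LOCUS "):
        return "GenBank"
    return None


def check_xml_start(data):
    """Raise ValueError if data matches one of the blacklisted formats."""
    kind = looks_like_non_xml(data)
    if kind is not None:
        raise ValueError("Expected XML but the data looks like %s" % kind)
```

Anything not on the blacklist is passed through untouched, so XML with
an unexpected header would still reach the parser as it does now.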

We discussed some of these issues before: I originally suggested
doing an XML check in Bio.Entrez.efetch (etc), and you countered
that the format validation should be in the parser (as in
this specific bug report):

http://lists.open-bio.org/pipermail/biopython-dev/2009-March/005461.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2009-March/005477.html
