[Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error)
bugzilla-daemon at portal.open-bio.org
Wed Oct 28 10:57:42 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2938
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-10-28 06:57 EST -------
Good point - and hopefully the NCBI will make all their XML consistent.
In the meantime, instead of the white list, how about a blacklist?
i.e. if the data starts with "<html" (ignoring case), raise an error?
We could also spot things like FASTA and GenBank files etc. As all
we want to do is spot non-XML, this should be reliable.
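
A rough sketch of what such a blacklist check might look like. The
function names here (sniff_non_xml, reject_non_xml) are hypothetical
and not part of Bio.Entrez. The "<!doctype html" prefix, the ">" test
for FASTA and the "LOCUS " test for GenBank are assumptions about what
those formats typically start with, and the data is assumed to be
already read into a string:

```python
def sniff_non_xml(data):
    """Return a label for obviously non-XML data, or None if nothing matched.

    Hypothetical helper: only looks at the start of the data, so it is
    a blacklist of known non-XML formats, not an XML validator.
    """
    head = data.lstrip()[:100]
    lowered = head.lower()
    # HTML error pages: compare case-insensitively, as suggested above.
    # The doctype prefix is an extra assumption, not from the bug report.
    if lowered.startswith("<html") or lowered.startswith("<!doctype html"):
        return "HTML"
    # Assumption: FASTA records start with a ">" title line.
    if head.startswith(">"):
        return "FASTA"
    # Assumption: GenBank flat files start with a LOCUS line.
    if head.startswith("LOCUS "):
        return "GenBank"
    return None


def reject_non_xml(data):
    """Raise ValueError if the data matches the blacklist, else return it."""
    kind = sniff_non_xml(data)
    if kind is not None:
        raise ValueError("Expected XML but the data looks like %s" % kind)
    return data
```

Anything not on the blacklist is passed through unchanged, so real
XML (whatever its DTD) is still left to the parser to handle.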
We discussed some of these issues before: I originally suggested
doing an XML check in Bio.Entrez.efetch (etc), and you countered
this saying the format validation should be in the parser (as in
this specific bug report):
http://lists.open-bio.org/pipermail/biopython-dev/2009-March/005461.html
...
http://lists.open-bio.org/pipermail/biopython-dev/2009-March/005477.html