[Biopython-dev] [biopython] Missing DTD files (#260)

Peter Cock p.j.a.cock at googlemail.com
Tue Dec 3 10:38:43 UTC 2013


On Sun, Dec 1, 2013 at 3:28 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> How would people feel about Biopython always downloading DTD files
> on the fly instead of distributing them with Biopython?
>
> After downloading and parsing a DTD file, we can keep it in memory
> so we won't need to parse the same DTD file over and over again.
> So the impact on speed will be minimal.
>
> If we do so, we'll never run into the problem of missing DTD files. The
> downside of course is that we will need internet access to parse any
> XML file through Bio.Entrez. But maybe in today's world that is acceptable.

Requiring network access would be annoying for offline work
(e.g. how we usually run the automated tests), but most of the
NCBI Entrez XML files will (I expect) be downloaded and
immediately parsed. So for usability this seems OK.

Automatic caching to disk (without a scary warning) seems like a
better idea than always downloading the DTD files on demand
(which seems wasteful of bandwidth and more likely to give
intermittent errors), although as you have noted before there
is the open question of where to put these files (including where
on Windows):

http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008310.html

Regards,

Peter


