[Biopython-dev] Fwd: [Fwd: missing NCBI DTDs]

Michiel de Hoon mjldehoon at yahoo.com
Wed Mar 26 14:55:46 UTC 2014


Hi Peter,

On Wed, 3/26/14, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Long term not bundling the DTD files seems a good idea.
> Being cautious we could bundle them for the next release,
> see how the download mechanism works in the wild, and
> drop the DTD files for the release after that?
I don't think we need to be so cautious. 
 
> This would mean all the Entrez parser tests would require
> internet access (even if using an old XML file on disk),
But only the first time. After a DTD is downloaded, it is stored
locally, and internet access won't be needed the next time the XML
(or other XML files relying on the same DTD) is parsed.
In my experience, using local DTDs is much faster than
accessing them through the internet for each XML file, so I
would not advocate an internet-only solution.
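To illustrate the download-once behaviour described above, here is a minimal Python sketch. The function names, the cache_dir argument and the file naming scheme are all hypothetical; this is not how Bio.Entrez actually stores its DTDs. The download function is injectable so the caching logic can be exercised without internet access.

```python
import os
import urllib.request
from urllib.parse import urlparse


def _download(url):
    """Fetch the raw bytes at url (this is the only step needing internet)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_dtd(url, cache_dir, download=_download):
    """Return the DTD at url, downloading it only if it is not on disk yet.

    Hypothetical helper: cache_dir and the naming scheme (basename of the
    URL path) are assumptions for this sketch, not Bio.Entrez's layout.
    """
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(cache_dir, filename)
    if os.path.exists(path):
        # Cache hit: read the local copy, no internet access needed.
        with open(path, "rb") as handle:
            return handle.read()
    # Cache miss: download once and store it for later parses and sessions.
    data = download(url)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return data
```

Only a cache miss touches the network; every later request for the same DTD, in this or any later session, is served from disk.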

As an alternative to local storage, we could consider downloading
the DTDs afresh in each Biopython session, but keeping the results
of parsing each DTD in memory (so we won't have to download the
same DTD over and over again if we're parsing many XML files).
This can be almost as fast as using local storage, but it requires
internet access, and Bio.Entrez would also have to be changed.
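The in-memory alternative could look roughly like the following. Again this is a hypothetical sketch: SessionDTDCache and its toy _parse_dtd (which only collects the names declared with <!ELEMENT>) stand in for whatever Bio.Entrez would really keep from a parsed DTD.

```python
import re
import urllib.request


def _download(url):
    """Fetch the DTD text at url (requires internet access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def _parse_dtd(text):
    """Toy stand-in for real DTD parsing: collect declared element names."""
    return frozenset(re.findall(r"<!ELEMENT\s+([^\s>]+)", text))


class SessionDTDCache:
    """Hypothetical cache keeping parsed DTDs in memory for one session."""

    def __init__(self, download=_download, parse=_parse_dtd):
        self._download = download
        self._parse = parse
        self._parsed = {}  # maps DTD URL -> parsed result

    def get(self, url):
        if url not in self._parsed:
            # First use in this session: download and parse once.
            self._parsed[url] = self._parse(self._download(url))
        # Later XML files relying on the same DTD reuse the parsed result.
        return self._parsed[url]
```

The trade-off is visible in the code: the cache disappears when the session ends, so every new session has to download each DTD again on first use.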

Best,
-Michiel.