[Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files

bugzilla-daemon at portal.open-bio.org
Fri Mar 20 12:18:53 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2678

------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2009-03-20 08:18 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> > it from there. If not, it tries to download it. This may fail if the servers
> > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> > Biopython is installed), you won't run into this problem.
> 
> I was just looking at this on my Windows XP Python 2.3 machine, and when it
> tried to download missing DTD files it was just using a filename as the URL.

In hindsight, I wonder if trying to download missing DTD files is really a good
idea. Suppose a user runs a large number of Entrez queries and saves the
results as XML files, then tries to parse each of those files. If a DTD file
is missing, Bio.Entrez will try to download the same DTD file again for each
XML file it parses. This is not only wasteful, but also bypasses Entrez's rule
of no more than three accesses per second. It is also fragile. The XML files
typically contain a full URL to the needed DTD, but many of Entrez's DTD files
refer to other DTD files, and those references can be relative. Given only
such a relative path, Bio.Entrez has a hard time working out the absolute URL
of the DTD. Currently we look for it in http://www.ncbi.nlm.nih.gov/dtd/, but
that directory does not seem to contain all the required DTDs.
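
To illustrate the relative-reference problem, here is a minimal sketch in
modern Python. The function name and the example.org URLs are made up for
illustration; this is not what Bio.Entrez currently does. A relative reference
can only be resolved correctly if we still know the URL of the DTD that
contained it:

```python
from urllib.parse import urljoin


def resolve_dtd_reference(referring_url, system_id):
    """Resolve a DTD reference against the URL of the referring DTD.

    Hypothetical helper, not part of Bio.Entrez. An absolute system_id is
    returned unchanged; a relative one is resolved against referring_url.
    """
    return urljoin(referring_url, system_id)


# A relative reference resolves against the directory of the referring DTD.
print(resolve_dtd_reference("http://example.org/dtd/main.dtd",
                            "modules/common.mod"))

# An absolute reference ignores the referring URL entirely.
print(resolve_dtd_reference("http://example.org/dtd/main.dtd",
                            "http://other.example.org/shared.dtd"))
```

If the parser only hands us the relative path, without the URL of the
referring DTD, there is nothing to join against, and guessing a fixed base
directory is all that is left.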

It may therefore make sense not to download the DTD file at all, but to raise
an exception with a helpful error message that says which DTD file is missing,
where it can probably be found, and where it should be installed. This
requires a little more effort from the user, but it is more robust, more
efficient, and won't break Entrez's rules.
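
A rough sketch of what I have in mind, in modern Python. The names
MissingDTDError and open_local_dtd are hypothetical and do not exist in
Bio.Entrez, and the directory argument stands in for wherever the DTDs end up
being installed:

```python
import os
from urllib.parse import urlparse


class MissingDTDError(Exception):
    """Hypothetical exception; not an existing Bio.Entrez class."""


def open_local_dtd(system_id, dtd_dir):
    """Open a DTD from a local directory, never from the network.

    system_id may be a full URL or a relative path; only its final
    path component is used to look up the file in dtd_dir.
    """
    filename = os.path.basename(urlparse(system_id).path)
    path = os.path.join(dtd_dir, filename)
    if not os.path.isfile(path):
        # Tell the user which file is missing, how it was referenced,
        # and where a downloaded copy should be saved.
        raise MissingDTDError(
            "DTD file %r is not installed. It was referenced as %r. "
            "Please download it once and save it as %s"
            % (filename, system_id, path)
        )
    return open(path, "rb")
```

With something like this, parsing a thousand saved XML files results in at
most one error message per missing DTD and no network traffic at all.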

