[Biopython-dev] Bio.Entrez & Bio.EUtil

Michiel de Hoon mjldehoon at yahoo.com
Tue Jun 3 00:19:59 UTC 2008


OK, I'll double-check. I may not have noticed some missing DTDs if they were downloaded automatically from the internet. I think Biopython should ship the most common DTDs, or at least the ones needed for test_Entrez, which probably covers most of the use cases of Bio.Entrez.

--Michiel.

Peter <biopython at maubp.freeserve.co.uk> wrote:
On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #It's a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)
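
As a rough illustration of how both base paths could be covered, here is a minimal sketch. It is not the Parser.py code: it assumes Python 3's urllib.parse (the hack above uses Python 2's urllib.urlopen), the helper name candidate_dtd_urls is hypothetical, and it tests for "://" rather than "/" to decide whether the system identifier is already a full URL. It only builds the list of URLs to try and does not fetch anything.

```python
from urllib.parse import urljoin

# The two NCBI base paths mentioned in this thread.
NCBI_DTD_BASES = [
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/",
    "http://www.ncbi.nlm.nih.gov/dtd/",
]


def candidate_dtd_urls(system_id, bases=NCBI_DTD_BASES):
    """Return the URLs worth trying for a DTD system identifier.

    Hypothetical helper, for illustration only.
    """
    if "://" in system_id:
        # Already an absolute URL; use it unchanged.
        return [system_id]
    # Relative identifier: resolve it against each known base path in turn.
    return [urljoin(base, system_id) for base in bases]
```

A caller could then try each URL in order and stop at the first one that downloads successfully, so a bare name such as eSpell.dtd would be looked up under both directories.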

(Also note there seem to be some tab/space issues in this file).

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally, http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files in place, the test_Entrez.py unit test no
longer gives any missing DTD warnings, but it still has a couple of
failures.
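
To check a local DTDs folder against the two lists above before running the test, something along these lines might do. This is only a sketch: the helper name missing_dtds is hypothetical, and the folder to check is whatever path the DTDs live in for a given installation, which is not specified here.

```python
import os

# File names taken from the two lists above.
REQUIRED_DTDS = [
    "egquery.dtd", "eSearch_020511.dtd", "nlmcommon_080101.dtd",
    "pubmed_080101.dtd", "eInfo_020511.dtd", "eSpell.dtd",
    "nlmmedline_080101.dtd", "taxon.dtd", "eLink_020511.dtd",
    "eSummary_041029.dtd", "nlmmedlinecitation_080101.dtd", "uilist.dtd",
    "ePost_020511.dtd", "nlmsharedcatcit_080101.dtd",
    "NCBI_GBSeq.dtd", "NCBI_GBSeq.mod.dtd", "NCBI_Entity.mod.dtd",
    "NCBI_Mim.dtd", "NCBI_Mim.mod.dtd",
]


def missing_dtds(dtd_dir, required=REQUIRED_DTDS):
    """List the required DTD file names that are absent from dtd_dir.

    Hypothetical helper, for illustration only.
    """
    # Treat a folder that does not exist as an empty one.
    present = set(os.listdir(dtd_dir)) if os.path.isdir(dtd_dir) else set()
    # Keep the order of the required list so the output is easy to compare.
    return [name for name in required if name not in present]
```

An empty result would mean that every file named in this message is present in the folder.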

Peter


       


