[Biopython-dev] Bio.Entrez & Bio.EUtil

Peter biopython at maubp.freeserve.co.uk
Fri May 30 14:17:08 UTC 2008


On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder, or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is that the NCBI uses relative URLs.  The following
quick hack to Parser.py seems to help, but only in some cases (because,
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #It's a relative path, and I'm not sure how best to get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file.)
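The `"/" in systemId` test in the patch treats any identifier containing a slash as a full URL. A minimal sketch of an alternative, assuming Python 3 and the standard `urllib.parse.urljoin`: resolve a relative identifier against the URL of the DTD that referenced it, which should cover both NCBI base paths without hard-coding either. The names `resolve_dtd_url`, `open_dtd` and `DEFAULT_DTD_BASE` are hypothetical (not part of Bio.Entrez); `base` is assumed to be the referencing DTD's URL, for example whatever the caller recorded with expat's `SetBase()`, and the fallback base is simply the entrez/query/DTD/ path from the patch.

```python
import os
import warnings
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical fallback base for bare names such as "nlmmedline_080101.dtd",
# copied from the example URL in the patch; used only when no base is known.
DEFAULT_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=None):
    """Turn a DTD system identifier into an absolute URL (sketch).

    Identifiers that already carry a scheme (http://...) are returned as-is;
    relative ones are joined onto the referencing document's URL if given,
    otherwise onto DEFAULT_DTD_BASE.
    """
    if urlparse(system_id).scheme:
        return system_id
    return urljoin(base or DEFAULT_DTD_BASE, system_id)


def open_dtd(system_id, dtd_dir, base=None, fetch=urlopen):
    """Open a DTD from dtd_dir if a local copy exists, else fetch it (sketch).

    `fetch` defaults to urllib.request.urlopen and is a parameter only so the
    network call can be swapped out, e.g. in tests.
    """
    filename = os.path.basename(urlparse(system_id).path)
    path = os.path.join(dtd_dir, filename)
    if os.path.exists(path):
        return open(path, "rb")
    url = resolve_dtd_url(system_id, base)
    warnings.warn("DTD file %s not found in Biopython installation; "
                  "trying to retrieve it from %s" % (filename, url))
    return fetch(url)
```

For instance, `resolve_dtd_url("NCBI_GBSeq.mod.dtd", base="http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd")` resolves against the /dtd/ path rather than the entrez one.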

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally, http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd
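The wget step above could also be scripted, which might help if Biopython is to start from an empty DTDs folder. A minimal sketch, assuming Python 3's `urllib.request.urlretrieve`; `DTD_SOURCES` and `fetch_missing_dtds` are hypothetical names, and the base URLs and file names are exactly those listed in this message (they may not still be current). Files already present in the target folder are left alone.

```python
import os
from urllib.request import urlretrieve

# Base path -> DTD files, as listed in this message (hypothetical table).
DTD_SOURCES = {
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/": [
        "egquery.dtd", "eSearch_020511.dtd", "nlmcommon_080101.dtd",
        "pubmed_080101.dtd", "eInfo_020511.dtd", "eSpell.dtd",
        "nlmmedline_080101.dtd", "taxon.dtd", "eLink_020511.dtd",
        "eSummary_041029.dtd", "nlmmedlinecitation_080101.dtd",
        "uilist.dtd", "ePost_020511.dtd", "nlmsharedcatcit_080101.dtd",
    ],
    "http://www.ncbi.nlm.nih.gov/dtd/": [
        "NCBI_GBSeq.dtd", "NCBI_GBSeq.mod.dtd", "NCBI_Entity.mod.dtd",
        "NCBI_Mim.dtd", "NCBI_Mim.mod.dtd",
    ],
}


def fetch_missing_dtds(dtd_dir, sources=DTD_SOURCES, retrieve=urlretrieve):
    """Download each listed DTD into dtd_dir unless a copy is already there.

    Returns the file names for which `retrieve(url, path)` was called.
    `retrieve` is a parameter only so the download can be replaced in tests.
    """
    os.makedirs(dtd_dir, exist_ok=True)
    fetched = []
    for base, filenames in sources.items():
        for filename in filenames:
            path = os.path.join(dtd_dir, filename)
            if os.path.exists(path):
                continue  # keep the existing local copy
            retrieve(base + filename, path)
            fetched.append(filename)
    return fetched
```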

With all the above files in place, the unit test test_Entrez.py no
longer gives any missing DTD warnings, but it still has a couple of
failures.

Peter


