[Biopython-dev] Bio.Entrez & Bio.EUtil

Michiel de Hoon mjldehoon at yahoo.com
Tue Jun 3 04:33:27 UTC 2008


I checked but did not find any missing DTDs. Most of the DTDs in the list you sent are in Biopython's CVS under Bio/Entrez/DTDs, and they are included correctly when I do a fresh checkout from CVS. Could you try again with a fresh checkout?

--Michiel.

Michiel de Hoon <mjldehoon at yahoo.com> wrote: OK, I'll double-check. I may not have noticed some missing DTDs if they were downloaded automatically from the internet. I think Biopython should ship the most common DTDs, or at least the ones needed for test_Entrez, which probably covers most of the use cases of Bio.Entrez.

--Michiel.

Peter wrote:
On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with,
but I can see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship
Biopython with most of the current files?  I've had trouble with
Biopython trying to fetch missing DTD files from the internet.  I
think the problem is the NCBI using relative URLs.  The following
quick hack seems to help in Parser.py but only in some cases (because
as listed below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #It's a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file.)
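
As a minimal sketch of the relative-path handling in the patch above (not the actual Parser.py code), the following Python 3 function builds the list of URLs to try for a given system identifier. The names candidate_dtd_urls and NCBI_DTD_BASES are hypothetical; the two base paths are the ones mentioned in this message. It uses urllib.parse rather than the Python 2 urllib calls in the patch, and it tests for a URL scheme instead of looking for a "/" in the identifier.

```python
from urllib.parse import urljoin, urlparse

# The two NCBI base paths named in this thread. Which one hosts a given
# DTD is not known in advance, so both are offered as candidates.
NCBI_DTD_BASES = (
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/",
    "http://www.ncbi.nlm.nih.gov/dtd/",
)


def candidate_dtd_urls(system_id, bases=NCBI_DTD_BASES):
    """Return the URLs worth trying for a DTD system identifier.

    Hypothetical helper: an absolute URL is returned unchanged, while a
    relative identifier is joined onto each base path in turn.
    """
    if urlparse(system_id).scheme:
        # Already a full URL, e.g. http://.../nlmmedline_080101.dtd
        return [system_id]
    # Relative identifier: resolve it against every known base path.
    return [urljoin(base, system_id) for base in bases]
```

A caller could then try each URL in turn until one opens, which would cover both NCBI base paths.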

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd
eSearch_020511.dtd
nlmcommon_080101.dtd
pubmed_080101.dtd
eInfo_020511.dtd
eSpell.dtd
nlmmedline_080101.dtd
taxon.dtd
eLink_020511.dtd
eSummary_041029.dtd
nlmmedlinecitation_080101.dtd
uilist.dtd
ePost_020511.dtd
nlmsharedcatcit_080101.dtd

Additionally, http://www.ncbi.nlm.nih.gov/dtd/ provided some further
DTD files needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files in place, the unit test file test_Entrez.py
no longer gives any missing DTD warnings, but it still has a couple of
failures.
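
A quick way to see which of the files listed above are still absent from a local DTDs folder might look like the following. This is only a sketch: the helper name missing_dtds and the REQUIRED_DTDS list are hypothetical, the list is simply copied from this message, and the folder path in the usage comment is the one mentioned for Biopython's CVS tree, so it may need adjusting for a given checkout.

```python
import os

# File names copied from the two lists in this message.
REQUIRED_DTDS = [
    "egquery.dtd", "eSearch_020511.dtd", "nlmcommon_080101.dtd",
    "pubmed_080101.dtd", "eInfo_020511.dtd", "eSpell.dtd",
    "nlmmedline_080101.dtd", "taxon.dtd", "eLink_020511.dtd",
    "eSummary_041029.dtd", "nlmmedlinecitation_080101.dtd",
    "uilist.dtd", "ePost_020511.dtd", "nlmsharedcatcit_080101.dtd",
    "NCBI_GBSeq.dtd", "NCBI_GBSeq.mod.dtd", "NCBI_Entity.mod.dtd",
    "NCBI_Mim.dtd", "NCBI_Mim.mod.dtd",
]


def missing_dtds(dtd_dir, required=REQUIRED_DTDS):
    """Return the names in `required` that are not files in `dtd_dir`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(dtd_dir, name))]


# Example usage (path assumed from the CVS layout mentioned in this thread):
#     print(missing_dtds("Bio/Entrez/DTDs"))
```

An empty list would mean every file named here is present locally, so no download from the NCBI should be attempted for them.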

Peter


       
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev


       


