[Biopython-dev] Two issues on Bio.Entrez (DTD download fallback, recent change in id lists)

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 1 13:34:39 UTC 2012


2012/3/1 Daniël van Adrichem <daniel at treparel.com>:
> Hello list,
>
> Firstly I want to report a bug plus suggested fix.
>
> Today I noticed a bug which got triggered by missing local DTDs. I was
> still using 1.58 which does not have the new DTDs.
>
> Missing the DTDs locally should be handled by downloading them. This
> worked for the first DTD, but then on the second one (which is a
> dependency of the first one) I got an HTTP 404.
>
> After investigating I found that the module was making a request for
> "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD\nlmmedlinecitationset_120101.dtd"
> Note the backslash right after DTD. It gets turned into a %5C and
> causes the 404.

That DTD should be in Biopython 1.59 - and hopefully the other
DTD you mentioned but did not name. Please let us know if there
are any more we've missed.

https://github.com/biopython/biopython/commit/5f08ccdfe0706f9073bce441609aa86b1ea9d0f4

> The cause of this is the use of os.path.join to concatenate the URL. I
> am running this on Windows; on a platform where the file system uses a
> forward slash this would work just fine.
>
> Please find attached a patch to fix this issue.

That makes perfect sense, although as written your patch could
result in too many slashes being used - thus:
https://github.com/biopython/biopython/commit/c93b32bab5526a830e2cb14f0db782ee1b687715
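
As a rough illustration of the failure and of one way to avoid both the
backslash and doubled slashes, here is a minimal sketch. It is not the
code from the commit above: join_url is a hypothetical helper, ntpath is
used only to reproduce the Windows behaviour of os.path.join on any
platform, and the imports assume Python 3.

```python
import ntpath  # Windows flavour of os.path, importable on any platform
from urllib.parse import quote

BASE = "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD"
NAME = "nlmmedlinecitationset_120101.dtd"

# On Windows os.path is ntpath, so the separator it inserts is a backslash.
broken = ntpath.join(BASE, NAME)
# Percent-encoding turns that backslash into %5C, the form reported above.
broken_quoted = quote(broken, safe=":/")


def join_url(base, name):
    """Hypothetical helper: join with exactly one forward slash."""
    return base.rstrip("/") + "/" + name.lstrip("/")


fixed = join_url(BASE, NAME)
# Stray slashes on either side do not pile up.
same = join_url(BASE + "/", "/" + NAME)
```

Stripping the slashes on both sides before joining is what keeps the
result the same whether or not the base URL already ends in "/".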

Would you like to be thanked in the NEWS file and listed as a contributor
(in the CONTRIB file)?

> Secondly I want to comment on the recent change in Bio.Entrez.efetch
> (commit 01b091cd4679b58d7e478734324528dd9d52f3ed). While this change
> did fix the problem, I think this might be achieved in a cleaner way.
>
> Please see the code that is used to format the options in the URL (in
> Bio.Entrez._open):
>
> options = urllib.urlencode(params, doseq=True)
>
> Note the doseq argument specifically. Its documentation states:
> "If any values in the query arg are sequences and doseq is true, each
> sequence element is converted to a separate parameter."
>
> So this was the reason for the "id=1&id=2&id=3" formatting. Without
> doseq set this would turn into: "id=1,2,3"
>
> If this doseq functionality is not needed for other params (I am
> unsure of this), I suggest reverting the change in efetch() and using
> doseq=False (which is the default).

Very good question - Michiel?
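
For reference, a small sketch of what urlencode does with a list value
under each setting. It assumes Python 3, where the function lives in
urllib.parse, and the params dict is made up for illustration. As far as
I can tell, with doseq left off a list value is passed through str()
rather than comma-joined, so the comma-separated form needs the IDs to
be joined before encoding.

```python
from urllib.parse import urlencode

# Made-up parameters, for illustration only.
params = {"db": "pubmed", "id": ["1", "2", "3"]}

# doseq=True: each list element becomes its own id=... parameter.
repeated = urlencode(params, doseq=True)

# doseq left at its default (False): the list is converted with str()
# and then percent-encoded, brackets and quotes included.
plain = urlencode(params)

# Joining the IDs first gives a single comma-separated parameter;
# urlencode percent-encodes the commas as %2C.
joined = urlencode({"db": "pubmed", "id": ",".join(params["id"])})

print(repeated)
print(plain)
print(joined)
```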

Thanks,

Peter



