[Biopython-dev] Two issues on Bio.Entrez (DTD download fallback, recent change in id lists)
Daniël van Adrichem
daniel at treparel.com
Thu Mar 1 12:21:42 UTC 2012
Hello list,
First, I want to report a bug, along with a suggested fix.
Today I noticed a bug that is triggered by missing local DTDs. I was
still using Biopython 1.58, which does not include the new DTDs.
A DTD that is missing locally should be handled by downloading it. This
worked for the first DTD, but for the second one (which is a
dependency of the first one) I got an HTTP 404.
After investigating, I found that the module was making a request for
"http://www.ncbi.nlm.nih.gov/corehtml/query/DTD\nlmmedlinecitationset_120101.dtd"
Note the backslash right after DTD. It gets percent-encoded as %5C,
which causes the 404.
The cause is the use of os.path.join to concatenate the URL. I am
running this on Windows; on a platform whose file system uses a
forward slash as the path separator, this would work just fine.
Please find attached a patch to fix this issue.
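The attached patch itself was scrubbed from the archive, so what follows is not it. It is a minimal Python 3 sketch, assuming only the base URL and file name quoted above, that reproduces the problem and shows one way around it. ntpath is the Windows implementation behind os.path, which lets the bug be reproduced on any OS; dtd_url is a hypothetical helper, not part of Biopython.

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath   # POSIX flavour of os.path
from urllib.parse import quote

BASE = "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD"
FILENAME = "nlmmedlinecitationset_120101.dtd"

# What os.path.join does on Windows: it inserts a backslash separator.
windows_url = ntpath.join(BASE, FILENAME)

# When the URL is requested, the backslash is percent-encoded as %5C.
sent = quote(windows_url, safe=":/")


def dtd_url(base, filename):
    """Hypothetical helper: join URL parts with '/' regardless of the OS."""
    return base.rstrip("/") + "/" + filename


fixed_url = dtd_url(BASE, FILENAME)

print(windows_url)
print(sent)
print(fixed_url)
```

posixpath.join happens to give the same result as dtd_url, which is why the bug only shows up on Windows. urllib.parse.urljoin is another option, but it only appends the file name when the base URL ends with a slash; otherwise it replaces the last path segment.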
Secondly, I want to comment on the recent change in Bio.Entrez.efetch
(commit 01b091cd4679b58d7e478734324528dd9d52f3ed). While this change
did fix the problem, I think the same result could be achieved in a
cleaner way.
Please see the code that is used to format the options into the URL
(in Bio.Entrez._open):
options = urllib.urlencode(params, doseq=True)
In particular, note the doseq argument. Its documentation states:
"If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter."
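So this is the reason for the "id=1&id=2&id=3" formatting. Without
doseq set, the list would not be split into separate parameters and a
single id parameter would be sent instead, e.g. "id=1,2,3" when the
ids are given as one comma-separated string.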
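A small Python 3 sketch of the difference. It uses urllib.parse.urlencode, the Python 3 location of the urllib.urlencode call quoted above, and assumes it behaves the same for these inputs; the db and id values are made up for illustration. Note that with doseq=False a list value is converted with str() rather than joined with commas, so a comma-separated string has to be built explicitly to get a single id=1,2,3 parameter (sent as id=1%2C2%2C3).

```python
from urllib.parse import parse_qs, urlencode

ids = ["1", "2", "3"]

# doseq=True: every element of the sequence becomes its own id parameter.
expanded = urlencode({"db": "pubmed", "id": ids}, doseq=True)

# doseq=False (the default): the whole list is passed through str()
# and sent as one percent-encoded value.
unexpanded = urlencode({"db": "pubmed", "id": ids})

# Joining the ids first gives a single id parameter; urlencode encodes
# each comma as %2C, which the server decodes back to "1,2,3".
joined = urlencode({"db": "pubmed", "id": ",".join(ids)})

for query in (expanded, unexpanded, joined):
    print(query, "->", parse_qs(query)["id"])
```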
If this doseq functionality is not needed for other parameters (I am
unsure about this), I suggest reverting the change in efetch() and
using doseq=False (the default).
Thanks!
--
Daniël van Adrichem
Treparel Information Solutions b.v.
Delftechpark 26
2628XH Delft
The Netherlands
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Parser.py.diff
Type: application/octet-stream
Size: 592 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20120301/a4290ea2/attachment-0002.obj>