[Biopython] [Entrez/eFetch] "reasonable" package
Fields, Christopher J
cjfields at illinois.edu
Wed Dec 2 23:06:48 UTC 2015
The way I have traditionally done this (in BioPerl) is to iterate through the results 500 at a time, setting retmax to 500 and advancing the offset with retstart, and appending the records to a single file when they aren’t raw XML (e.g. sequence records). For XML, I loop through the batches and use the objects created by the XML-to-object mapping to grab the information I care about. I expect the same is possible via Biopython.
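For illustration, the batching described above might look roughly like this in Biopython. This is only a sketch under some assumptions: the IDs are taken to be PubMed IDs, they are posted once to the Entrez history server with epost so that retstart/retmax can page through them, and the records are fetched as MEDLINE text so the batches can simply be appended to one file. The names batch_offsets and fetch_pubmed_batches are hypothetical helpers, not part of Biopython.

```python
def batch_offsets(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for retstart in range(0, total, batch_size):
        # The last batch may be smaller than batch_size.
        yield retstart, min(batch_size, total - retstart)


def fetch_pubmed_batches(pmids, out_path, email, batch_size=500):
    """Post the IDs once, then append each fetched batch to a single file.

    Assumption: `pmids` is a list of PubMed ID strings and plain-text
    MEDLINE output is acceptable (text batches can be concatenated).
    """
    # Imported here so batch_offsets can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    posted = Entrez.read(Entrez.epost("pubmed", id=",".join(pmids)))

    with open(out_path, "w") as out:
        for retstart, retmax in batch_offsets(len(pmids), batch_size):
            handle = Entrez.efetch(
                db="pubmed",
                rettype="medline",
                retmode="text",
                retstart=retstart,
                retmax=retmax,
                webenv=posted["WebEnv"],
                query_key=posted["QueryKey"],
            )
            out.write(handle.read())
            handle.close()
```

With 13,000 IDs and the default batch size, batch_offsets yields 26 requests, each starting 500 records after the previous one.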
chris
On Dec 2, 2015, at 4:04 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
Hi,
Currently Biopython does not attempt to do anything about
limiting retmax on your behalf. The suggested retmax limit of 500
is probably specific to that database and/or file format (or so I
would imagine - some records like uilists are tiny in comparison).
Are you using the results as XML? It is probably possible to
merge the XML files, but it might be more hassle than it's worth.
I would suggest a double loop ought to work fine - loop over
the collection of XML files, and then for each file loop over the
records returned from the parser.
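
The double loop suggested above could be sketched as follows. The names iter_records and parse_pubmed_xml are hypothetical, and the sketch assumes each batch was saved to its own PubMed XML file. As far as I know, Entrez.read returns a plain list of articles in older Biopython releases and a dictionary with a "PubmedArticle" key in newer ones, so the parser below allows for both.

```python
def iter_records(paths, parse_file):
    """Yield every record from every file, in file order."""
    for path in paths:                    # outer loop: one XML file per batch
        for record in parse_file(path):   # inner loop: records in that file
            yield record


def parse_pubmed_xml(path):
    """Parse one saved eFetch XML file into a list of article records."""
    # Imported here so iter_records can be used without Biopython installed.
    from Bio import Entrez

    # Recent Biopython releases expect a binary-mode handle for XML.
    with open(path, "rb") as handle:
        result = Entrez.read(handle)

    # Assumption: newer releases return a dict keyed by "PubmedArticle",
    # older releases return the list of articles directly.
    if isinstance(result, dict):
        return result["PubmedArticle"]
    return result
```

Calling list(iter_records(paths, parse_pubmed_xml)) on the 26 saved files would give one flat list to analyse, without merging any XML.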
Regards,
Peter
On Wed, Dec 2, 2015 at 9:39 PM, <c.buhtz at posteo.jp> wrote:
I asked Entrez support how I should treat the server's resources
with "respect". :)
The first answer gave no concrete numbers, but in the second one they
told me that asking for 500 (retmax for eSearch) is a "reasonable" value
because eBot (a tool they offer on their website) uses it, too.
Now I have nearly 13,000 PIDs whose article information I want to fetch
via eFetch. That is a lot. ;)
But I am not sure how to do that with Biopython. If I split that
into packages of 500, I would get 26 separate record objects back.
I don't like that. I would prefer one big record object I can analyse.
Do you see a way to merge these record objects? Or maybe there is
another way to do that?
Or does Bio.Entrez already handle that problem internally (like the
only-3-queries-per-second rule or the HTTP POST decision)?
Any suggestions?
--
GnuPGP-Key ID 0751A8EC
_______________________________________________
Biopython mailing list - Biopython at mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython