[Biopython] [Entrez] use the handle twice

Peter Cock p.j.a.cock at googlemail.com
Sat Dec 19 20:35:53 UTC 2015


On Sat, Dec 19, 2015 at 8:27 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sat, Dec 19, 2015 at 12:41 PM,  <c.buhtz at posteo.jp> wrote:
>> On 2015-12-19 11:06 Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> This is the way network handles work: the server returns
>>> the data to the client computer (your Python code) once
>>> only. As soon as you read it (or Biopython's parser does),
>>> the data is gone.
>>
>> Then make it possible to cache the information.
>> UndoHandle? I found the API reference, but no documentation
>> with examples of how to use it.
>
> It wasn't really intended for general usage. StringIO
> from the Python standard library might be a better plan.

I just reminded myself how Bio.File.UndoHandle works,
and it would not help with your intended usage.

On Python 2 something like this should work:

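from StringIO import StringIO
from Bio import Entrez

# Read the whole response once; the network handle cannot be re-read.
all_the_data = Entrez.efetch(...).read()
# Save the raw data to disk.
with open(output_filename, "w") as out_handle:
    out_handle.write(all_the_data)
# Parse the same data again from an in-memory handle.
for record in Entrez.parse(StringIO(all_the_data)):
    pass  # do things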

Note this won't work on Python 3 without changes, because
Python 3 distinguishes bytes from unicode strings.
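
For Python 3, a rough sketch of the same idea might look like the
following. The helper name save_and_rewind is made up for illustration,
and I am assuming the parser is happy with a binary in-memory handle
(io.BytesIO); check this against your Biopython version.

```python
import io


def save_and_rewind(handle, output_filename):
    """Read a one-shot handle, save it to disk, return an in-memory copy.

    save_and_rewind is a hypothetical helper, not part of Biopython.
    """
    data = handle.read()
    if isinstance(data, str):
        # Assumption: a text handle can be re-encoded as UTF-8.
        data = data.encode("utf-8")
    with open(output_filename, "wb") as out_handle:
        out_handle.write(data)
    # A fresh handle positioned at the start of the same bytes.
    return io.BytesIO(data)


# Hypothetical usage (not run here, needs network access):
# from Bio import Entrez
# cached = save_and_rewind(Entrez.efetch(...), output_filename)
# for record in Entrez.parse(cached):
#     pass  # do things
```

As with the Python 2 version, everything is held in memory at once.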

However, because it really does load all the data from
Entrez into memory as a string, this is not a good idea
for large datasets.
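
For large datasets, one alternative (a different technique from the
StringIO approach, and only a sketch) would be to copy the handle to
disk in chunks and then parse from the saved file. The helper name
stream_to_file and the chunk size are my own choices, and this assumes
the handle returns bytes.

```python
import shutil


def stream_to_file(handle, output_filename, chunk_size=64 * 1024):
    """Copy a one-shot binary handle to disk in fixed-size chunks.

    stream_to_file is a hypothetical helper, not part of Biopython.
    Only one chunk is held in memory at a time.
    """
    with open(output_filename, "wb") as out_handle:
        shutil.copyfileobj(handle, out_handle, chunk_size)
    return output_filename


# Hypothetical usage (not run here, needs network access):
# from Bio import Entrez
# stream_to_file(Entrez.efetch(...), output_filename)
# with open(output_filename, "rb") as in_handle:
#     for record in Entrez.parse(in_handle):
#         pass  # do things
```

The file on disk can then be reopened as many times as needed.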

Peter.

