[Biopython] [Entrez] use the handle twice

Peter Cock p.j.a.cock at googlemail.com
Sat Dec 19 20:27:49 UTC 2015


On Sat, Dec 19, 2015 at 12:41 PM,  <c.buhtz at posteo.jp> wrote:
> On 2015-12-19 11:06 Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> This is the way network handles work, the server returns
>> the data to the client computer (your Python code) once
>> only. As soon as you read it (or Biopython's parser does),
>> the data is gone.
>
> Then make it possible to cache the information.
> UndoHandle? I found the API reference, but no documentation
> with examples of how to use it.

It wasn't really intended for general usage. StringIO
from the Python standard library might be a better plan.
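
A minimal sketch of the in-memory caching idea, using only the standard
library. The helper name cache_handle and the sample XML are made up for
illustration, and an io.BytesIO object stands in for the network handle.
With a real Entrez handle you would pass the cached copy to the parser,
rewind it with seek(0), and read it again.

```python
import io


def cache_handle(handle):
    """Read a one-shot handle fully and return a rewindable in-memory copy.

    Hypothetical helper for illustration; not part of Biopython.
    """
    data = handle.read()
    # Keep the same mode (bytes or text) as the original handle.
    if isinstance(data, bytes):
        return io.BytesIO(data)
    return io.StringIO(data)


# Stand-in for a network handle (made-up sample data, not real Entrez output).
network_handle = io.BytesIO(b"<eSearchResult><Count>3</Count></eSearchResult>")

cached = cache_handle(network_handle)

first = cached.read()   # first pass over the data
cached.seek(0)          # rewind the in-memory copy
second = cached.read()  # second pass sees the same data
```

This keeps the whole response in memory, so it only suits small results.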

>> For large data, the best plan is simply to save the data
>> to a file on disk, then reopen the file for reading and
>> parse it.
>
> I see no way to do this. e.g.
>
> import Bio.Entrez
> import zipfile
>
> z = zipfile.ZipFile('test.zip')
> h = z.read( z.namelist()[0] )
> r = Bio.Entrez.parse(h)
>
> Now r is a generator and I cannot ask for things like
>  r['Count']
>
> And Bio.Entrez.read(h) doesn't work here - I don't understand why.

From your other thread, it seems h is not a handle:
http://lists.open-bio.org/pipermail/biopython/2015-December/015853.html

Try it with a plain file first. If you need compression,
try gzip instead (simpler, as a gzipped file only contains
one file - a zip file can contain multiple files, so it is
more complicated to use).
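
A rough sketch of the difference, using only the standard library and
made-up sample data. gzip.open() gives you a handle directly. With a zip
file, ZipFile.read() returns the raw bytes, whereas ZipFile.open() returns
a file-like handle for one member. The file names here are placeholders;
where the sketch calls handle.read(), you would pass the handle to the
parser instead.

```python
import gzip
import os
import tempfile
import zipfile

# Made-up sample data standing in for a saved Entrez result.
sample = b"<eSearchResult><Count>3</Count></eSearchResult>"
tmpdir = tempfile.mkdtemp()

# gzip: one compressed file, opened directly as a handle.
gz_path = os.path.join(tmpdir, "result.xml.gz")
with gzip.open(gz_path, "wb") as out:
    out.write(sample)

with gzip.open(gz_path, "rb") as handle:
    gz_data = handle.read()  # a parser could take `handle` here instead

# zip: an archive of possibly many files, so a member must be chosen.
zip_path = os.path.join(tmpdir, "test.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr("result.xml", sample)

with zipfile.ZipFile(zip_path) as z:
    name = z.namelist()[0]
    raw = z.read(name)            # bytes, NOT a handle
    with z.open(name) as handle:  # file-like handle for that member
        zip_data = handle.read()
```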

> I just want to know how many records are really coming back
> from the server.

As suggested before, the simplest solution here if you also
want to save the data to disk is:

(1) save the data from Entrez to a file
(2) parse the file
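
The two steps above could be sketched as follows. The function names
save_handle and count_from_file, the file name and the sample XML are
placeholders for illustration, and io.BytesIO stands in for the handle
that Entrez would return. The second function assumes Biopython is
installed, that the saved file is an ESearch result containing a Count
field, and that the parser accepts a file opened in binary mode (older
releases may expect text mode).

```python
import io
import os
import shutil
import tempfile


def save_handle(handle, path):
    """Step 1: copy everything from a one-shot handle into a file on disk."""
    with open(path, "wb") as out:
        shutil.copyfileobj(handle, out)


def count_from_file(path):
    """Step 2: reopen the saved file and parse it (needs Biopython)."""
    from Bio import Entrez  # imported here so step 1 works without Biopython

    with open(path, "rb") as handle:
        record = Entrez.read(handle)
    return record["Count"]


# Stand-in for the network handle (made-up sample data).
sample = b"<eSearchResult><Count>3</Count></eSearchResult>"
fake_handle = io.BytesIO(sample)

path = os.path.join(tempfile.mkdtemp(), "esearch.xml")
save_handle(fake_handle, path)

# The file can now be reopened and parsed as often as needed,
# e.g. count_from_file(path) on a real saved ESearch result.
```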

Peter.

