[Biopython] save efetch results in different files

Peter biopython at maubp.freeserve.co.uk
Wed Apr 28 09:57:48 UTC 2010


On Wed, Apr 28, 2010 at 10:24 AM, Silvio Tschapke
<silvio.tschapke at googlemail.com> wrote:
> Hi all,
>
> I'd like to download hundreds of pubmed entries in one turn, but save every
> entry in a single file for further processing with e.g. NLTK.
> Is this possible? Or what is the common way to do this? Or do I have to call
> efetch for every single pmid? I don't know how.

Personally I would probably save each pubmed result to a separate file
named using the pmid - a Unix filesystem should cope fine with a few
thousand files in a single directory. This is simple and lets you add more
entries at a later date, and you have simple access to any record.
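
As a rough sketch of the one-file-per-pmid approach: the code below fetches
several records with a single efetch call and then splits the text into one
file per pmid. It assumes you ask for MEDLINE text (rettype="medline",
retmode="text") and that every record starts with a line beginning "PMID- ".
The function names (fetch_medline, split_medline, save_records) and the
".txt" file extension are made up for this example, not part of Biopython.

```python
import os


def fetch_medline(pmids, email):
    """Fetch several PubMed records with one efetch call, as MEDLINE text."""
    # Imported here so the splitting code below can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="medline", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def split_medline(text):
    """Return a list of (pmid, record_text) pairs.

    Assumption: every record starts with a line beginning "PMID- ".
    """
    records = []
    for line in text.splitlines():
        if line.startswith("PMID- "):
            records.append([line])       # start a new record
        elif records:
            records[-1].append(line)     # continue the current record
    return [(rec[0][6:].strip(), "\n".join(rec).strip() + "\n")
            for rec in records]


def save_records(text, directory):
    """Write each record to <directory>/<pmid>.txt; return the pmids saved."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for pmid, record in split_medline(text):
        path = os.path.join(directory, pmid + ".txt")
        with open(path, "w", encoding="utf-8") as out:
            out.write(record)
        saved.append(pmid)
    return saved
```

You would then call something like
save_records(fetch_medline(list_of_pmids, "you@example.com"), "pubmed_records")
with your own list of pmids, email address and directory name.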

The other approach, grouping several entries into each of a number of files,
sounds overly complicated (although possible). Another option would be a
single large file containing all the records. Either of these would require
an index if you needed random access to the entries by pmid.
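
To illustrate what such an index could look like for a single large file,
here is a minimal sketch that records the byte offset at which each record
starts. It again assumes MEDLINE text where each record begins with a
"PMID- " line. The function names build_offset_index and read_record are
invented for this example.

```python
def build_offset_index(path):
    """Map each pmid to the byte offset where its record starts."""
    index = {}
    # Binary mode keeps tell()/seek() offsets exact.
    with open(path, "rb") as f:
        offset = f.tell()
        line = f.readline()
        while line:
            if line.startswith(b"PMID- "):
                index[line[6:].strip().decode("ascii")] = offset
            offset = f.tell()
            line = f.readline()
    return index


def read_record(path, index, pmid):
    """Jump straight to one record using the offset index."""
    with open(path, "rb") as f:
        f.seek(index[pmid])
        lines = [f.readline()]           # the "PMID- " line itself
        for line in iter(f.readline, b""):
            if line.startswith(b"PMID- "):
                break                    # the next record has started
            lines.append(line)
    return b"".join(lines).decode("utf-8").strip() + "\n"
```

The index has to be rebuilt (or extended) whenever the big file changes,
which is part of why one file per pmid is simpler.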

> Could you also explain to me what handle.read() does? Entrez.read(handle) I
> understand, because it is documented, but handle.read() is not. What kind of
> type is a handle?

It is *like* a standard handle that you'd get in Python from open(filename):
an object supporting read(), which returns all the remaining data as a
string, readline(), which returns the next line, and so on.
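
A small illustration of that behaviour, using io.StringIO from the standard
library as a stand-in handle (an assumption made so the example runs without
a network connection; the handle returned by Entrez.efetch is a different
object, but is used in the same way):

```python
import io

# A file-like object holding three lines of text.
handle = io.StringIO("line one\nline two\nline three\n")

first = handle.readline()   # the next line, including its newline
rest = handle.read()        # everything remaining, as one string
nothing = handle.read()     # the handle is now exhausted: empty string

print(repr(first))
print(repr(rest))
print(repr(nothing))
```

Note that read() only gives you what has not already been consumed, so a
second call returns an empty string.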

Peter


