[Biopython-dev] Bio.Entrez.parse

Peter Cock p.j.a.cock at googlemail.com
Sat Sep 5 08:59:09 EDT 2009


On Sat, Sep 5, 2009 at 9:17 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
> Recently I was trying to parse a huge Entrez XML file containing Entrez gene
> records. Because of the size of the file, Entrez.read failed with a memory
> error since it could not keep the entire information in the XML file in memory.
> I decided to add a parse() function to Bio.Entrez that can iterate over such large
> files. This function is useful if the XML file essentially contains a list of records;
> the parse() function is a generator function that returns these records one by one.

That sounds excellent - I'd noticed that usually Bio.Entrez.read() would return
a list of (large nested) records, so this should be a natural extension.
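
To illustrate the general idea of yielding records one at a time rather than
building the whole list in memory, here is a rough sketch using only the
standard library. It is not the Bio.Entrez implementation: the function name
iter_records, the record_tag argument and the tiny sample document are all
made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(handle, record_tag):
    """Yield each top-level <record_tag> element from an XML handle, one by one.

    Hypothetical helper for illustration only; not part of Bio.Entrez.
    """
    context = ET.iterparse(handle, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
        else:
            depth -= 1
            # depth 0 on an "end" event means a direct child of the root closed
            if depth == 0 and elem.tag == record_tag:
                yield elem
                # Detach finished records from the root so they can be freed
                root.clear()


# Made-up sample data standing in for a large file of gene records
sample = io.BytesIO(
    b"<Entrezgene-Set>"
    b"<Entrezgene><id>1</id></Entrezgene>"
    b"<Entrezgene><id>2</id></Entrezgene>"
    b"</Entrezgene-Set>"
)

ids = [record.findtext("id") for record in iter_records(sample, "Entrezgene")]
```

Because the function is a generator, only the record currently being
processed needs to be held in memory, which is the property that matters
for very large files.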

Peter