[BioPython] Entrez.efetch large files

Peter biopython at maubp.freeserve.co.uk
Wed Oct 8 19:32:59 UTC 2008


> Yes - one big hint: DON'T try and parse these large files directly
> from the internet.  Use efetch to download the file and save it to
> disk.  Then open this local file for parsing.
> ...
> Do you think the Biopython tutorial should be more explicit about this
> topic?

I've changed the tutorial (the SeqIO and Entrez chapters) in CVS to
make this advice more explicit, and have included an example of doing
this:

import os
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "gi_186972394.gbk"
if not os.path.isfile(filename):
    # Download once and cache it on disk, rather than parsing over the network
    print("Downloading...")
    net_handle = Entrez.efetch(db="nucleotide", id="186972394",
                               rettype="gb", retmode="text")
    with open(filename, "w") as out_handle:
        out_handle.write(net_handle.read())
    net_handle.close()
    print("Saved")

print("Parsing...")
with open(filename) as in_handle:
    record = SeqIO.read(in_handle, "genbank")
print(record)
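For very large downloads, net_handle.read() holds the whole file in memory before it is written out. Below is a minimal sketch of an alternative, assuming Python 3. It copies the handle to disk in chunks with shutil.copyfileobj, and writes to a temporary ".part" file that is only renamed once the copy has finished. This means an interrupted download does not leave a truncated file behind that the os.path.isfile check would later mistake for a complete one. The helper name cached_download and its fetch argument are hypothetical illustrations, not part of Biopython.

```python
import os
import shutil


def cached_download(filename, fetch):
    """Hypothetical helper: save fetch()'s output to filename unless it already exists.

    fetch is any zero-argument callable returning a readable text handle,
    for example a lambda wrapping Entrez.efetch(...).
    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.isfile(filename):
        return False  # already on disk, no network access needed
    partial = filename + ".part"
    net_handle = fetch()
    try:
        with open(partial, "w") as out_handle:
            # Copy in chunks so a very large file is never held in memory in full
            shutil.copyfileobj(net_handle, out_handle)
    finally:
        net_handle.close()
    # Only give the file its final name once the copy has completed
    os.replace(partial, filename)
    return True
```

With the Entrez example above, this might be called as cached_download(filename, lambda: Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")), followed by parsing the local file exactly as before. Passing a callable rather than an open handle means no network connection is made at all when the file is already cached.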


Peter



More information about the Biopython mailing list