[Biopython] Entrez EFetch Options

Zach Gayk zgayk at nmu.edu
Thu Jul 9 01:12:46 UTC 2015


Hello,

I would like to use the following code from the Biopython tutorial to
retrieve records by GI number for a number of sequences that matched
scaffolds on a genome assembly:

import os
os.chdir('/Users/zachgayk/Desktop/GAVIABioinformatics/')
from Bio import Entrez # this is the most likely script modified
from Bio import SeqIO
Entrez.email = "zgayk at nmu.edu"
# GI numbers taken from the FASTA-style headers
# gi|50254217|gb|AY567890.1| and gi|559028|gb|L33375.1|GVSMTDGI
handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                       id="50254217,559028")
for seq_record in SeqIO.parse(handle, "gb"):
    # [:100] keeps the first 100 characters; "..." marks the truncation
    print(seq_record.description[:100] + "...")
handle.close()

The problem, however, is that I need to retrieve a large number of GI
numbers, far too many to enter manually into the id="" field. What I would
like to do is specify a file containing a list of all the needed GI numbers
and have the code process each of them. I haven't been able to figure out
how to do this yet, so any ideas would be very much appreciated.
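
One way this could be approached is sketched below. It is a minimal sketch
and not tested against NCBI. It assumes the file holds one entry per line,
either a bare GI number or a FASTA-style header such as
gi|50254217|gb|AY567890.1|. The function names, the file name
gi_numbers.txt, and the batch size of 200 are all illustrative choices;
only Entrez.efetch and SeqIO.parse come from Biopython.

```python
def read_gi_numbers(path):
    """Return the GI numbers listed in a text file, one entry per line.

    Accepts bare numbers ("50254217") or FASTA-style headers
    ("gi|50254217|gb|AY567890.1|"); blank lines and duplicates are skipped.
    """
    ids = []
    with open(path) as fh:
        for line in fh:
            line = line.strip().lstrip(">")
            if not line:
                continue
            fields = line.split("|")
            # In a "gi|<number>|..." header the GI number is the second field.
            gi = fields[1] if fields[0] == "gi" and len(fields) > 1 else line
            if gi not in ids:
                ids.append(gi)
    return ids


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_descriptions(gi_numbers, email, batch_size=200):
    """Yield the description of each GenBank record, fetched in batches.

    batch_size=200 is an assumed, arbitrary value, not a documented limit.
    """
    # Imported here so the helpers above work without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    for batch in chunks(gi_numbers, batch_size):
        # efetch accepts several IDs as one comma-separated string.
        handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                               id=",".join(batch))
        try:
            for seq_record in SeqIO.parse(handle, "gb"):
                yield seq_record.description
        finally:
            handle.close()
```

With these helpers, looping over
fetch_descriptions(read_gi_numbers("gi_numbers.txt"), Entrez.email) and
printing description[:100] + "..." for each result would give the same
output as the loop above, without typing any IDs into the script.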

Thank you,
Zach Gayk





