[Biopython] Entrez EFetch Options
Zach Gayk
zgayk at nmu.edu
Thu Jul 9 01:12:46 UTC 2015
Hello,
I would like to use the following code from the Biopython tutorial to
retrieve GI numbers for a number of sequences that matched to scaffolds on
a genome assembly:
import os
os.chdir('/Users/zachgayk/Desktop/GAVIABioinformatics/')
from Bio import Entrez # this is the most likely script modified
from Bio import SeqIO
Entrez.email = "zgayk at nmu.edu"
handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                       id="gi|50254217|gb|`, gi|50254217|gb|AY567890.1|, "
                          "gi|559028|gb|L33375.1|GVSMTDGI, "
                          "gi|559028|gb|L33375.1|GVSMTDGI")
for seq_record in SeqIO.parse(handle, "gb"):
    # [:100] keeps the first 100 characters of the description;
    # "..." marks where the text was cut off
    print(seq_record.description[:100] + "...")
handle.close()
The problem, however, is that I have a large number of GI numbers to
retrieve, far too many to enter manually into the id="" argument. What I
would like to do is specify a file containing a list of all the needed GI
numbers and then have the code parse all of them. I haven't been able to
figure out how to do this yet, and if anyone has any ideas they would be
very much appreciated.
Thank you,
Zach Gayk
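
One possible way to approach the question above, offered as a minimal
sketch and not as a tested answer: read the identifiers from a text file,
join them into comma-separated batches, and pass each batch to
Entrez.efetch. The file layout (one bare identifier per line), the helper
names read_ids, chunked and fetch_descriptions, the file name
gi_numbers.txt, and the batch size of 200 are all assumptions made for
illustration; none of them come from the original post or the tutorial.

```python
def read_ids(path):
    """Return the non-empty, whitespace-stripped lines of a text file.

    Assumes one identifier per line (hypothetical file layout).
    """
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def chunked(items, size):
    """Yield successive slices of `items` holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_descriptions(ids, email, batch_size=200):
    """Fetch GenBank records for `ids` and return their descriptions.

    batch_size=200 is an arbitrary choice for this sketch.
    """
    # Imported here so the two helpers above work without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    descriptions = []
    for batch in chunked(ids, batch_size):
        # efetch accepts several IDs as one comma-separated string.
        handle = Entrez.efetch(db="nucleotide", rettype="gb",
                               retmode="text", id=",".join(batch))
        try:
            for seq_record in SeqIO.parse(handle, "gb"):
                descriptions.append(seq_record.description)
        finally:
            handle.close()
    return descriptions


# Example call (needs network access and Biopython; not run here):
# ids = read_ids("gi_numbers.txt")   # hypothetical file name
# for text in fetch_descriptions(ids, "you@example.org"):
#     print(text[:100] + "...")
```

Fetching in batches keeps each request to a modest size, and closing the
handle in a finally clause releases the connection even if parsing fails
partway through a batch.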