[Biopython] Access Entrez gene DB using rettype 'gb'

Fri Dec 3 21:57:36 UTC 2010

David;
[Bacterial download example]
> http://www.ncbi.nlm.nih.gov/nuccore/NC_003198?report=fasta&from=2011173&to=2012693&strand=true

Thanks for the additional explanation. The problem is that the fasta
file download doesn't have the fliC identifier in the header line.
It's not that Biopython doesn't present it to you correctly; rather,
NCBI didn't include it in the fasta output. 

Thanks to Peter's tip about efetch supporting slicing, here is a
script that replicates what you're getting by hand. The tricky part is 
that the gene record is returned as XML instead of their native record
format, so it needs to be parsed for the items of interest.

https://gist.github.com/727625

import xml.etree.ElementTree as ET
from Bio import Entrez

def fetch_gene_coordinates(search_term):
    handle = Entrez.esearch(db="gene", term=search_term)
    rec = Entrez.read(handle)
    gene_id = rec["IdList"][0] # assuming best match works

    handle = Entrez.efetch(db="gene", id=gene_id, retmode="xml")
    gene_locus = ET.parse(handle).getroot().find("Entrezgene/Entrezgene_locus")
    region = gene_locus.find("Gene-commentary/Gene-commentary_seqs/Seq-loc/Seq-loc_int/Seq-interval")
    # add 1 to coordinates from XML to match what Entrez expected (0 to 1 based)
    start = int(region.find("Seq-interval_from").text) + 1
    end = int(region.find("Seq-interval_to").text) + 1
    gi_id = region.find("Seq-interval_id/Seq-id/Seq-id_gi").text
    strand = region.find("Seq-interval_strand/Na-strand").get("value")
    return gi_id, start, end, strand

def get_fasta_seq(gi_id, start, end, strand):
    strand = 2 if strand.lower() == "minus" else 1
    handle = Entrez.efetch(db="nucleotide", rettype="fasta", id=gi_id,
            seq_start=start, seq_stop=end, strand=strand)
    return handle.read()

Entrez.email = "yours at mail.com"
search_term = "fliC ct18"

gi_id, start, end, strand = fetch_gene_coordinates(search_term)
print get_fasta_seq(gi_id, start, end, strand)

Hope this works for you,
Brad