[Biopython] Access Entrez gene DB using rettype 'gb'

Brad Chapman chapmanb at 50mail.com
Fri Dec 3 13:44:15 UTC 2010


David;

> As an example of what I'm doing: say I want to find the sequence for fliC in
> Salmonella Typhi CT18. Since nucleotide is only giving me entire genomes,
> I've queried the gene database instead. The query is "fliC ct18" and it
> gives me one entry:
> 
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=fliC+ct18
> 
> Now I want the raw sequence for that gene. The sequence that shows up when I
> click "FASTA" on the above page:
> 
> http://www.ncbi.nlm.nih.gov/nuccore/NC_003198?report=fasta&from=2011173&to=2012693&strand=true

The best approach here is probably to download the FASTA file of gene
sequences for your bacterium of interest and extract the sequences
you need from it. For your example, this file has the genes pre-sliced:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Salmonella_enterica_serovar_Typhi_CT18_uid57793/NC_003198.ffn
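
As a rough sketch of the extraction step, something like the following
would pull matching records out of that file once it is downloaded. It
is plain Python so it runs without anything installed; Biopython's
SeqIO.parse(handle, "fasta") does the same job of reading records. The
function names are my own, and I'm assuming the record headers carry
something you can match on (coordinates or a gene description), so look
at a few header lines of the real file before picking the search text.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_records(handle, text):
    """Return every (header, sequence) whose header contains `text`."""
    return [(h, s) for h, s in read_fasta(handle) if text in h]


# Made-up two-record example standing in for the real .ffn file;
# these headers are illustrative only, not the actual NCBI format.
demo = io.StringIO(">geneA :10-15\nATGAAA\n>geneB :c30-20\nATGCCC\nTAA\n")
hits = find_records(demo, ":c30-20")
```

With the real file you would open NC_003198.ffn and pass that handle
instead of the demo one, searching for whatever the headers give you
(for example the start coordinate 2011173 from your link, if it shows
up there).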

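Using EUtils is hard here because there isn't an official identifier
for the sequence you are interested in: the gene exists only as a
coordinate range (2011173 to 2012693 in your link) on the genome
record NC_003198. If you do go through EUtils, you'll have to pull
down the genome and then subset it yourself based on the coordinates.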
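
Subsetting by coordinates could look roughly like this once you have
the genome sequence as a string. Again this is a plain Python sketch
with a function name of my own choosing. It assumes the coordinates are
1-based and inclusive, as NCBI displays them, and that you know from
the gene record whether the gene sits on the minus strand, in which
case you want the reverse complement.

```python
# Translation table for complementing unambiguous DNA bases;
# any other character (such as N) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(genome, start, end, minus_strand=False):
    """Return genome[start..end] using 1-based inclusive coordinates.

    If minus_strand is True, return the reverse complement of the region.
    """
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates fall outside the sequence")
    # Convert the 1-based inclusive range to a Python slice.
    region = genome[start - 1:end]
    if minus_strand:
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

For your example you would call extract_region(genome, 2011173, 2012693)
with the right strand setting, and the result should be 1521 bases
long. A quick sanity check is that it starts with a start codon and
ends with a stop codon; if not, try the other strand.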

Brad


