[Biopython] Access Entrez gene DB using rettype 'gb'

David Jacobs developer at allthingsprogress.com
Fri Dec 3 05:59:36 UTC 2010


Before I give an example, one note: I just realized where the "gene vs.
sequence" confusion is coming from. I'm a bacteriologist, so I don't have to
deal with introns or alternative splicing. In general, each gene I want to
look at has one definitive sequence. (Am I still missing something?)

As an example of what I'm doing: say I want the sequence of fliC in
Salmonella Typhi CT18. Since searching the nucleotide database only gives me
entire genomes, I've queried the gene database instead. The query "fliC ct18"
returns one entry:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=fliC+ct18
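The same gene-database lookup can be sketched with Biopython's Bio.Entrez
module: esearch finds the Gene ID, and esummary reports where the gene sits on
its genomic record. This is a minimal sketch, not a tested recipe. It assumes
that the gene esummary exposes a GenomicInfo entry with ChrAccVer, ChrStart
and ChrStop, that those coordinates are 0-based, and that ChrStart > ChrStop
marks a minus-strand gene. The function names and the email argument are
hypothetical.

```python
def gene_region_from_genomic_info(chr_acc_ver, chr_start, chr_stop):
    """Convert Entrez Gene GenomicInfo coordinates into efetch-style values.

    Assumption: ChrStart/ChrStop are 0-based, and ChrStart > ChrStop means
    the gene lies on the minus strand.
    Returns (accession, seq_start, seq_stop, strand) with 1-based inclusive
    coordinates and efetch's strand codes (1 = plus, 2 = minus).
    """
    start, stop = int(chr_start), int(chr_stop)
    if start <= stop:
        strand = 1  # plus strand
    else:
        strand = 2  # minus strand: put the smaller coordinate first
        start, stop = stop, start
    # Shift from 0-based to the 1-based coordinates efetch expects.
    return chr_acc_ver, start + 1, stop + 1, strand


def locate_gene(term, email):
    """Search the gene database and return the region of the first hit.

    Requires Biopython and network access to NCBI.
    """
    from Bio import Entrez  # imported here so the helper above needs no Biopython

    Entrez.email = email  # NCBI asks clients to identify themselves
    handle = Entrez.esearch(db="gene", term=term)
    try:
        ids = Entrez.read(handle)["IdList"]
    finally:
        handle.close()
    if not ids:
        raise LookupError(f"no gene found for {term!r}")

    handle = Entrez.esummary(db="gene", id=ids[0])
    try:
        summary = Entrez.read(handle)
    finally:
        handle.close()
    # Assumed layout of the gene esummary (DocumentSummarySet format).
    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    info = doc["GenomicInfo"][0]
    return gene_region_from_genomic_info(
        info["ChrAccVer"], info["ChrStart"], info["ChrStop"]
    )
```

Keeping the coordinate conversion in its own function makes the 0-based to
1-based shift and the strand handling easy to check without touching NCBI.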

Now I want the raw sequence for that gene, i.e. the sequence that shows up
when I click "FASTA" on the above page:

http://www.ncbi.nlm.nih.gov/nuccore/NC_003198?report=fasta&from=2011173&to=2012693&strand=true

How can I get that?
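Fetching only that region does not require downloading the whole genome:
NCBI's efetch utility accepts seq_start, seq_stop and strand parameters
(strand 1 = plus, 2 = minus), and Bio.Entrez.efetch passes extra keyword
arguments through to it. A minimal sketch with hypothetical function names;
the from=/to= values in the viewer URL are treated as 1-based inclusive
coordinates, which is also what seq_start/seq_stop use.

```python
def efetch_region_params(accession, seq_from, seq_to, minus_strand=False):
    """Build efetch keyword arguments for a sub-region of a nucleotide record.

    seq_from/seq_to are 1-based and inclusive, like from=/to= in the viewer
    URL. Assumption: minus_strand=True corresponds to strand=true there.
    """
    if seq_from > seq_to:
        raise ValueError("seq_from must not exceed seq_to")
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": seq_from,
        "seq_stop": seq_to,
        "strand": 2 if minus_strand else 1,  # efetch codes: 1 = plus, 2 = minus
    }


def fetch_region(accession, seq_from, seq_to, email, minus_strand=False):
    """Download the region as a Biopython SeqRecord (needs network access)."""
    from Bio import Entrez, SeqIO  # imported here so the helper above needs no Biopython

    Entrez.email = email  # NCBI asks clients to identify themselves
    params = efetch_region_params(accession, seq_from, seq_to, minus_strand)
    handle = Entrez.efetch(**params)
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

For the fliC example the call would be fetch_region("NC_003198", 2011173,
2012693, email="you@example.org", minus_strand=True). The address is a
placeholder, and minus_strand=True rests on the assumption that strand=true in
the viewer URL requests the minus strand; the gene record's own strand
annotation is the thing to check.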

As for quantity, I plan to write a script that will help me analyze about
10 genes every two weeks or so.

Regards,
David

On Thu, Dec 2, 2010 at 8:15 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:

> Hi, David.
>
> Perhaps you can give a concrete example.  What is the starting value (gene
> name, HUGO gene symbol, Entrez Gene ID)?  What is the expected output--you
> mention "proper nucleotide entry", but there will likely be more than one
> for a given gene?  You also mention that you are interested in a specific
> region of the genome--do you want the gene locus or the transcripts or the
> CDS, or something else?  Finally, how many genes are we talking about here?
>  5-10 or thousands?
>
> Sean
>


