[Biopython] Access Entrez gene DB using rettype 'gb'

David Jacobs developer at allthingsprogress.com
Thu Dec 2 21:42:19 UTC 2010


Hi Sean,

Thanks for the info. I didn't realize the gene database wasn't concerned
with sequences. (The distinction isn't so clear when you're using the web
interface.) So now I'm trying to query nucleotide. My scripting approach has
been:

1. Get a list of gene names from a file
2. Query the nucleotide database for each gene name to get a record ID
3. Use that record ID to download the matching nucleotide entry
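
For reference, the three steps above might look roughly like this with Bio.Entrez. This is only a sketch: the organism, the e-mail address, and the search-term fields ([Gene Name], [Organism]) are assumptions for illustration and are not taken from the original script.

```python
from pathlib import Path


def read_gene_names(path):
    """Step 1: return one gene name per non-empty line of a text file."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_term(gene_name, organism):
    """Build an Entrez search term for one gene in one organism (assumed fields)."""
    return f"{gene_name}[Gene Name] AND {organism}[Organism]"


def fetch_first_hit(gene_name, organism, email):
    """Steps 2 and 3: search nucleotide, then download the first hit as GenBank text."""
    from Bio import Entrez  # Biopython; only needed for the network calls

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=build_term(gene_name, organism))
    result = Entrez.read(handle)
    handle.close()
    if not result["IdList"]:
        return None  # nothing matched this gene name

    handle = Entrez.efetch(
        db="nucleotide", id=result["IdList"][0], rettype="gb", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

Note that the first hit of such a search is not guaranteed to be a transcript; it can just as well be a chromosome or whole-genome record.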

However, every time I get an ID from nucleotide, it's for an entire genome.
How can I specify either a) a specific gene (as identified in the gene
database) or b) a specific region of the genome?
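
For option b), one possibility is to pass the E-utilities parameters seq_start, seq_stop and strand through Entrez.efetch, which forwards extra keyword arguments to NCBI. A minimal sketch, assuming the genomic accession and the 1-based coordinates are already known (for example from the gene record); the helper names and the e-mail address are hypothetical.

```python
def region_params(accession, start, stop, minus_strand=False):
    """Build efetch keyword arguments for a 1-based, inclusive region."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "gb",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 2 if minus_strand else 1,  # E-utilities: 1 = plus, 2 = minus
    }


def fetch_region(accession, start, stop, email, minus_strand=False):
    """Download only the requested slice of a nucleotide record as GenBank text."""
    from Bio import Entrez  # Biopython; only needed for the network call

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(**region_params(accession, start, stop, minus_strand))
    try:
        return handle.read()
    finally:
        handle.close()
```

The record that comes back should then cover just that slice rather than the whole genome.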

David

On Thu, Dec 2, 2010 at 4:07 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>
> Hi, David.
>
> Genes (in the sense used in Entrez Gene) do not have sequences.  Their
> respective transcripts do, however, and there can be, in general, multiple
> transcripts per gene.  Therefore, I think you would have to do a query for
> the gene of interest and then link to nucleotide to get the sequences for
> the associated transcripts.  If you want to do this for many genes, it may
> be easier to download the entire RefSeq collection for your species of
> interest and simply load it into memory or index the FASTA file.
>
> Sean
>
>
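
Both suggestions in the quoted reply can be sketched with Biopython. The first part links a gene ID to its nucleotide records with Entrez.elink; the link name gene_nuccore_refseqrna (RefSeq transcripts) is an assumption and may need adjusting. The second part indexes a locally downloaded FASTA file with SeqIO.index. Function names, the e-mail address, and file paths are hypothetical.

```python
def linked_ids(linksets):
    """Collect every linked record ID from a parsed elink result."""
    ids = []
    for linkset in linksets:
        for linkset_db in linkset.get("LinkSetDb", []):
            ids.extend(link["Id"] for link in linkset_db["Link"])
    return ids


def transcripts_for_gene(gene_id, email, linkname="gene_nuccore_refseqrna"):
    """Return nucleotide IDs linked to one Entrez Gene ID (assumed link name)."""
    from Bio import Entrez  # Biopython; only needed for the network call

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="gene", db="nuccore", id=gene_id, linkname=linkname)
    try:
        return linked_ids(Entrez.read(handle))
    finally:
        handle.close()


def index_fasta(path):
    """Index a local FASTA file so records can be looked up by ID on demand."""
    from Bio import SeqIO  # Biopython

    # SeqIO.index returns a read-only, dict-like object keyed by record ID;
    # sequences are read from disk only when they are requested.
    return SeqIO.index(path, "fasta")
```

The IDs returned by transcripts_for_gene can be passed to Entrez.efetch with db="nuccore". For many genes, the indexed local file avoids one network round trip per gene.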


