[Biopython] how to obtain official Gene Symbols for a list of GeneNames

Sameet Mehta msameet at gmail.com
Fri Jan 8 17:13:24 UTC 2010


Thanks Peter,
that is exactly what I was looking for.

Thanks for the help.

regards
Sameet

On Fri, Jan 8, 2010 at 10:18 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Please CC the mailing list.
>
> On Fri, Jan 8, 2010 at 4:09 PM, Sameet Mehta <msameet at gmail.com> wrote:
>> Hi,
>> My list contains gene names such as DKFZP586P0123, RPL6, etc.  What I
>> do is search for each one in the NCBI Gene database manually, and then I
>> get the official gene symbol.  I want to automate this process.  Of
>> course, I am only interested in official gene symbols for human genes.
>>
>> Sameet
>
> OK, so via my browser using Entrez Gene, I used:
>
> DKFZP586P0123 "Homo sapiens"[orgn]
>
> This maps uniquely to C2CD3. However,
>
> RPL6 "Homo sapiens"[orgn]
>
> maps to several hits (some discontinued), including things like
> RPL6P13. Clearly we need to make the search a little more
> specific... we only want to search for a name or gene symbol
> (not the default search on all fields).
>
> It looks like searching on "gene" works nicely, see also:
> http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/
>
> Entrez queries like these seem to give unique matches:
>
> DKFZP586P0123[gene] "Homo sapiens"[orgn]
> RPL6[gene] "Homo sapiens"[orgn]
>
> e.g.
>
>>>> from Bio import Entrez
>>>> Entrez.email = "Your.Name.Here@example.com"
>>>> search = Entrez.read(Entrez.esearch(db='gene', term='DKFZP586P0123[gene] "Homo sapiens"[orgn]', retmode='xml'))
>>>> print(search["IdList"])
> ['26005']
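To run this for a whole list of names, the query above can be generated per name and the ESearch result accepted only when it contains exactly one UID (no hits or several hits, as with the unrestricted RPL6 search, are treated as unresolved). The following is a minimal sketch: `gene_query`, `unique_gene_id` and `search_gene_ids` are hypothetical helper names, the default organism is assumed to be "Homo sapiens", and the email address is a placeholder you must replace.

```python
# Hypothetical helpers (not part of Biopython) for resolving a list of
# gene names to Entrez Gene UIDs, one field-restricted ESearch per name.

def gene_query(name, organism="Homo sapiens"):
    """Build a term restricted to the gene field, e.g. RPL6[gene] "Homo sapiens"[orgn]."""
    return f'{name}[gene] "{organism}"[orgn]'


def unique_gene_id(search_result):
    """Return the single UID in a parsed ESearch result, or None.

    `search_result` is assumed to be the dict-like object returned by
    Entrez.read(Entrez.esearch(...)); zero hits or several hits give None.
    """
    ids = list(search_result["IdList"])
    if len(ids) == 1:
        return str(ids[0])
    return None


def search_gene_ids(names, email):
    """Map each input name to a unique UID or None (needs network access)."""
    from Bio import Entrez  # imported here so the pure helpers work without Biopython
    Entrez.email = email  # placeholder: NCBI asks for a real contact address
    found = {}
    for name in names:
        handle = Entrez.esearch(db="gene", term=gene_query(name))
        found[name] = unique_gene_id(Entrez.read(handle))
        handle.close()
    return found
```

A call such as `search_gene_ids(["DKFZP586P0123", "RPL6"], "you@example.com")` would return a dict keyed by the input names; names mapped to None need a manual look.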
>
> That unique ID we got back (26005) is the UID for this gene, which
> you should be able to use with EFetch (or ELink?). For example, you could
> download the whole record as XML, and parse that:
>
>>>> result = Entrez.read(Entrez.efetch(db='gene', id='26005', retmode='xml'))
>>>> result[0]['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']
> 'C2CD3'
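The key path used above can be wrapped so that a missing field yields None instead of an exception, which helps when processing many records. This is a sketch only; `official_symbol` is a hypothetical name, and each `record` is assumed to be one element of the list returned by `Entrez.read(Entrez.efetch(db='gene', ..., retmode='xml'))`.

```python
def official_symbol(record):
    """Return Gene-ref_locus (the official symbol) from one parsed Entrezgene record.

    Uses the same key path as the example above; returns None when any
    key along that path is absent.
    """
    try:
        return str(record["Entrezgene_gene"]["Gene-ref"]["Gene-ref_locus"])
    except KeyError:
        return None


def official_symbols(records):
    """Apply official_symbol to every record returned by one EFetch call."""
    return [official_symbol(record) for record in records]
```

Since EFetch accepts a comma-separated list of IDs, one XML download could cover several genes and be passed to `official_symbols` in one go.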
>
> However, this next approach is a much quicker download, and so
> looks like a more efficient way to get the desired gene symbol:
>
>>>> print(Entrez.efetch(db='gene', id='26005', retmode='text', rettype='brief').read())
>
> 1: C2CD3 C2 calcium-depend... [GeneID: 26005]
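If the brief text output is used, the symbol and GeneID can be pulled out with a regular expression. This sketch assumes the line layout shown in the single example above ("N: SYMBOL description [GeneID: NNN]"); NCBI's text formats may have changed since then, so check the pattern against real output. `parse_brief` is a hypothetical name.

```python
import re

# Assumed layout of one brief line: "<n>: <symbol> <description> [GeneID: <id>]"
BRIEF_LINE = re.compile(
    r"^[ \t]*\d+:[ \t]+(\S+)[ \t].*\[GeneID:[ \t]*(\d+)\]", re.MULTILINE
)


def parse_brief(text):
    """Return a list of (symbol, gene_id) tuples found in brief-format text."""
    return [(m.group(1), m.group(2)) for m in BRIEF_LINE.finditer(text)]
```

For the output above, `parse_brief` would give `[("C2CD3", "26005")]`.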
>
> Next, read the Entrez chapter in the Biopython Tutorial, especially
> the bit about the history functionality for linking ESearch and EFetch.
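One way to apply the history functionality to a list of names is to OR the gene-field terms into a single ESearch with `usehistory="y"`, then fetch the stored results in batches via `webenv` and `query_key`. This is a sketch under assumptions: `combined_query` and `fetch_symbols_via_history` are hypothetical names, the email is a placeholder, and the combined search no longer tells you which input name produced which symbol (an ambiguous name can also pull in extra hits), so the per-name approach may still be preferable for checking uniqueness.

```python
def combined_query(names, organism="Homo sapiens"):
    """OR together gene-field terms and restrict the whole query to one organism."""
    genes = " OR ".join(f"{name}[gene]" for name in names)
    return f'({genes}) AND "{organism}"[orgn]'


def fetch_symbols_via_history(names, email, batch_size=100):
    """One ESearch stored on the NCBI history server, then batched EFetch calls."""
    from Bio import Entrez  # lazy import: combined_query works without Biopython
    Entrez.email = email  # placeholder contact address
    handle = Entrez.esearch(db="gene", term=combined_query(names), usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    count = int(search["Count"])
    symbols = []
    for start in range(0, count, batch_size):
        handle = Entrez.efetch(
            db="gene",
            retmode="xml",
            retstart=start,
            retmax=batch_size,
            webenv=search["WebEnv"],
            query_key=search["QueryKey"],
        )
        for record in Entrez.read(handle):
            # Same key path as the XML example earlier in this message
            symbols.append(str(record["Entrezgene_gene"]["Gene-ref"]["Gene-ref_locus"]))
        handle.close()
    return symbols
```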
>
> Peter
>



-- 
Sameet Mehta, Ph.D.,
Research Associate,
Chromatin Biology Laboratory,
National Centre for Cell Science,
NCCS Complex,
University of Pune Campus,
Pune 411007

Phone: +91-20-25708158
Other Email: sameet at nccs.res.in
