[Biopython] How to efetch Unigene records? Is it possible at all?

Brad Chapman chapmanb at 50mail.com
Thu Jul 30 18:09:02 EDT 2009


Hi Carlos;

> My task is to take a list of human genes of interest and grab their
> protein counterparts in the database to do some additional work.
[...]
> >>> from Bio import Entrez
> >>> from Bio import UniGene
> >>> Entrez.email = "carlos.borroto at gmail.com"
> >>> handle = Entrez.esearch(db="unigene", term="Hs.94542")
> >>> record = Entrez.read(handle)
> >>> record
> {u'Count': '1', u'RetMax': '1', u'IdList': ['141673'],
> u'TranslationStack': [{u'Count': '1', u'Field': 'All Fields', u'Term':
> 'Hs.94542[All Fields]', u'Explode': 'Y'}, 'GROUP'], u'TranslationSet':
> [], u'RetStart': '0', u'QueryTranslation': 'Hs.94542[All Fields]'}
> >>> handle = Entrez.efetch(db="unigene", id="Hs.94542")
> >>> print handle.read()
> 
> This prints something that looks like a web page; I assume it is the
> NCBI server giving an error response.
> 
> So is there something I could do to accomplish what I want, either
> by parsing the GenBank files or by fetching the UniGene record and
> then parsing it?

It looks like you are doing things correctly, but I'm not sure
whether NCBI supports retrieving UniGene records through the efetch
interface. I tried playing around with it for a bit and ran into the
same problem as you; the documentation on their site is also not
very clear about whether unigene is supported or which return types
to request. I don't have a lot of experience with UniGene, but my
guess is this isn't the right direction to go.

My suggestion for getting your work done is to download the *.data
files from the FTP site:

ftp://ftp.ncbi.nih.gov/repository/UniGene/

and write a script that runs through these and pulls out the protein
identifiers of interest. You should be able to use the Bio.UniGene
parser for this: each record has a protsim attribute listing its
protein similarities, and each of those entries carries a GI number
(the protgi attribute) that you can use to fetch the relevant GenBank
records through Entrez.
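
As a rough sketch of that workflow, something along these lines might
work. It is a hand-rolled, standard-library stand-in for the UniGene
parser, so it assumes the *.data layout from memory: one tag per line
(ID, PROTSIM, ...), PROTSIM fields written as "KEY=value;" pairs, and
"//" closing each record. The function names, the sample text and the
PROTGI values in it are made up for illustration, and fetch_genbank is
untested here since it needs Biopython and network access.

```python
import io


def protein_gis(handle, wanted_ids):
    """Map each wanted UniGene cluster ID to the PROTGI values of its PROTSIM lines."""
    wanted = set(wanted_ids)
    hits = {}
    current_id = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):  # assumed end-of-record marker
            current_id = None
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "ID":
            current_id = value
        elif tag == "PROTSIM" and current_id in wanted:
            # Turn "ORG=9606; PROTGI=1111; ..." into a dict of fields.
            fields = dict(
                part.strip().split("=", 1)
                for part in value.split(";")
                if "=" in part
            )
            if "PROTGI" in fields:
                hits.setdefault(current_id, []).append(fields["PROTGI"])
    return hits


def fetch_genbank(gi, email):
    """Fetch one protein record as GenBank text (needs Biopython and network)."""
    from Bio import Entrez  # imported lazily so the parsing above runs without it

    Entrez.email = email
    handle = Entrez.efetch(db="protein", id=gi, rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


# Invented sample in the assumed layout; the GI numbers are placeholders.
SAMPLE = """\
ID          Hs.94542
TITLE       placeholder title
PROTSIM     ORG=9606; PROTGI=1111; PROTID=XX_000001.1; PCT=100.00; ALN=100
PROTSIM     ORG=10090; PROTGI=2222; PROTID=XX_000002.1; PCT=90.00; ALN=100
//
ID          Hs.1
PROTSIM     ORG=9606; PROTGI=3333; PROTID=XX_000003.1; PCT=100.00; ALN=100
//
"""

if __name__ == "__main__":
    print(protein_gis(io.StringIO(SAMPLE), ["Hs.94542"]))
```

For the real data you would pass an open file handle for the
downloaded *.data file instead of the StringIO sample, then call
fetch_genbank on each GI number that comes back.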

Hope this helps,
Brad

