[Bioperl-l] How to retrieve the unigene number

Andrew Macgregor andrew@anatomy.otago.ac.nz
Thu, 09 May 2002 09:36:09 +1200


Giuseppe Torelli wrote:

> I'm a newbie regarding Bio Perl and also Perl. I've followed the discussion
> about the unigene module; would you please tell me how to retrieve the
> unigene number of a gene knowing the GenBank accession number ?

Hi Giuseppe,

I'm sure there is "more than one way to do it", but here are two. Which one
you use depends on how often you need to look up a UniGene number from an
accession number.

1. If it is just once or twice, I would simply visit the UniGene website,
select the appropriate organism, type in the accession number and hit search.

2. If you want to do lots of lookups, I would:
- download the appropriate organism file from UniGene, e.g. Hs.data
- use the ClusterIO and Unigene modules to parse the file (this takes some
time; I leave mine running overnight)
- drop the resulting data into an SQL database
- search on the accession number to get back the UniGene number
- note that the ClusterIO/Unigene modules parse right down to the sequence
lines, so you can pull out all the accession numbers for each UniGene cluster
- you could then write a Perl script that does the retrieval for all the
accession numbers you are interested in (especially if there are thousands
of them).
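
The steps in option 2 can be sketched roughly as below. This is only an
illustration under stated assumptions, not the ClusterIO/Unigene code itself:
it is written in Python, reads the flat file directly instead of going
through the BioPerl modules, and uses SQLite as the SQL database. It assumes
each record starts with an "ID" line, lists its sequences on "SEQUENCE" lines
containing "ACC=...;" fields, and ends with "//". The table name, function
names and sample records are all made up for the example.

```python
import sqlite3

# Made-up records in the assumed Hs.data layout (not real UniGene entries).
SAMPLE = """\
ID          Hs.1
TITLE       Example cluster one
SEQUENCE    ACC=AA000001; NID=g0000001; SEQTYPE=mRNA
SEQUENCE    ACC=AA000002.1; NID=g0000002; SEQTYPE=EST
//
ID          Hs.2
TITLE       Example cluster two
SEQUENCE    ACC=AA000003.2; NID=g0000003; SEQTYPE=EST
//
"""


def parse_unigene(lines):
    """Yield (unigene_id, accession) pairs from flat-file lines."""
    cluster_id = None
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if tag == "ID":
            cluster_id = value
        elif tag == "SEQUENCE" and cluster_id:
            # Fields look like "ACC=AA000001; NID=...; SEQTYPE=..."
            for field in value.split(";"):
                key, _, val = field.strip().partition("=")
                if key == "ACC" and val:
                    # Drop any ".N" version suffix so lookups ignore it.
                    yield cluster_id, val.split(".")[0]
        elif tag == "//":
            cluster_id = None  # end of record


def build_index(pairs, db_path=":memory:"):
    """Load (unigene_id, accession) pairs into an indexed SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acc2unigene (acc TEXT, unigene TEXT)"
    )
    conn.executemany(
        "INSERT INTO acc2unigene (unigene, acc) VALUES (?, ?)", pairs
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_acc ON acc2unigene (acc)")
    conn.commit()
    return conn


def lookup(conn, accession):
    """Return the UniGene IDs of all clusters containing this accession."""
    acc = accession.split(".")[0]
    rows = conn.execute(
        "SELECT unigene FROM acc2unigene WHERE acc = ?", (acc,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # For a real run, pass an open file handle on Hs.data instead of SAMPLE
    # and a file path as db_path so the slow load only happens once.
    conn = build_index(parse_unigene(SAMPLE.splitlines()))
    for acc in ["AA000001", "AA000003"]:
        print(acc, lookup(conn, acc))
```

Loading the file into an indexed table once means the slow parsing step is
paid up front, and each later lookup is a single indexed query, which is what
makes this worthwhile when there are thousands of accession numbers.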

I hope this helps...

Cheers, Andrew.