[Bioperl-l] How to retrieve the unigene number
Andrew Macgregor
andrew@anatomy.otago.ac.nz
Thu, 09 May 2002 09:36:09 +1200
Giuseppe Torelli wrote:
> I'm a newbie regarding Bio Perl and also Perl. I've followed the discussion
> about the unigene module;
> would you please tell me how to retrieve the unigene number of a gene
> knowing the GenBank
> accession number ?
Hi Giuseppe,
I'm sure there is "more than one way to do it" but here are a couple. Which
you use depends on how often you want to look up a unigene from an accession
number.
1. If it is just once or twice, I would just visit the UniGene website,
select the appropriate organism, type in the accession number and hit search.
2. If you want to do lots of lookups, I would:
- download the data file for the appropriate organism from UniGene, e.g.
Hs.data for human
- use the ClusterIO and Unigene modules to parse the file (this takes some
time; I leave mine running overnight)
- drop the resulting data into a SQL database
- search on accession number to get back to the UniGene number
- note that the ClusterIO/Unigene modules parse right down to the sequence
lines, so you can pull out all the accession numbers for each UniGene cluster
- then write a Perl script that does the retrieval for all the accession
numbers you are interested in (especially if there are thousands of them).
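
To make the bulk approach a bit more concrete, here is a minimal sketch in
plain Perl. It does not use the ClusterIO/Unigene modules or a SQL database;
it stands in for both with a simple line-by-line parse and an in-memory hash,
so treat it as an illustration of the idea rather than the recommended
implementation. It assumes the UniGene flat-file layout where each record
starts with an "ID" line, lists its members on "SEQUENCE" lines containing
"ACC=...;", and ends with "//". The sub name index_unigene and the sample
records (cluster IDs and accession numbers) are made up for the example.

```perl
use strict;
use warnings;

# Build an accession -> UniGene ID lookup from UniGene flat-file text.
# Assumed layout: "ID" line opens a record, "SEQUENCE" lines carry
# "ACC=...;", and "//" closes the record.
sub index_unigene {
    my ($fh) = @_;
    my (%unigene_for, $current_id);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ID\s+(\S+)/) {
            $current_id = $1;
        }
        elsif (defined $current_id
            && $line =~ /^SEQUENCE\s+.*?\bACC=([^;\s]+)/) {
            (my $acc = $1) =~ s/\.\d+$//;    # drop any version suffix
            $unigene_for{$acc} = $current_id;
        }
        elsif ($line =~ m{^//}) {
            undef $current_id;               # end of record
        }
    }
    return \%unigene_for;
}

# Made-up sample records, NOT real UniGene data.
my $sample = <<'END';
ID          Hs.1
TITLE       made-up example cluster
SEQUENCE    ACC=X00001.1; NID=g0000001; SEQTYPE=mRNA
SEQUENCE    ACC=X00002; NID=g0000002; SEQTYPE=EST
//
ID          Hs.2
SEQUENCE    ACC=X00003.2; NID=g0000003; SEQTYPE=mRNA
//
END

# Read the sample through an in-memory filehandle; for a real run you
# would open the downloaded file (e.g. Hs.data) instead.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $unigene_for = index_unigene($fh);
close $fh;

# Look up a list of accession numbers, reporting any that are missing.
for my $acc (qw(X00001 X00003 X99999)) {
    my $id = $unigene_for->{$acc};
    print "$acc\t", (defined $id ? $id : 'not found'), "\n";
}
```

For a whole-organism file the hash can get large, which is one reason to
load the parsed data into a database instead, as described in the steps above.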
I hope this helps...
Cheers, Andrew.