[Bioperl-l] Getting genomic coordinates for a list of genes

Emanuele Osimo e.osimo at gmail.com
Fri Jul 17 08:49:36 EDT 2009


Hello everyone,
I'm a biologist and new to programming, so please forgive my ignorance. I've
been struggling with this for two weeks, and now I have to ask for help.
I'm trying the script I found at
http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
because I need to have some variables (like $from and $to) assigned to the
start and end of a gene.
The script runs without errors, but gives me the wrong coordinates: for
example, if I try it with gene 842 (CASP9), it prints:
NT_004610.19    2498878    2530877

I found out that in Entrez, for each gene (for CASP9, for example, at
http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
) under "Genome Reference Consortium Human Build 37 (GRCh37),
Primary_Assembly" there are two different sets of coordinates. The first is
called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37),
Primary_Assembly", and is the one I need, and the second one is called just
"NT_004610.19" and it's the one that the script prints.
The same is true for every gene I tried.

Do you know how to make the script print the "right" coordinates (or at
least the ones I need)?
Thanks a lot in advance,
Emanuele


