[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Rainer Machne raim at tbi.univie.ac.at
Wed Jan 31 21:09:49 UTC 2007


Dear Bioperl list,

Hoping not to be on the wrong email list, I have a short question:

Is there a standard way, or are there nice (Bioperl) tools, to get from a 
GI number (gi) or other IDs (see below) to the genomic coordinates of the 
respective gene?

We have FASTA files retrieved from NCBI protein BLAST searches against fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
or
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]

(My set contains only gi, ref and gb identifiers.)
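
For illustration, the identifiers in such header lines can be pulled apart 
with a few lines of code. The following is only a minimal sketch in plain 
Python (no Bioperl involved), assuming every header follows the 
gi|<number>|<db>|<accession>|<description> layout of the two examples above; 
the function name parse_ncbi_defline is hypothetical.

```python
def parse_ncbi_defline(defline):
    """Split an NCBI-style FASTA header into identifiers and description.

    Assumes the layout seen in the examples above:
    >gi|<number>|<db>|<accession>| <description>
    """
    # Drop surrounding whitespace and the leading '>' marker.
    line = defline.strip().lstrip(">")
    # At most 4 splits, so any '|' inside the description is preserved.
    parts = line.split("|", 4)
    if len(parts) < 5 or parts[0] != "gi":
        raise ValueError("unexpected header layout: %r" % defline)
    return {
        "gi": parts[1],
        "db": parts[2],           # 'gb' (GenBank) or 'ref' (RefSeq) in this set
        "accession": parts[3],
        "description": parts[4].strip(),
    }


headers = [
    " >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]",
    " >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]",
]
records = [parse_ncbi_defline(h) for h in headers]
for rec in records:
    print(rec["gi"], rec["db"], rec["accession"])
```

This only separates the identifiers; it does not by itself say where the 
corresponding gene lies on the genome.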

I retrieved all my FASTA files from whole fungal genomes with available 
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

Since I searched only whole, finished genomes (not shotgun), I expected it 
to be easy to get the genomic coordinates and retrieve upstream 
sequences, but so far we have not found a consistent way to do this 
automatically. Many of the gi entries refer to mRNAs or partial mRNAs, 
and the route to the coordinates seems to differ from case to case.
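
To make the goal concrete: once a gene's coordinates and strand are known, 
the upstream extraction itself is simple. The sketch below is plain Python 
under stated assumptions, not a Bioperl method: coordinates are taken to be 
1-based and inclusive with start <= end (as in GenBank feature tables), 
strand is +1 or -1, and the names upstream and reverse_complement as well 
as the toy sequence are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    table = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")
    return seq.translate(table)[::-1]


def upstream(chrom_seq, start, end, strand, length):
    """Return up to `length` bases upstream of a gene.

    Assumptions: start/end are 1-based inclusive with start <= end;
    strand is +1 or -1. The result is truncated at the sequence ends.
    """
    if strand == 1:
        # Upstream lies before `start` on the forward strand.
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    if strand == -1:
        # Upstream lies after `end`; report it in the gene's orientation.
        hi = min(len(chrom_seq), end + length)
        return reverse_complement(chrom_seq[end:hi])
    raise ValueError("strand must be +1 or -1")


chrom = "AACCGTGGATTT"                    # toy sequence, not real data
plus_up = upstream(chrom, 5, 7, 1, 3)     # gene 'GTG' on the forward strand
minus_up = upstream(chrom, 5, 7, -1, 3)   # same span on the reverse strand
print(plus_up, minus_up)
```

The hard part, and the actual question here, is obtaining start, end and 
strand for each identifier in the first place.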

Any suggestions would be appreciated.

With kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group


