[Bioperl-l] Getting genomic coordinates for a list of genes

Emanuele Osimo e.osimo at gmail.com
Thu Jul 23 23:24:24 UTC 2009


Hello everyone.
Today I discovered that combining the two subs that Mark posted does not give
the right results. I think this is because one gets its coordinates from
RefSeq build 36.3 and the other from build 37.
I found that pairing the first sub, genome_coords, with the Ensembl API
(Bio::EnsEMBL::Registry plus the slice adaptor's fetch_by_region) works a lot
better: it actually returns sequences that contain the genes.
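
To illustrate the build mismatch described above, here is a minimal sketch in
Python (not the Perl subs from this thread). It assumes each coordinate set
is stored together with the build it came from. The names GeneCoords,
BuildMismatchError and require_same_build are hypothetical, invented for this
example, and the build labels are plain strings.

```python
from dataclasses import dataclass


class BuildMismatchError(ValueError):
    """Raised when coordinates and sequence come from different builds."""


@dataclass(frozen=True)
class GeneCoords:
    accession: str  # sequence the coordinates refer to, e.g. "NT_004610.19"
    build: str      # label of the build the coordinates were taken from
    start: int
    end: int


def require_same_build(coords: GeneCoords, sequence_build: str) -> GeneCoords:
    """Return coords unchanged if their build matches the sequence source."""
    if coords.build != sequence_build:
        raise BuildMismatchError(
            f"coordinates are on build {coords.build!r}, "
            f"but the sequence source serves build {sequence_build!r}"
        )
    return coords


if __name__ == "__main__":
    # Accession and positions are the ones quoted further down in this thread;
    # the build labels are illustrative strings.
    casp9 = GeneCoords("NT_004610.19", "37", 2498878, 2530877)
    require_same_build(casp9, "37")          # same build: passes through
    try:
        require_same_build(casp9, "36.3")    # different build: rejected
    except BuildMismatchError as err:
        print(err)
```

Calling such a check before fetching the sequence would turn a silent
off-target result into an explicit error.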
Bye
Emanuele

P.S.
Thanks a lot to Mark!!


On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen <maj at fortinbras.us> wrote:

> Sorry, went off-list for a couple cycles. The final product will get the
> correct chromosomal coordinates and then return the sequence from
> the current build, based on a geneID input. See
> http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence
> for the results.
> cheers MAJ
> ----- Original Message -----
> From: "Emanuele Osimo" <e.osimo at gmail.com>
> To: "perl bioperl ml" <bioperl-l at lists.open-bio.org>
> Sent: Friday, July 17, 2009 8:49 AM
> Subject: [Bioperl-l] Getting genomic coordinates for a list of genes
>
>
>> Hello everyone,
>> I'm new to programming, I'm a biologist, so please forgive my ignorance, but
>> I've been trying this for 2 weeks, now I have to ask you.
>> I'm trying the script I found at
>>
>> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
>> because I need to have some variables (like $from and $to) assigned to the
>> start and end of a gene.
>> The script works fine, but gives me the wrong coordinates: for example if I
>> try it with the gene 842 (CASP9), it prints:
>> NT_004610.19    2498878    2530877
>>
>> I found out that in Entrez, for each gene (for CASP9, for example, at
>>
>> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
>> ) under "Genome Reference Consortium Human Build 37 (GRCh37),
>> Primary_Assembly" there are two different sets of coordinates. The first is
>> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37),
>> Primary_Assembly", and is the one I need, and the second one is called just
>> "NT_004610.19" and it's the one that the script prints.
>> This is valid for all the genes I tried.
>>
>> Do you know how to make the script print the "right" coordinates (at least,
>> the ones I need)?
>> Thanks a lot in advance,
>> Emanuele
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


