[Biopython] Converting transcript coordinates to genome coordinates

Fri Jul 7 05:04:05 UTC 2017

The hgvs package does some of what you seek for sequence variants written
using HGVS nomenclature standards. In this context, perhaps the most
important feature to point out is that it correctly handles cases where the
genome-transcript alignment contains indels.

See https://github.com/biocommons/hgvs for features, installation, and
examples, and http://hgvs.readthedocs.io/en/latest/ for extended
documentation.

If hgvs doesn't do what you want, you may at least want to use the
exon-wise alignments in UTA. All coordinates come from the source databases
(NCBI, UCSC, Ensembl); the only alignment done here is a Needleman-Wunsch
alignment of the exons to generate a CIGAR string. See
https://github.com/biocommons/uta; a public instance is available at
uta.biocommons.org:5432 (PostgreSQL).
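
For the simple case in the question quoted below (exon start/end coordinates
only, no indels in the genome-transcript alignment), a minimal sketch in
plain Python might look like the following. It is not part of hgvs or UTA.
The function name transcript_to_genome is made up for illustration, and the
code assumes a forward-strand transcript, exons listed in transcript order,
and 0-based half-open intervals, which is the convention under which the
example in the question works out.

```python
def transcript_to_genome(exons, start, end):
    """Map the transcript interval [start, end) onto genome intervals.

    Assumptions (not from hgvs or UTA): exons is a list of
    (genome_start, genome_end) pairs in transcript order on the forward
    strand, all intervals are 0-based and half-open, and the
    genome-transcript alignment has no indels.
    """
    if not 0 <= start <= end:
        raise ValueError("need 0 <= start <= end")
    blocks = []
    offset = 0  # transcript coordinate of the first base of the current exon
    for g_start, g_end in exons:
        length = g_end - g_start
        # Clip the requested interval to the part covered by this exon.
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:
            blocks.append((g_start + lo - offset, g_start + hi - offset))
        offset += length
    if end > offset:
        raise ValueError("interval extends past the end of the transcript")
    return blocks


exons = [(10000, 10030), (10050, 10080)]
print(transcript_to_genome(exons, 20, 50))
# [(10020, 10030), (10050, 10070)]
```

A reverse-strand transcript would need the exons walked in the opposite
direction with the offsets mirrored, and any transcript whose alignment to
the genome contains indels needs the CIGAR-aware handling mentioned above.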

-Reece

On Wed, Jul 5, 2017 at 3:04 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Dear all,
>
> Does anybody have some code to convert transcript coordinates to genome
> coordinates?
> I have the position of a nucleotide along a transcript, and the genome
> coordinates of the start and end of each exon in the transcript, and I
> would like to find the position of the nucleotide in genome coordinates.
> Ideally, I am looking for some code that can find the genome coordinates
> of a sequence of nucleotides.
> For example, if these are the exons:
> exon1  10000 10030
> exon2  10050 10080
> and a nucleotide sequence starting at position 20 and ending at position
> 50 in transcript coordinates,
> then I am looking for the genome coordinates (10020,10030), (10050,10070).
>
> Thanks,
> -Michiel