[Biopython] How to get sequences upstream of TSS of genes?

Peter biopython at maubp.freeserve.co.uk
Thu Oct 15 21:42:41 UTC 2009


On Thu, Oct 15, 2009 at 10:17 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I have a set of genes. I want to get the 5 kb sequence upstream
> of the TSS of each gene.
>
> I have the following specific questions. Could somebody help me? Thank you!
>
> Which database can I access to get the mouse genome?
> Given a gene name, what function should I call to get the gene's location?

I am not familiar with mouse-specific databases.

My first instinct would be to download the GenBank files for
all the mouse chromosomes via FTP from the NCBI. You
can parse these with Biopython and pull out the gene of
interest. Then, using the gene's strand and its start/end
location, you can work out the coordinates of the upstream
region and take that section from the chromosome sequence
(reverse complementing it if the gene is on the reverse strand).
For a forward-strand gene the upstream region ends at the gene's
start; for a reverse-strand gene it begins at the gene's end.
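As a rough sketch of the coordinate step (not tested against real
mouse data), something like the following could work. The function
names upstream_region and upstream_of_gene are made up for this
example, as is the choice to match on the "gene" qualifier. It assumes
0-based, end-exclusive coordinates as Biopython uses them, a strand of
+1 or -1, and that the annotated gene boundary is an acceptable
stand-in for the TSS.

```python
# Complement table for DNA letters (N stays N); hypothetical helper for this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a plain DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=5000):
    """Return up to `length` bases upstream of a gene, read 5'->3' on the gene's strand.

    start/end are 0-based, end-exclusive chromosome coordinates; strand is +1 or -1.
    The region is truncated if it would run off either end of the chromosome.
    """
    if strand == -1:
        # Reverse strand: upstream lies after the gene's end on the chromosome.
        region = chrom_seq[end:end + length]
        return reverse_complement(region)
    # Forward strand: upstream lies just before the gene's start.
    return chrom_seq[max(0, start - length):start]


def upstream_of_gene(genbank_path, gene_name, length=5000):
    """Find the first 'gene' feature named gene_name and return its upstream sequence.

    Requires Biopython; returns None if no matching gene feature is found.
    """
    from Bio import SeqIO  # imported here so the helpers above work without Biopython

    for record in SeqIO.parse(genbank_path, "genbank"):
        for feature in record.features:
            if feature.type != "gene":
                continue
            if gene_name in feature.qualifiers.get("gene", []):
                loc = feature.location
                return upstream_region(
                    str(record.seq), int(loc.start), int(loc.end), loc.strand, length
                )
    return None
```

You would call it as upstream_of_gene("chr1.gbk", "SomeGene", 5000),
where the file name and gene name are placeholders. If the GenBank
file carries no sequence data, you would need to take the sequence
from elsewhere and pass it to upstream_region directly.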

Peter


