[Biopython] How to get sequences upstream of TSS of genes?

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Fri Oct 16 08:29:46 UTC 2009


On Thu, Oct 15, 2009 at 11:17 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I have a set of genes. I want to get the 5kb sequence that is upstream
> of the TSS's of each gene.

You can do that with biomart:
- http://www.ensembl.org/biomart/martview/a90f00892a48e04d438f762f551bf48a/a90f00892a48e04d438f762f551bf48a

Select Ensembl 56 as the database and Mus musculus as the species. Go
to Filters and fill in the 'Id list limit' form with the required gene
IDs. Then go to Attributes, select Sequences, and check 'Upstream
Flank' with a value of 5000.
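
If you already have the chromosome sequence and the TSS coordinate, the
only subtle part of "5 kb upstream" is the strand. Here is a minimal
sketch in plain Python of what I understand that option to compute. The
function names and the coordinate convention (1-based TSS, strand given
as +1 or -1) are my own assumptions for illustration, not anything
taken from BioMart itself:

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_of_tss(chrom_seq, tss, strand, flank=5000):
    """Return up to `flank` bases immediately upstream of the TSS.

    Assumptions (hypothetical convention for this sketch):
    - chrom_seq is the forward-strand chromosome sequence as a string
    - tss is a 1-based coordinate on the forward strand
    - strand is +1 or -1
    The TSS base itself is not included, and the result is returned in
    the orientation of the gene. The flank is truncated at chromosome ends.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if strand == 1:
        # Upstream means lower coordinates on the forward strand.
        start = max(0, tss - 1 - flank)  # 0-based, inclusive
        end = tss - 1                    # 0-based, exclusive (stops before TSS)
        return chrom_seq[start:end]
    # On the reverse strand, upstream means higher forward coordinates,
    # and the sequence has to be reverse-complemented.
    start = tss                          # 0-based index of the base after the TSS
    end = min(len(chrom_seq), tss + flank)
    return reverse_complement(chrom_seq[start:end])
```

For a gene on the reverse strand the flank lies at higher coordinates
than the TSS, which is easy to get wrong when slicing by hand.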

As for doing that in Python, I am not sure whether there is a Python
interface to BioMart. Galaxy (http://main.g2.bx.psu.edu/) is written in
Python, so they have probably written a library for that somewhere, but
I don't know their code.
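
Without a dedicated library, one option is to send BioMart an XML query
over HTTP using only the standard library. The following is a rough
sketch, not a tested recipe. The endpoint URL, the dataset name
(mmusculus_gene_ensembl), the filter names (ensembl_gene_id,
upstream_flank) and the attribute name (gene_flank) are assumptions on
my side; please check them against the XML that the martview page
generates for the query described above:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed martservice endpoint; verify against the BioMart site you use.
MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_upstream_query(gene_ids, flank=5000,
                         dataset="mmusculus_gene_ensembl"):
    """Build a BioMart XML query asking for upstream flanks in FASTA format.

    The dataset, filter and attribute names are assumptions (see above).
    """
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter="FASTA", header="0", uniqueRows="0",
                       datasetConfigVersion="0.6")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Restrict the query to the given gene IDs (comma-separated list).
    ET.SubElement(ds, "Filter", name="ensembl_gene_id",
                  value=",".join(gene_ids))
    # Length of the upstream flank, in bases.
    ET.SubElement(ds, "Filter", name="upstream_flank", value=str(flank))
    ET.SubElement(ds, "Attribute", name="gene_flank")
    ET.SubElement(ds, "Attribute", name="ensembl_gene_id")
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def fetch_upstream_fasta(gene_ids, flank=5000):
    """POST the query to the (assumed) endpoint and return the FASTA text."""
    xml_query = build_upstream_query(gene_ids, flank)
    data = urllib.parse.urlencode({"query": xml_query}).encode("ascii")
    with urllib.request.urlopen(MART_URL, data=data) as response:
        return response.read().decode("utf-8")
```

Calling fetch_upstream_fasta() with a list of Ensembl gene IDs should
return FASTA text that Bio.SeqIO can parse, provided the names above
match the server's configuration.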

If you use R (remember that you can mix Python and R with rpy2), there
is a nice package in Bioconductor called biomaRt.


> I have the following specific questions. Could somebody help me? Thank you!
>
> Which database I can access to get mouse genome?
> Give a gene name what function I should call to get the gene's location?



-- 
Giovanni Dall'Olio, phd student
Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it



