[Biopython] how to find closest genes for a given location

Brad Chapman chapmanb at 50mail.com
Thu Feb 25 13:34:31 UTC 2010


Hi Sameet;

> I have multiple locations from human genomes.  I want to determine
> what are the closest genes on either side of the location, and if it
> is in the location how far from the TSS the given location is.  I was
> thinking of using the CCDS database, because it contains information
> for the genes that have been verified.  Is there any other
> better/smarter way of doing it?

I don't know of a ready-to-go library in Python that does this, but
you could put something together using the interval intersection
library in bx-python:

http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx

You would build up an interval tree of gene features from someplace
like CCDS, and then loop through your BED file and intersect each
location with the tree. For finding the closest non-overlapping genes,
look at upstream_of_interval and downstream_of_interval.
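
To make the idea concrete, here is a rough plain-Python sketch of the
lookup. Note that it does not use bx-python at all: it swaps the
interval tree for a sorted list plus the standard bisect module, so it
runs without extra installs. The Gene record, the function names and
the example genes are all made up for illustration, and coordinates
are assumed to be 0-based and half-open as in BED.

```python
import bisect
from collections import namedtuple

# Hypothetical record type; coordinates are 0-based, half-open (BED style).
Gene = namedtuple("Gene", "name chrom start end strand")


def build_index(genes):
    """Group genes by chromosome and sort each group by start coordinate."""
    index = {}
    for gene in genes:
        index.setdefault(gene.chrom, []).append(gene)
    for chrom_genes in index.values():
        chrom_genes.sort(key=lambda g: g.start)
    return index


def tss_offset(gene, pos):
    """Signed distance from the TSS to pos, positive along transcription."""
    if gene.strand == "+":
        return pos - gene.start
    # On the minus strand the TSS is the last base of the interval.
    return (gene.end - 1) - pos


def annotate(index, chrom, pos):
    """Return overlapping genes (with TSS offset) and the nearest flanking genes."""
    genes = index.get(chrom, [])
    starts = [g.start for g in genes]
    # Genes before slot i start at or before pos; the rest start after it.
    i = bisect.bisect_right(starts, pos)
    candidates = genes[:i]
    inside = [(g.name, tss_offset(g, pos)) for g in candidates if g.end > pos]
    before = [g for g in candidates if g.end <= pos]
    # Nearest gene on the left is the one whose end is closest to pos.
    left = max(before, key=lambda g: g.end) if before else None
    right = genes[i] if i < len(genes) else None
    return {"inside": inside, "left": left, "right": right}


# Made-up example genes, not real annotations.
GENES = [
    Gene("geneA", "chr1", 100, 200, "+"),
    Gene("geneB", "chr1", 500, 800, "-"),
    Gene("geneC", "chr1", 1000, 1200, "+"),
]
INDEX = build_index(GENES)
```

Calling annotate(INDEX, "chr1", 300) reports geneA on the left and
geneB on the right, with nothing overlapping. The linear scan over
candidates is fine for a sketch, but for whole-genome gene sets an
interval tree like the one in bx-python should scale better.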

For a non-Python approach, the ChIPpeakAnno R package in Bioconductor
does what you are looking for:

http://bioconductor.org/packages/2.5/bioc/html/ChIPpeakAnno.html

rpy2 is an excellent gateway to R from Python:

http://rpy.sourceforge.net/rpy2.html

Hope this helps,
Brad


