[Biopython] how to find closest genes for a given location

Peter biopython at maubp.freeserve.co.uk
Thu Feb 25 13:37:40 UTC 2010


On Thu, Feb 25, 2010 at 1:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Sameet;
>
>> I have multiple locations from the human genome.  I want to determine
>> what the closest genes on either side of each location are and, if a
>> location falls within a gene, how far it is from the TSS.  I was
>> thinking of using the CCDS database, because it contains information
>> for the genes that have been verified.  Is there a better/smarter
>> way of doing it?
>
> I don't know of a ready-to-go library in Python that does this, but
> you could put something together using the Interval intersection
> library in bx-python:
>
> http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx
>
> You would build up an interval tree of gene features from someplace
> like CCDS, and then loop through your BED file and intersect with
> the tree. For finding closest non-overlapping genes, look at
> upstream_of_interval and downstream_of_interval.
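
Below is a rough, self-contained sketch of the lookups described above. It swaps bx-python's interval tree for per-chromosome lists sorted by gene start and by gene end, searched with the standard library bisect module. The Gene record, the GeneIndex class and its method names are hypothetical. The sketch assumes BED-style 0-based, half-open coordinates and genes already loaded from something like CCDS (parsing is not shown). "Left" and "right" here mean genomic coordinates, not strand-aware upstream/downstream.

```python
import bisect
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    # Hypothetical record; fill it from CCDS or any other annotation.
    name: str
    chrom: str
    start: int   # 0-based, inclusive (BED-style)
    end: int     # exclusive
    strand: str  # "+" or "-"


class GeneIndex:
    """Per-chromosome sorted lists standing in for an interval tree."""

    def __init__(self, genes):
        by_chrom = defaultdict(list)
        for g in genes:
            by_chrom[g.chrom].append(g)
        self._by_start, self._starts = {}, {}
        self._by_end, self._ends = {}, {}
        for chrom, gs in by_chrom.items():
            self._by_start[chrom] = sorted(gs, key=lambda g: g.start)
            self._starts[chrom] = [g.start for g in self._by_start[chrom]]
            self._by_end[chrom] = sorted(gs, key=lambda g: g.end)
            self._ends[chrom] = [g.end for g in self._by_end[chrom]]

    def overlapping(self, chrom: str, pos: int) -> List[Gene]:
        # Genes starting at or before pos that have not ended yet.
        # This filter is a linear scan; an interval tree avoids it.
        if chrom not in self._starts:
            return []
        i = bisect.bisect_right(self._starts[chrom], pos)
        return [g for g in self._by_start[chrom][:i] if g.end > pos]

    def nearest_left(self, chrom: str, pos: int) -> Optional[Gene]:
        # Gene with the largest end that still ends at or before pos.
        if chrom not in self._ends:
            return None
        i = bisect.bisect_right(self._ends[chrom], pos)
        return self._by_end[chrom][i - 1] if i else None

    def nearest_right(self, chrom: str, pos: int) -> Optional[Gene]:
        # Gene with the smallest start that is strictly after pos.
        if chrom not in self._starts:
            return None
        i = bisect.bisect_right(self._starts[chrom], pos)
        return self._by_start[chrom][i] if i < len(self._starts[chrom]) else None
```

For example, with hypothetical genes A at chr1:100-200, B at chr1:300-400 and C at chr1:350-500, position 250 has A on the left, B on the right and no overlapping gene. The overlap query is still a linear scan, which is where a real interval tree pays off.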

Or, if you don't have too many locations to deal with, a simple
brute-force approach that loops over all the features to find the
closest ones would work just fine. How many is "multiple locations"?

Peter
