[Biopython] how to find closest genes for a given location

Chris Fields cjfields at illinois.edu
Thu Feb 25 14:26:43 UTC 2010


On Feb 25, 2010, at 7:37 AM, Peter wrote:

> On Thu, Feb 25, 2010 at 1:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Hi Sameet;
>> 
>>> I have multiple locations from the human genome.  I want to determine
>>> the closest genes on either side of each location and, if the location
>>> falls within a gene, how far it is from the TSS.  I was
>>> thinking of using the CCDS database, because it contains information
>>> for the genes that have been verified.  Is there any other
>>> better/smarter way of doing it?
>> 
>> I don't know of a ready-to-go library in Python that does this, but
>> you could put something together using the Interval intersection
>> library in bx-python:
>> 
>> http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx
>> 
>> You would build up an interval tree of gene features from someplace
>> like CCDS, and then loop through your BED file and intersect with
>> the tree. For finding closest non-overlapping genes, look at
>> upstream_of_interval and downstream_of_interval.
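
To make the idea concrete without depending on the bx-python API, here is a
rough sketch that uses the standard-library bisect module as a stand-in for
the interval tree. It is not bx-python code. The gene records, build_index
and flanking_genes are made-up names for illustration, and coordinates are
assumed to be 0-based, half-open, on a single chromosome.

```python
from bisect import bisect_right

# Hypothetical gene records for one chromosome: (start, end, name),
# 0-based half-open coordinates. Real data would come from e.g. CCDS.
GENES = [
    (100, 200, "geneA"),
    (300, 450, "geneB"),
    (600, 900, "geneC"),
]


def build_index(genes):
    """Sort genes by start and by end so both neighbours are binary-searchable."""
    by_start = sorted(genes, key=lambda g: g[0])
    by_end = sorted(genes, key=lambda g: g[1])
    starts = [g[0] for g in by_start]
    ends = [g[1] for g in by_end]
    return by_start, starts, by_end, ends


def flanking_genes(index, pos):
    """Return (left, right): the nearest non-overlapping gene on each side of pos."""
    by_start, starts, by_end, ends = index
    # Genes with end <= pos lie entirely to the left (end is exclusive).
    i = bisect_right(ends, pos)
    left = by_end[i - 1] if i > 0 else None
    # Genes with start > pos lie entirely to the right.
    j = bisect_right(starts, pos)
    right = by_start[j] if j < len(starts) else None
    return left, right


index = build_index(GENES)
print(flanking_genes(index, 250))
print(flanking_genes(index, 350))
```

"Left" and "right" here are in genome coordinates, not strand-aware
upstream/downstream. You would keep one such index per chromosome.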
> 
> Or, if you don't have too many locations to deal with, a simple brute
> force approach looping over the features to find the closest ones
> would work just fine. How many is "multiple locations"?
> 
> Peter
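
For a modest number of locations, the brute-force loop could look something
like the sketch below. The record layout, the tss and annotate helpers, and
the example genes are all assumptions for illustration (0-based, half-open
coordinates, with the TSS taken as the first base on "+" and the last base
on "-").

```python
# Hypothetical gene records: (chrom, start, end, strand, name),
# 0-based half-open coordinates. Real data would come from e.g. CCDS.
GENES = [
    ("chr1", 100, 200, "+", "geneA"),
    ("chr1", 300, 450, "-", "geneB"),
    ("chr1", 600, 900, "+", "geneC"),
]


def tss(gene):
    """Transcription start site: first base on '+', last base on '-'."""
    chrom, start, end, strand, name = gene
    return start if strand == "+" else end - 1


def annotate(chrom, pos, genes):
    """Linear scan over all genes for one location.

    Returns (inside, left, right), where inside is a list of
    (gene name, distance to TSS) for genes containing pos, and left/right
    are the nearest non-overlapping genes on each side (or None).
    """
    inside, left, right = [], None, None
    for gene in genes:
        g_chrom, start, end, strand, name = gene
        if g_chrom != chrom:
            continue
        if start <= pos < end:
            inside.append((name, abs(pos - tss(gene))))
        elif end <= pos:
            # Entirely to the left: keep the one ending closest to pos.
            if left is None or end > left[2]:
                left = gene
        else:
            # Entirely to the right: keep the one starting closest to pos.
            if right is None or start < right[1]:
                right = gene
    return inside, left, right


print(annotate("chr1", 350, GENES))
```

This is one pass over the gene list per location, so the cost grows with
(locations x genes), which is fine for small inputs.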

Maybe BEDTools would be generally useful here?  Its closestBed tool
reports the nearest feature to each interval.

http://code.google.com/p/bedtools/

chris


