[Biopython] how to find closest genes for a given location

Sean Davis sdavis2 at mail.nih.gov
Thu Feb 25 14:01:09 UTC 2010


On Thu, Feb 25, 2010 at 8:34 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Sameet;
>
>> I have multiple locations from human genomes.  I want to determine
>> what are the closest genes on either side of the location, and if it
>> is in the location how far from the TSS the given location is.  I was
>> thinking of using the CCDS database, because it contains information
>> for the genes that have been verified.  Is there any other
>> better/smarter way of doing it?
>
> I don't know of a ready-to-go Python library that does this, but
> you could put something together using the interval intersection
> library in bx-python:
>
> http://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/intervals/intersection.pyx

Or you could use the Galaxy web server at Penn State, which uses
bx-python for infrastructure.  From memory, I believe that Galaxy has
a "find nearest feature" tool.

Sean

> You would build up an interval tree of gene features from someplace
> like CCDS, and then loop through your BED file and intersect with
> the tree. For finding closest non-overlapping genes, look at
> upstream_of_interval and downstream_of_interval.
>
> For a non-Python approach, the ChIPpeakAnno R package in Bioconductor
> does what you are looking for:
>
> http://bioconductor.org/packages/2.5/bioc/html/ChIPpeakAnno.html
>
> rpy2 is an excellent gateway to R from Python:
>
> http://rpy.sourceforge.net/rpy2.html
>
> Hope this helps,
> Brad
>
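
For reference, the approach Brad outlines (index the gene features, then
look up each location) can be sketched in plain Python. This is only a
rough illustration under stated assumptions: it uses a simple per-chromosome
list scan instead of bx-python's interval tree, the gene table is made-up
toy data rather than real CCDS records, coordinates are assumed to be
0-based half-open, and the names build_index, offset_from_tss and annotate
are hypothetical helpers, not part of any library.

```python
# Toy gene table (hypothetical data): (chrom, start, end, strand, name),
# with 0-based, half-open coordinates.
GENES = [
    ("chr1", 1000, 2000, "+", "geneA"),
    ("chr1", 5000, 7000, "-", "geneB"),
    ("chr1", 9000, 9500, "+", "geneC"),
]


def build_index(genes):
    """Group genes by chromosome, sorted by start coordinate."""
    index = {}
    for chrom, start, end, strand, name in genes:
        index.setdefault(chrom, []).append((start, end, strand, name))
    for chrom in index:
        index[chrom].sort()
    return index


def offset_from_tss(pos, start, end, strand):
    """Distance from the TSS to pos, measured in the direction of transcription.

    The TSS is taken to be the first base of a '+' gene and the last base
    of a '-' gene.
    """
    if strand == "+":
        return pos - start
    return (end - 1) - pos


def annotate(index, chrom, pos):
    """Report nearest flanking genes and any genes containing pos.

    Linear scan per query; an interval tree would scale better for
    large gene sets.
    """
    genes = index.get(chrom, [])
    # Genes that contain the position, with the offset from their TSS.
    inside = [
        (name, offset_from_tss(pos, start, end, strand))
        for start, end, strand, name in genes
        if start <= pos < end
    ]
    # Genes lying entirely to the left or to the right of the position.
    left = [g for g in genes if g[1] <= pos]
    right = [g for g in genes if g[0] > pos]
    nearest_left = max(left, key=lambda g: g[1])[3] if left else None
    nearest_right = min(right, key=lambda g: g[0])[3] if right else None
    return {"left": nearest_left, "right": nearest_right, "inside": inside}


index = build_index(GENES)
```

Calling annotate(index, "chr1", 3000) returns the nearest gene on each
side of an intergenic position, while a position inside a gene also gets
its offset from that gene's TSS in the "inside" list. For genome-scale
data the interval tree Brad mentions is the better fit.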