[Biopython] Use biopython to create database of genome intervals?

Brad Chapman chapmanb at 50mail.com
Wed Feb 2 15:25:25 UTC 2011


Brett;

> I'm looking to create a database of genome variants of varying size: some
> single base and some not. It needs to provide efficient range queries, such
> as "get me all genome variants in region X". Has anybody used biopython for
> something like this?
> 
> I think this will require an interval tree <http://en.wikipedia.org/wiki/Interval_tree>,

I'd recommend using bx-python, which contains an excellent
IntervalTree implementation:

https://bitbucket.org/james_taylor/bx-python/wiki/Home

If you search GitHub there are several scripts you can use as
examples to get started:

https://github.com/search?langOverride=&language=python&q=intervaltree+bx&repo=&start_value=1&type=Code&x=0&y=0

But the basic usage is:

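import collections
from bx.intervals.intersection import IntervalTree

# build one interval tree per chromosome;
# your_intervals is a placeholder for your own (chrom, start, end, data) records
itree = collections.defaultdict(IntervalTree)
for chrom, start, end, data_dict in your_intervals:
    itree[chrom].insert(start, end, data_dict)

# query the tree; find() returns the data stored with each overlapping interval
for chrom, start, end in regions_of_interest:
    overlaps = itree[chrom].find(start, end)
    # ... process overlaps for this region here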
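
To make the shape of the data and the query result concrete, here is a
small sketch that runs without bx-python installed. NaiveIntervalIndex is
a hypothetical stand-in I made up for illustration: it mimics the
insert/find calls above but does a plain linear scan, so it is not an
interval tree and not bx-python's implementation. The variant records and
the half-open (start inclusive, end exclusive) overlap test are also
assumptions; check how bx-python treats interval boundaries before relying
on edge cases.

```python
import collections


class NaiveIntervalIndex:
    """Hypothetical stand-in with insert/find methods (linear scan, not a tree)."""

    def __init__(self):
        self._items = []

    def insert(self, start, end, value=None):
        # Store the interval together with its attached data.
        self._items.append((start, end, value))

    def find(self, start, end):
        # Assumed half-open overlap test: [s, e) overlaps [start, end).
        return [v for s, e, v in self._items if s < end and e > start]


# Made-up variants: (chromosome, start, end, data)
variants = [
    ("chr1", 100, 101, {"id": "snp1"}),   # single-base variant
    ("chr1", 150, 400, {"id": "del1"}),   # larger variant
    ("chr2", 100, 101, {"id": "snp2"}),
]

# One index per chromosome, as in the bx-python usage above.
index = collections.defaultdict(NaiveIntervalIndex)
for chrom, start, end, data in variants:
    index[chrom].insert(start, end, data)

# "Get me all variants in region chr1:90-200"
hits = index["chr1"].find(90, 200)
hit_ids = [h["id"] for h in hits]
print(hit_ids)
```

Swapping NaiveIntervalIndex for bx-python's IntervalTree should keep the
rest of the script the same, with queries that scale much better on
genome-sized data.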

Hope this helps,
Brad


