[Biopython] Use biopython to create database of genome intervals?
Brad Chapman
chapmanb at 50mail.com
Wed Feb 2 15:25:25 UTC 2011
Brett;
> I'm looking to create a database of genome variants of varying size: some
> single base and some not. It needs to provide efficient range queries, such
> as "get me all genome variants in region X". Has anybody used biopython for
> something like this?
>
> I think this will require an interval tree <http://en.wikipedia.org/wiki/Interval_tree>,
I'd recommend using bx-python, which contains an excellent
IntervalTree implementation:
https://bitbucket.org/james_taylor/bx-python/wiki/Home
If you search GitHub there are several scripts you can use as
examples to get started:
https://github.com/search?langOverride=&language=python&q=intervaltree+bx&repo=&start_value=1&type=Code&x=0&y=0
But the basic usage is:
import collections
from bx.intervals.intersection import IntervalTree

# build an interval tree, one per chromosome
itree = collections.defaultdict(IntervalTree)
for chrom, start, end, data_dict in your_intervals:
    itree[chrom].insert(start, end, data_dict)

# query the tree for everything overlapping each region
for chrom, start, end in regions_of_interest:
    overlaps = itree[chrom].find(start, end)
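
To make the shape of the data and the query concrete, here is a small sketch that runs without bx-python installed. NaiveIntervalIndex is a hypothetical stand-in, not part of bx-python: it only mimics the insert/find calls used above and does a plain linear scan rather than a real interval tree. The variant records and the half-open [start, end) coordinate convention are also assumptions made for illustration, so check the convention your own data uses.

```python
import collections


class NaiveIntervalIndex:
    """Hypothetical stand-in exposing insert/find; a linear scan, not a tree."""

    def __init__(self):
        self._intervals = []

    def insert(self, start, end, value):
        # Store the interval together with its attached data.
        self._intervals.append((start, end, value))

    def find(self, start, end):
        # Assumed half-open coordinates: [s, e) overlaps [start, end)
        # when s < end and e > start.
        return [v for s, e, v in self._intervals if s < end and e > start]


# Made-up variants: (chromosome, start, end, data_dict)
variants = [
    ("chr1", 100, 101, {"id": "snp1"}),   # single-base variant
    ("chr1", 150, 400, {"id": "del1"}),   # larger variant
    ("chr2", 100, 101, {"id": "snp2"}),
]

# One index per chromosome, created on first use.
index = collections.defaultdict(NaiveIntervalIndex)
for chrom, start, end, data in variants:
    index[chrom].insert(start, end, data)

# "Get me all genome variants in region chr1:90-200"
hits = index["chr1"].find(90, 200)
ids = sorted(h["id"] for h in hits)
print(ids)
```

For real data you would use IntervalTree from bx-python in place of the stand-in class, since a linear scan gets slow once each chromosome holds many variants.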
Hope this helps,
Brad