[Bioperl-l] problem to fit genomic coordinates
Chris Fields
cjfields at illinois.edu
Thu Mar 26 14:07:50 EDT 2009
On Mar 26, 2009, at 11:25 AM, Sean Davis wrote:
> On Thu, Mar 26, 2009 at 11:30 AM, Laurent MANCHON
> <lmanchon at univ-montp2.fr> wrote:
>
>> okay, you are right,
>> but in my opinion my question is a good question about parsing an
>> enormous range of intervals.
>> The problem is not Perl, BioPerl, or any other language; it's just an
>> algorithmic question.
This problem has already been solved to a great degree within BioPerl,
and to a great many users' satisfaction (see the GBrowse list for a
larger group of users). No need to reinvent the wheel, just optimize
it. As you've indicated, if you need a non-toolkit, from-scratch
solution you should follow Sean's suggestion (binning or an R-tree).
>> I'm not a professional in BioPerl and I don't know what is possible
>> to do with all the BioPerl modules.
>> So if you think it's possible to resolve my question with BioPerl,
>> maybe you are right, but in my position I am stuck at the same point.
>> If you want, I can send you the two files needed for my question. And
>> if you agree, try to use BioPerl to resolve it. Maybe it's not
>> possible because the files are big, I don't know.
*sigh*
Laurent, that's not it. Obviously this isn't quite sinking in, so
I'll give it one last shot, then I'm done. It isn't our job to do your
work for you, homework or otherwise. We made our suggestions, and
(given the situation) we sometimes even write up some demo code, but
it's up to you to do the work. Particularly seeing as it's a homework
problem.
Let's say Scott's right and your instructor is on this list. Judging
by one of your previous responses ('not to use BioPerl'), your
instructor may very well hang out here. Also, remember this is a
public mailing list, archived and searchable via any web search engine:
http://bioperl.org/pipermail/bioperl-l/2009-March/029626.html
Good luck with that.
> To answer the question a bit more directly, you might consider using
> an R-tree indexing scheme or something akin to the binning scheme
> that UCSC uses for range queries.
Binning is what Bio::SeqFeature::Collection,
Bio::DB::SeqFeature::Store and the like do (hence my suggestion).
It's quite fast. I'm not sure whether we have an R-tree implementation;
it might be worth looking into.
> Algorithms like those allow very fast range operations. If you want
> a sense of how fast, try using the Galaxy server at Penn State.
>
> Sean
Another good option.
chris