[Bioperl-l] problem to fit genomic coordinates

Thu Mar 26 18:07:50 UTC 2009

On Mar 26, 2009, at 11:25 AM, Sean Davis wrote:

> On Thu, Mar 26, 2009 at 11:30 AM, Laurent MANCHON
> <lmanchon at univ-montp2.fr> wrote:
>
>> okay, you are right,
>> but in my opinion my question is a good question about parsing an
>> enormous range of intervals.
>> The problem is not Perl, BioPerl, or any other language; it's just an
>> algorithmic question.

This problem has already been solved to a great degree within BioPerl,
and to the satisfaction of a great many users (see the GBrowse list for
a larger group of users).  No need to reinvent the wheel, just optimize
it.  As you've indicated, if you need a non-toolkit, from-scratch
solution you should follow Sean's suggestion (binning and R-tree).

>> I'm not a professional in BioPerl and I don't know what is possible to
>> do with all the BioPerl modules.
>> So if you think it's possible to solve my problem with BioPerl, maybe
>> you are right, but in my position I am still stuck at the same point.
>> If you want, I can send you the two files needed for my question. And
>> if you agree, try to use BioPerl to solve it. Maybe it's not possible
>> because the files are big, I don't know.

*sigh*

Laurent, that's not it.  Obviously this isn't quite sinking in, so  
I'll give it one last shot then I'm done.  It isn't our job to do your  
work for you, homework or otherwise.  We made our suggestions, and  
(given the situation) we even sometimes write up some demo code, but  
it's up to you to do the work.  Particularly seeing as it's a homework  
problem.

Let's say Scott's right and your instructor is on this list.  Judging
by one of your previous responses ('not to use BioPerl'), your
instructor may very well hang out here. Also, remember this is a
public mailing list, archived and searchable via any web search engine:

http://bioperl.org/pipermail/bioperl-l/2009-March/029626.html

Good luck with that.

> To answer the question a bit more directly, you might consider using an
> R-tree indexing scheme or something akin to the binning scheme that UCSC
> uses for range queries.

Binning is what Bio::SeqFeature::Collection,  
Bio::DB::SeqFeature::Store and the like do (hence my suggestion).   
It's quite fast.  I'm not sure if we have an R-tree implementation or  
not, might be worth looking into.
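
For the archive, here is a minimal sketch of the general idea behind
binning.  It is not the BioPerl or UCSC implementation: it uses a single
level of fixed-width bins (UCSC's scheme is hierarchical), it is written
in Python rather than Perl, and the names BinnedIntervals and BIN_SIZE,
the bin width, and the example features are all made up for
illustration.  Intervals are assumed to be half-open, [start, end).

```python
from collections import defaultdict

BIN_SIZE = 100_000  # hypothetical bin width in bases; tune to the data


class BinnedIntervals:
    """Toy fixed-width bin index over half-open intervals [start, end)."""

    def __init__(self, bin_size=BIN_SIZE):
        self.bin_size = bin_size
        self.bins = defaultdict(list)  # (chrom, bin number) -> intervals

    def _bin_range(self, start, end):
        # Bins touched by [start, end); end - 1 is the last covered base.
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, chrom, start, end, name):
        # Store the interval in every bin it touches.
        for b in self._bin_range(start, end):
            self.bins[(chrom, b)].append((start, end, name))

    def overlapping(self, chrom, start, end):
        # Only inspect bins the query touches, then test real overlap.
        hits = set()
        for b in self._bin_range(start, end):
            for s, e, name in self.bins.get((chrom, b), ()):
                if s < end and start < e:
                    hits.add((s, e, name))
        return sorted(hits)


index = BinnedIntervals()
index.add("chr1", 1_000, 5_000, "featA")
index.add("chr1", 90_000, 110_000, "featC")   # spans two bins
index.add("chr1", 250_000, 260_000, "featB")
print(index.overlapping("chr1", 4_000, 95_000))
```

The point is that a query only looks at the handful of intervals sharing
a bin with it instead of scanning every interval, which is where the
speed comes from.  The set removes duplicates for intervals stored in
more than one bin.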

> Algorithms like those allow very fast range operations.  If you want a
> sense of how fast, try using the Galaxy server at Penn State.
>
> Sean

Another good option.

chris