[Bioperl-l] GenericHit->start/end needs tiled hsps?

Steve Chervitz sac at bioperl.org
Thu Apr 19 05:14:02 UTC 2007


Sendu,

Your thinking here seems correct and in fact agrees with the documentation
for those methods:

start():  If there is more than one HSP, the lowest start
           value of all HSPs is returned.

end():  If there is more than one HSP, the largest end
          value of all HSPs is returned.
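As the documented behaviour suggests, the outermost coordinates need only a single linear pass over the HSPs, with no tiling. The Python sketch below is not BioPerl's API. The `HSP` record and the `hit_span` helper are hypothetical stand-ins, and the sketch assumes 1-based inclusive coordinates with start <= end on both query and hit.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class HSP:
    """Hypothetical HSP record: 1-based, inclusive, start <= end."""
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int


def hit_span(hsps: Iterable[HSP], seq_type: str = "query") -> Tuple[int, int]:
    """Return (lowest start, largest end) over all HSPs in one pass.

    No tiling is needed: overlaps and gaps between HSPs do not affect
    the outermost coordinates.
    """
    if seq_type not in ("query", "hit"):
        raise ValueError("seq_type must be 'query' or 'hit'")
    lo: Optional[int] = None
    hi: Optional[int] = None
    for hsp in hsps:
        start = getattr(hsp, seq_type + "_start")
        end = getattr(hsp, seq_type + "_end")
        lo = start if lo is None else min(lo, start)
        hi = end if hi is None else max(hi, end)
    if lo is None or hi is None:
        raise ValueError("hit has no HSPs")
    return lo, hi


hsps = [HSP(10, 50, 200, 240), HSP(40, 90, 230, 280), HSP(150, 180, 400, 430)]
print(hit_span(hsps, "query"))  # (10, 180)
print(hit_span(hsps, "hit"))    # (200, 430)
```

The cost is O(n) in the number of HSPs and independent of how they overlap.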

It would be fine with me to change the implementation in GenericHit as you
suggest, so that it no longer tiles the HSPs. Tiling is only necessary for
data that is summed across the region covered by all HSPs, as is done by
these methods: matches(), gaps(), frac_* and percent_*.
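Summed quantities differ because overlapping HSPs would otherwise be counted twice. The Python sketch below uses plain interval merging to show the effect, with covered query length standing in for a summed statistic. It is a simplified illustration and not BioPerl's tile_hsps() algorithm, which does more bookkeeping (e.g. per-contig identity counts). The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, start <= end


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into disjoint contigs."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: List[Interval]) -> int:
    """Length actually covered, counting overlapping residues once."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


query_ranges = [(10, 50), (40, 90), (150, 180)]
naive = sum(end - start + 1 for start, end in query_ranges)
print(naive)                          # 123 (residues 40-50 counted twice)
print(merge_intervals(query_ranges))  # [(10, 90), (150, 180)]
print(covered_length(query_ranges))   # 112
```

This sort-and-sweep merge is O(n log n) in the number of HSPs. By contrast, start() and end() give the same answer whether or not overlaps are resolved.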

Steve

On 4/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Hi all,
>
> I want to double-check my thinking regarding
> Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
> claim that the HSPs of the hit object must be tiled before the answer can
> be produced, and the code is implemented that way
> (Bio::Search::SearchUtils::tile_hsps($self)).
>
> Yet as far as I can see, all you need to do is loop through all HSPs and
> pick out the smallest start and the largest end, for both the query and
> the subject.
>
> This comes up because I have a BLAST report where a single hit contains
> over 80000 HSPs, and tiling them takes over an hour (I gave up on it, so I
> don't know how long it really takes). The simple loop through the HSPs
> takes seconds or less.
>
> Now in this situation the answer isn't especially useful (to me). An
> alternative fix would be to rewrite the tiling algorithm (again) to make
> it hundreds of times faster, and then provide some way in start() and
> end() for the user to request the start and end of the best contig, or
> another contig of choice. Easier said than done, though!
>
>
> What do people think?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
