[Bioperl-l] Bio::Coordinate::Collection could DoWhatIMean better (w/ patch)

Aaron Mackey ajmackey at gmail.com
Fri Sep 5 20:54:56 UTC 2008


There are two uses for Collection:

1) all the "in" seq_id's are the same, and George's patch makes sense to me
(i.e. agrees with my intuition)

2) all the "in" seq_id's are *not* the same (i.e. the collection is just a
hash of indivual pairs), in which case my query would only match the subset
of pairs having identical seq_id's to that specified by the query ... and
then you're back to case #1

So overall, it looks like this was a bug, but I'd of course want to hear
Heikki's opinion.

Thanks for raising this,

-Aaron

On Fri, Sep 5, 2008 at 3:01 PM, George Hartzell <hartzell at alerce.com> wrote:

>
> Hi all,
>
> Bio::Coordinate::Collection surprised me a bit.  At first I thought
> there was a bug, but it's clearly doing what it's supposed to.  Now
> I'm wondering if what it's supposed to be doing makes sense in some
> context, or if what I expected would be better functionality.
>
> t/CoordinateMapper.t sets up the following scenario:
>
> #
> # Collection
> #
> #         1   5     6   10
> #         |---|     |---|
> #-----|-----------------------
> #     1   5   9     15  19
> #         pair1     pair2
>
> Then goes on to do the following query:
>
>  # match more than two
>  $pos = Bio::Location::Simple->new (-start => 5, -end => 19 );
>  ok $res = $transcribe->map($pos);
>  is $res->each_gap, 2;
>  is $res->each_match, 2;
>
> I was surprised to see that there were two gaps, one gene:10-19 and
> one from gene:5-14.  Looking at the code, what's really happening is
> that, for the exon1 mapper there's match with gene:5-9 and a gap with
> gene:10-19 and for the exon2 mapper there's a gap with gene:5-14 and a
> match with gene:15-19.  All four Result's just get tossed into the
> return value.
>
> The result my intuition wants is that there are two matches (gene:5-9
> with exon1 and gene:15-19 with exon2) and a gap (gene:10-14).
>
> Yes, I guess that I could just synthesize these myself from the result
> in my app.
>
> It still seems that the current result is a bug though, since there's
> no way of knowing when you're walking through $res->each_Location that
> the first "gap" is with respect to the exon1 mapper and that the
> second "gap" is with respect to the exon2 mapper.  The gaps are
> meaningless.
>
> I "fixed" it to work the way I think it should (two matches, one
> gap).  I actually extended the test case a bit so that there's a
> multi-base gap, a match, another multibase-gap, another match, then a
> single base gap (just to make sure I got that right...).  I had to
> touch up the test file a bit to account for my new test.
>
> The gaps that I return have a strand of 'undef', which seems to be The
> Right Thing.  There's also a bit of funny business where I hang onto
> the seq_id of the gapped sequence.  It assumes that the "in" sequence
> is the same for all of the mappers.  This seems safe since otherwise
> the entire query is kind of weird....
>
> There's a patch to todays svn head at:
>
>  http://shrimp.alerce.com/bioperl/collection-diffs.txt
>
> The patch changes Build.PL to include a dependency on Set::IntSpan,
> CoordinateMapper.t to update the tests, and
> Bio/CoordinateMapper/Collection.pm for the new code.
>
> Who's code would this break.
>
> If anyone's relying on the current behaviour re: gaps, what's the
> situation in which you find it useful?
>
> Thanks!
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list