[Biopython] Cannot make SeqFeature() comparable?

Joshua Klein mobiusklein at gmail.com
Tue Jan 31 18:50:27 UTC 2017


This sort of “multiple ways to compare the same data type” is a big part of
why list.sort and sorted take a key argument: a callable that returns a
surrogate value to compare, e.g. a tuple of the fields to compare, or a
proxy object whose comparison methods implement the ordering of interest.
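
As a minimal sketch of the key approach: the Feature namedtuple below is a
hypothetical stand-in for a feature with a simple local location (it is not
Biopython's SeqFeature), and by_position / by_length are made-up names for
two different orderings of the same objects.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature with a simple local location.
Feature = namedtuple("Feature", ["name", "start", "end", "strand"])

features = [
    Feature("geneB", 50, 120, 1),
    Feature("geneA", 10, 80, -1),
    Feature("geneC", 10, 40, 1),
]

def by_position(f):
    # Tuples compare field by field: start first, then end, then strand.
    return (f.start, f.end, f.strand)

def by_length(f):
    # A completely different ordering of the same objects.
    return f.end - f.start

by_pos = sorted(features, key=by_position)
by_len = sorted(features, key=by_length)

print([f.name for f in by_pos])
print([f.name for f in by_len])
```

The same list can be sorted either way without the class itself having to
define a single "correct" ordering.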

Python 2 also offers an optional cmp argument, analogous to the comparison
function passed to C’s qsort: a callable that takes two objects and returns
a negative number, zero, or a positive number depending on whether the first
sorts before, equal to, or after the second. The cmp argument was removed in
Python 3, but functools.cmp_to_key can wrap a cmp-style callable in a key
function whose return values compare the same way, making it compatible
with the key argument of those sorting functions.
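
A small sketch of that conversion, using plain (start, end) tuples as
intervals. compare_intervals is a hypothetical cmp-style function written
for this example; only functools.cmp_to_key comes from the standard library.

```python
from functools import cmp_to_key

def compare_intervals(a, b):
    """cmp-style: negative if a sorts first, positive if b does, 0 if tied."""
    if a[0] != b[0]:
        # Smaller start sorts first.
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        # On equal starts, the longer interval sorts first.
        return -1 if a[1] > b[1] else 1
    return 0

intervals = [(10, 40), (5, 20), (10, 80), (5, 30)]
ordered = sorted(intervals, key=cmp_to_key(compare_intervals))
print(ordered)
```

For an ordering this simple, a key such as lambda iv: (iv[0], -iv[1]) would
do the same job; cmp_to_key is mainly useful when the comparison logic is
awkward to express as a single surrogate value.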

That said, sorting intervals is application specific, since it really
depends upon what you’re doing with said intervals. An Interval Tree or
Segment Tree data structure would be appropriate for fast interval-overlap
or point-inclusion tests. For other problems, you might need more metadata
when evaluating interval queries, requiring something more tailored. I’ve
attached an Interval Tree implementation I use that might be useful for you
too.
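
To illustrate the idea only (this is not the attached intervals.py, whose
contents are not shown here), a minimal sketch of a centered interval tree
for point-inclusion queries might look like the following. It assumes closed
(start, end) tuples; IntervalNode and containing are made-up names.

```python
class IntervalNode:
    """One node of a centered interval tree over closed (start, end) tuples."""

    def __init__(self, intervals):
        # intervals must be a non-empty list; pick a median endpoint as center.
        points = sorted(p for iv in intervals for p in iv)
        self.center = points[len(points) // 2]
        # Intervals spanning the center stay here; the rest go left or right.
        self.here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def containing(self, point):
        """Return all stored intervals that contain point."""
        hits = [iv for iv in self.here if iv[0] <= point <= iv[1]]
        if point < self.center and self.left:
            hits += self.left.containing(point)
        elif point > self.center and self.right:
            hits += self.right.containing(point)
        return hits


tree = IntervalNode([(1, 5), (3, 9), (8, 12), (15, 20)])
print(sorted(tree.containing(4)))
print(sorted(tree.containing(13)))
```

A production version would typically keep each node's intervals sorted by
start and by end so a query does not scan the whole node, and would carry a
payload (e.g. the feature itself) alongside each interval.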

On Tue, Jan 31, 2017 at 9:41 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> On Tue, Jan 31, 2017 at 2:31 PM, Chevreux, Bastien
> <bastien.chevreux at dsm.com> wrote:
> >> From: Joshua Klein [mailto:mobiusklein at gmail.com]
> >
> >> […] When assigning to the class itself, not the module, the new
> >> comparator function is called
> >
> > Yay, that worked, learning something new every day. Thanks a million.
>
> Well spotted - I missed that.
>
> > Peter: the ultimate goal of that request was to be able to call sort() on
> > features, with sometimes different and very custom sort criteria. Nothing
> > which would fit BioPython really.
>
> I think sorting of local feature locations is semi-doable, something along
> the lines of sorting by int(start), int(end), and then strand. Feature
> locations which reference another accession would be troublesome,
> which is one reason I've not pushed ahead with this idea.
>
> Peter
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: intervals.py
Type: application/octet-stream
Size: 14031 bytes
Desc: not available
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20170131/a318e92f/attachment-0001.obj>

