[Biopython] Cannot make SeqFeature() comparable?

Joshua Klein mobiusklein at gmail.com
Tue Jan 31 19:51:04 UTC 2017


You can use a key function that returns a tuple instead of a scalar, and
make sure the values in the key tuple meet your needs for breaking ties.
Tuples are compared element by element: first by their first elements; if
those are equal, by their second elements, then the third, and so on.
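A minimal illustration of this ordering in plain Python (no Biopython involved; the numbers are arbitrary):

```python
# Tuples compare element by element, left to right.
assert (10, 1) < (10, 3)      # first elements tie, so the second decides
assert (9, 99) < (10, 1)      # first elements differ, so the rest is ignored
assert (10, 1) < (10, 1, 0)   # a tuple that is a prefix of another sorts first

pairs = [(10, 3), (5, 2), (10, 1)]
print(sorted(pairs))  # [(5, 2), (10, 1), (10, 3)]
```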


feature_priority_map = {
    "gene": 1,
    "regulatory": 2,
    "mRNA": 3,
    # fill in the rest here
}

def comparison_key(feature):
    # SeqFeature stores its coordinates on .location, not on the feature itself
    return (int(feature.location.start), feature_priority_map[feature.type])

This key function will return a tuple containing the start position of the
feature, and a priority number mapped from your list of feature types and
the order they should appear in, since it sounds like lexicographic sorting
is not what you want. You could add a third value to the tuple to address
your third requirement, if it is concrete enough.

The resulting tuple for a gene at position x is (x, 1). The tuple for an
mRNA at position x is (x, 3). (x, 1) < (x, 3), so (x, 1) will come before
(x, 3). This maps to the gene feature coming before the mRNA feature
despite their starting at the same position in the genome.

Just pass comparison_key as the key argument to sorted() or list.sort(),
e.g. sorted(features, key=comparison_key), to sort with these criteria.
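Here is a self-contained sketch of the same idea extended with the third criterion. It does not use Biopython: fake_feature is a hypothetical helper that builds stand-ins exposing only the attributes the key reads (location.start, type, and a qualifiers dict of lists, as on a SeqFeature). It also assumes you would rather have unlisted feature types sort after the listed ones (via dict.get with a default) than raise KeyError, and it uses the first locus_tag qualifier as the tie-breaker.

```python
from types import SimpleNamespace

def fake_feature(start, ftype, locus_tag):
    # Hypothetical stand-in exposing only what comparison_key reads
    return SimpleNamespace(
        location=SimpleNamespace(start=start),
        type=ftype,
        qualifiers={"locus_tag": [locus_tag]},
    )

feature_priority_map = {"gene": 1, "regulatory": 2, "mRNA": 3, "CDS": 4}
UNLISTED = len(feature_priority_map) + 1  # unlisted types sort after listed ones

def comparison_key(feature):
    return (
        int(feature.location.start),
        feature_priority_map.get(feature.type, UNLISTED),
        # optional third level: first locus_tag qualifier, "" if absent
        feature.qualifiers.get("locus_tag", [""])[0],
    )

features = [
    fake_feature(100, "CDS", "b0002"),
    fake_feature(100, "gene", "b0002"),
    fake_feature(5, "misc_feature", "b0001"),
    fake_feature(100, "gene", "b0001"),
    fake_feature(100, "mRNA", "b0002"),
]
features.sort(key=comparison_key)
for f in features:
    print(int(f.location.start), f.type, f.qualifiers["locus_tag"][0])
# 5 misc_feature b0001
# 100 gene b0001
# 100 gene b0002
# 100 mRNA b0002
# 100 CDS b0002
```

With real data the call would be along the lines of record.features.sort(key=comparison_key); features whose location is None would need separate handling.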

On Tue, Jan 31, 2017 at 2:43 PM, Chevreux, Bastien <bastien.chevreux at dsm.com> wrote:

> Which leaves me no option other than to explain the whole sorting logic,
> not just the subset I posted to keep my basic problem easy to solve on
> this list :-)
>
> For GenBank output, I want/need the usual “interleaved” sorting of
> features, i.e.,
>
> 1. First by start position
> 2. On equal start, by type (first “gene”, then “regulatory”, then
>    “mRNA”, “CDS”, etc.)
> 3. Maybe, on equal start and type, by feature attributes (locus_tag,
>    name etc.)
>
> (Maybe 2 and 3 need to be swapped in the sorting logic, but that question
> is for another day.)
>
> I had considered using key and attrgetter, but I think these are not
> flexible enough for the above, are they?
>
> Currently I do not see a way other than temporary monkey patching for
> this, but I would be happy to hear about one.
>
> Best,
>   Bastien
>
> --
> DSM Nutritional Products Microbia Inc | Bioinformatics
> 60 Westview Street | Lexington, MA 02421 | United States
> Phone +1 781 259 7613 | Fax +1 781 259 0615
>
> *From:* lenna.peterson at gmail.com [mailto:lenna.peterson at gmail.com] *On Behalf Of* Lenna Peterson
> *Sent:* Tuesday, January 31, 2017 2:13 PM
> *To:* Chevreux, Bastien <bastien.chevreux at dsm.com>
> *Cc:* Joshua Klein <mobiusklein at gmail.com>; Peter Cock <p.j.a.cock at googlemail.com>; biopython at biopython.org
> *Subject:* Re: [Biopython] Cannot make SeqFeature() comparable?
>
> re: Joshua's post about the key argument, here is an example of sorting
> SeqFeatures by location start without (potentially error-prone) monkey
> patching:
>
> import operator
>
> sorted_features = sorted([f1, f2], key=operator.attrgetter("location.start"))
>
> https://docs.python.org/2/library/operator.html#operator.attrgetter
>
> Cheers,
>
> Lenna
>
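On the attrgetter question: operator.attrgetter accepts several (optionally dotted) names and then returns a tuple, so on its own it can build a multi-level key. The objects below are hypothetical SimpleNamespace stand-ins, not real SeqFeature instances.

```python
import operator
from types import SimpleNamespace

def fake(start, ftype):
    # Hypothetical stand-in exposing only .location.start and .type
    return SimpleNamespace(location=SimpleNamespace(start=start), type=ftype)

f1, f2, f3 = fake(1000, "gene"), fake(10, "mRNA"), fake(10, "gene")

# Several names (dotted paths allowed) make attrgetter return a tuple
key = operator.attrgetter("location.start", "type")
print(key(f1))  # (1000, 'gene')

sorted_features = sorted([f1, f2, f3], key=key)
print([key(f) for f in sorted_features])
# [(10, 'gene'), (10, 'mRNA'), (1000, 'gene')]
```

The catch is the tie-break on type: attrgetter can only order types by their string values, so "CDS" sorts before "gene" (uppercase first) and "regulatory" sorts after "mRNA". A custom order such as gene, regulatory, mRNA, CDS still needs a mapping like feature_priority_map.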
> On Tue, Jan 31, 2017 at 9:31 AM, Chevreux, Bastien <bastien.chevreux at dsm.com> wrote:
>
> > From: Joshua Klein [mailto:mobiusklein at gmail.com]
> > […] When assigning to the class itself, not the module, the new
> > comparator function is called
>
> Yay, that worked, learning something new every day. Thanks a million.
>
>
>
> Peter: the ultimate goal of that request was to be able to call sort() on
> features with varying, and sometimes very custom, sort criteria. Nothing
> that would really fit in Biopython.
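If a pairwise comparator is more natural than a key for some of these criteria, functools.cmp_to_key turns one into a key at the call site, so each sort() can use different criteria without patching SeqFeature. It expects a three-way result (negative, zero or positive), not the boolean that cmp1 returns. The comparators, the "longer feature first" rule and the stand-in objects below are all hypothetical.

```python
from functools import cmp_to_key
from types import SimpleNamespace

def fake(start, end):
    # Hypothetical stand-in exposing only .location.start and .location.end
    return SimpleNamespace(location=SimpleNamespace(start=start, end=end))

f1, f2, f3 = fake(10, 200), fake(1000, 1200), fake(10, 50)

def by_start(a, b):
    # Three-way comparator: negative, zero or positive (like Python 2's cmp)
    return int(a.location.start) - int(b.location.start)

def by_start_then_longest(a, b):
    diff = by_start(a, b)
    if diff:
        return diff
    # Equal starts: put the longer feature first
    len_a = int(a.location.end) - int(a.location.start)
    len_b = int(b.location.end) - int(b.location.start)
    return len_b - len_a

features = [f2, f3, f1]
# Each sort call chooses its own comparator; nothing is monkey patched.
by_start_only = sorted(features, key=cmp_to_key(by_start))  # stable: f3, f1, f2
features.sort(key=cmp_to_key(by_start_then_longest))        # f1, f3, f2
```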
>
>
>
> Best,
>
>   Bastien
>
> *From:* Joshua Klein [mailto:mobiusklein at gmail.com]
> *Sent:* Tuesday, January 31, 2017 7:48 AM
> *To:* Peter Cock <p.j.a.cock at googlemail.com>
> *Cc:* Chevreux, Bastien <bastien.chevreux at dsm.com>;
> biopython at biopython.org
> *Subject:* Re: [Biopython] Cannot make SeqFeature() comparable?
>
> The reason the original code snippet doesn’t seem to be working as
> expected is that the cmp1 function is assigned to the __lt__ attribute of
> the SeqFeature module, not the SeqFeature class, which is located at
> SeqFeature.SeqFeature. When assigning to the class itself, not the
> module, the new comparator function is called.
>
> This sort of patching works differently for old-style and new-style
> classes because of how special methods are looked up: old-style classes
> (Python 2 only) look them up on the instance, while new-style classes
> (all classes in Python 3) look them up on the instance’s class.
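The difference can be reproduced in Python 3 without Biopython. This sketch uses a hypothetical throwaway module (built with types.ModuleType) holding a class of the same name, mimicking the Bio.SeqFeature module / SeqFeature.SeqFeature class layout, and tries three places to attach the comparator:

```python
import types

# Hypothetical stand-in for the Bio.SeqFeature module and its SeqFeature class
mod = types.ModuleType("SeqFeature")

class SeqFeature:
    def __init__(self, start):
        self.start = start

mod.SeqFeature = SeqFeature

def cmp1(this, other):
    return this.start < other.start

a, b = mod.SeqFeature(10), mod.SeqFeature(1000)

# 1. Patching the module only adds a module attribute; `<` never looks there.
mod.__lt__ = cmp1
try:
    a < b
    module_patch_worked = True
except TypeError:
    module_patch_worked = False

# 2. Patching the instance does not help either: `<` looks up __lt__ on type(a).
a.__lt__ = types.MethodType(cmp1, a)
try:
    a < b
    instance_patch_worked = True
except TypeError:
    instance_patch_worked = False

# 3. Patching the class works, because that is where the lookup happens.
mod.SeqFeature.__lt__ = cmp1
class_patch_worked = a < b

print(module_patch_worked, instance_patch_worked, class_patch_worked)
# False False True
```

Only the class-level assignment is consulted by `<`, which matches the observation that SeqFeature.SeqFeature.__lt__ = cmp1 works where SeqFeature.__lt__ = cmp1 does not.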
>
> On Tue, Jan 31, 2017 at 4:07 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>
> Hi Bastien,
>
> I'm not immediately sure if "monkey patching" the class
> methods at run time like that would work in principle.
> If you insert a print into it, it does not seem to be invoked.
>
> It might be worth trying a modified Biopython, or an
> explicit subclass to narrow down where this breaks.
>
> Or more simply, can you just do the start position
> comparison explicitly if that's what you want to use?
>
> f1.location.start < f2.location.start
>
> Peter
>
>
>
> On Mon, Jan 30, 2017 at 11:05 PM, Chevreux, Bastien
> <bastien.chevreux at dsm.com> wrote:
> > Hi there,
> >
> > I have a problem making the SeqFeature() class comparable by providing an
> > __lt__ method. Consider the following:
> >
> > ------------------------------------------------------------------
> > #!/usr/bin/env python3
> >
> > from Bio import SeqFeature
> >
> > def cmp1(this, other):
> >     return int(this.location.start) < int(other.location.start)
> >
> > SeqFeature.__lt__ = cmp1
> > f1 = SeqFeature.SeqFeature(SeqFeature.FeatureLocation(10, 200))
> > f2 = SeqFeature.SeqFeature(SeqFeature.FeatureLocation(1000, 1200))
> >
> > if f1<f2:
> >     print("f1<f2")
> > else:
> >     print("nope, f1>=f2")
> > ------------------------------------------------------------------
> >
> > The code above fails with this error:
> >
> >     if f1<f2:
> > TypeError: unorderable types: SeqFeature() < SeqFeature()
> >
> > What I do not understand is that this should be the canonical recipe for
> > making any class comparable via the < operator. Compare the following
> > code, which runs without problems:
> >
> > ------------------------------------------------------------------
> > #!/usr/bin/env python3
> >
> > class myclass():
> >     def __init__(self, value):
> >         self.bla = value
> >
> > def cmp2(this, other):
> >     return this.bla < other.bla
> >
> > myclass.__lt__ = cmp2
> > m1 = myclass(1)
> > m2 = myclass(2)
> >
> > if m1<m2:
> >     print("m1<m2")
> > else:
> >     print("nope, m1>=m2")
> > ------------------------------------------------------------------
> >
> >
> >
> > What am I missing?
> >
> >
> >
> > Best,
> >
> >   Bastien
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython