[Biopython-dev] [biopython-dev] SeqFeature comparison for equality

Peter Cock p.j.a.cock at googlemail.com
Mon Oct 17 10:10:54 UTC 2011


Hi Joshua and everyone,

It looks like Joshua's email (below) got lost in the spam filter (possibly
due to the attachment). The core of his patch was as follows (there
were also lots of white space changes).


@@ -694,6 +714,15 @@ class FeatureLocation(object):
             for i in range(self._start, self._end):
                 yield i

+    def __eq__(self, other):
+        """Compares a FeatureLocation for equality"""
+        if not isinstance(other, FeatureLocation):
+            return False
+        if self.start() == other.start() and \
+                self.end() == other.end():
+            return True
+        return False
+



@@ -255,6 +255,26 @@ class SeqFeature(object):
             qualifiers = dict(self.qualifiers.iteritems()),
             sub_features = [f._flip(length) for f in self.sub_features[::-1]])

+    def __eq__(self, other):
+        """Compare between this SeqFeature and other.
+
+        ref, ref_db and qualifiers are not needed for comparison"""
+        if not isinstance(other, SeqFeature):
+            return False
+        if (self.id != "<unknown id>"
+              and other.id != "<unknown id>" and
+              self.id == other.id):
+            return True         # Can we trust this?
+        for x in ('location', 'type', 'strand', 'location_operator'):
+            if (getattr(self, x) and getattr(other, x) and \
+                    getattr(self, x) != getattr(other, x)):
+                return False
+        for f in self.sub_features:
+            if f not in other.sub_features:
+                return False
+        else:
+            return True
+
     def extract(self, parent_sequence):
         """Extract feature sequence from the supplied parent sequence.

Note the patch will not apply to the trunk; perhaps it is against
the current release?

There are two logical steps here: first defining __eq__ for the
FeatureLocation, and second defining __eq__ for the SeqFeature.
Both hide the fact that we need to compare position objects,
e.g. is BeforePosition(5) == ExactPosition(5)? The answer
is yes, which I have now clarified in the docstrings:

https://github.com/biopython/biopython/commit/55feea75f7ab55eac4ef4e320567d746ce41120a
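
As a rough illustration of that rule, here is a minimal sketch using
two stand-in classes (hypothetical int subclasses written for this
example, not the actual Bio.SeqFeature implementations). Equality
looks only at the integer coordinate, so the fuzziness is ignored:

```python
class ExactPosition(int):
    """Hypothetical stand-in: a position known exactly."""

    def __str__(self):
        return "%d" % int(self)


class BeforePosition(int):
    """Hypothetical stand-in: a fuzzy position, written like '<5'."""

    def __str__(self):
        return "<%d" % int(self)


exact = ExactPosition(5)
before = BeforePosition(5)

# Neither class overrides __eq__, so int.__eq__ is used and only
# the coordinate is compared.
positions_equal = (before == exact)

# The string forms still differ, so the fuzziness is not lost.
same_text = (str(before) == str(exact))
```

Here positions_equal is True while same_text is False.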

Other than the fact that I think the ref and ref_db should be
checked when comparing locations, adding location comparison
seems like a good idea. Note that with the recent changes on
the trunk, the strand, ref and ref_db now belong to the
FeatureLocation not the SeqFeature.
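
To make the suggestion concrete, a minimal sketch of a location
comparison that also checks strand, ref and ref_db might look like
this. The Location class and its _key helper are hypothetical
stand-ins for this example, not the FeatureLocation code on the trunk:

```python
class Location(object):
    """Hypothetical stand-in for FeatureLocation (illustration only)."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref
        self.ref_db = ref_db

    def _key(self):
        # Every attribute that takes part in the comparison.
        return (self.start, self.end, self.strand, self.ref, self.ref_db)

    def __eq__(self, other):
        if not isinstance(other, Location):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash(self._key())


same = Location(5, 10, strand=1) == Location(5, 10, strand=1)
other_strand = Location(5, 10, strand=1) == Location(5, 10, strand=-1)
other_ref = Location(5, 10) == Location(5, 10, ref="X12345")
```

With this sketch, same is True, while other_strand and other_ref are
both False. Returning NotImplemented for foreign types (rather than
False as in the patch) is a choice made for this example; it lets
Python try the other operand's comparison before giving up.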

Extending this to cover the SeqFeature leaves the ID, type,
etc. to deal with and is fiddly, particularly the question of
how to compare the annotation. These are essentially the same
reasons why we don't support SeqRecord equality.

Joshua - would you like to update your patch against the
code in github, just for the FeatureLocation __eq__ method,
to include the strand, ref and ref_db properties?

Thanks,

Peter


---------- Forwarded message ----------
From: "Joshua Ismael Haase Hernández" <hahj87 at gmail.com>
To: biopython-dev at biopython.org
Date: Mon, 17 Oct 2011 01:06:17 -0500
Subject: [patch] SeqFeature comparison for equality
Hi there.

I was working on a test case for a custom program,
which should extract the same features I had planned.

Since SeqFeature lacks a comparison method, there is no
easy way to test

for feature in test_gene.features:
   self.assertIn(feature, myparser(file).features)

So I added comparison methods and they work fine.
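
A minimal sketch of why such a test depends on __eq__: assertIn uses
the "in" operator, which compares with == and, without __eq__, falls
back to object identity. The Feature class below is a hypothetical
stand-in for SeqFeature that holds only a type and a span; it is not
the Biopython class.

```python
import unittest


class Feature(object):
    """Hypothetical stand-in for SeqFeature (illustration only)."""

    def __init__(self, type, start, end):
        self.type = type
        self.start = start
        self.end = end

    def _key(self):
        return (self.type, self.start, self.end)

    def __eq__(self, other):
        if not isinstance(other, Feature):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        return hash(self._key())


class FeatureTest(unittest.TestCase):
    def test_expected_features_found(self):
        expected = [Feature("gene", 0, 30)]
        # Stands in for the features returned by a custom parser.
        parsed = [Feature("CDS", 3, 27), Feature("gene", 0, 30)]
        for feature in expected:
            # Passes only because Feature defines __eq__.
            self.assertIn(feature, parsed)
```

Without the __eq__ method the two "gene" objects would be distinct
instances and assertIn would fail.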

Patch attached. My changes are under Biopython license.



