[Biopython-dev] Changing Seq equality

Eric Talevich eric.talevich at gmail.com
Thu Nov 26 20:13:37 UTC 2009

On Thu, Nov 26, 2009 at 5:41 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Nov 26, 2009 at 7:14 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>> On Wed, Nov 25, 2009 at 6:48 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Doing anything complex with alphabets may fall into the "hard
>>> to explain" category. Using object identity or string identity is
>>> at least simple to explain.
>>> Thus far we have just two options, and neither is ideal:
>>> (a) Object identity, following id(seq1)==id(seq2) as now
>>> (b) String identity, following str(seq1)==str(seq2)
>> How about (c), string and generic alphabet identity, where
>> Seq.__hash__ uses the sequence string and some simplification of the
>> alphabet types like Jose described.
>> [...]
>> def __hash__(self):
>>    """Same string, same alphabet --> same hash."""
>>    [...]
> [...]
> This idea (c) has a major drawback for me, in that it appears you
> wouldn't support comparing Seq objects to strings. However,
> perhaps that is actually a good thing - that could raise a TypeError,
> to force the user to do str(my_seq) == "ACG" which is explicit.

I guess this is the basic question: is a Seq a string type, or a complex
class that contains a string (is-a vs. has-a)? Python will let us be
inconsistent with the type system if we want, but for a class as
fundamental as Seq, I think it should be consistent.

Biopython-dev discussed making Seq inherit from str or basestring
earlier [1], and I think it was decided that while actual inheritance
would be tricky, Seq should mimic that interface as much as possible
(using the alphabet attribute for validation and extra features,
mainly). So we'd treat Seq as a string-like type, i.e. option (b), and
let SeqRecord be the complex type that has a sequence, accession
number, location, etc., for which object identity is the only sensible
basis for equality.
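
To make the distinction concrete, here is a rough sketch of what option
(b) could look like. These are toy stand-ins written for illustration
only, not the real Bio.Seq / Bio.SeqRecord classes and not the patch
under discussion; the constructor arguments are assumptions.

```python
class Seq(object):
    """Toy string-like sequence (hypothetical, not Bio.Seq.Seq)."""

    def __init__(self, data, alphabet=None):
        self._data = data
        # Kept for validation and extra features; ignored by == and hash().
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def __eq__(self, other):
        # Option (b): string identity. Comparing against a plain str works too.
        if isinstance(other, (Seq, str)):
            return str(self) == str(other)
        return NotImplemented

    def __hash__(self):
        # Must agree with __eq__: same string --> same hash.
        return hash(str(self))


class SeqRecord(object):
    """Toy record that *has* a sequence (hypothetical, not Bio.SeqRecord)."""

    def __init__(self, seq, id=None):
        self.seq = seq
        self.id = id
    # No __eq__ or __hash__ here, so the default object identity applies.


same = Seq("ACG") == Seq("ACG", alphabet="dna")        # alphabets ignored
as_str = Seq("ACG") == "ACG"                           # string-like behaviour
records_equal = SeqRecord(Seq("ACG")) == SeqRecord(Seq("ACG"))  # identity only
```

With this, two Seq objects holding the same letters collapse to one
entry in a set or dict key, while two SeqRecords never compare equal
unless they are the same object.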

In short: +1 for your patch on GitHub; I think the rationale is solid.


[1] http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c6
