[Biopython-dev] Changing Seq equality

Mon Feb 22 09:48:14 EST 2010

Hi all,

I've just got back from Japan - Brad and I were fortunate to be
able to attend the DBCLS BioHackathon 2010 held in Tokyo,
http://hackathon3.dbcls.jp/

As Brad already mentioned in passing, we also managed to have
dinner one evening with Michiel, and had an informal chat about
Biopython plans. Expect a few more emails on other topics to
follow.

One of the short-term aims we agreed on was to press ahead
with the Seq equality changes outlined in this thread late
last year. Mailing list archive link:
http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007021.html

To recap, the agreed best behaviour was to make Seq equality
act like string equality, but to raise a Python warning when
incompatible alphabets are compared (e.g. DNA to Protein).
This also applies to all the other comparison operators:
not equal, less than, greater than, less than or equal, and
greater than or equal.
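
As a rough illustration of the behaviour described above, here
is a minimal sketch using a hypothetical ToySeq class (this is
not the real Bio.Seq.Seq code). The alphabet is reduced to a
plain string label, and the "generic" label and the
_check_alphabet helper are assumptions made for the example.

```python
import warnings


class ToySeq:
    """Hypothetical stand-in for Seq, for illustration only."""

    def __init__(self, data, alphabet="generic"):
        self.data = data
        self.alphabet = alphabet  # e.g. "DNA", "RNA", "protein"

    def __str__(self):
        return self.data

    def _check_alphabet(self, other):
        # Warn only if both sides name a specific, different alphabet.
        # Plain strings have no alphabet and are treated as generic.
        other_alpha = getattr(other, "alphabet", "generic")
        if "generic" in (self.alphabet, other_alpha):
            return
        if self.alphabet != other_alpha:
            # stacklevel=3 points the warning at the caller's comparison.
            warnings.warn(
                "Comparing %s to %s sequence" % (self.alphabet, other_alpha),
                UserWarning,
                stacklevel=3,
            )

    # All six comparison operators act on the string form.
    def __eq__(self, other):
        self._check_alphabet(other)
        return str(self) == str(other)

    def __ne__(self, other):
        self._check_alphabet(other)
        return str(self) != str(other)

    def __lt__(self, other):
        self._check_alphabet(other)
        return str(self) < str(other)

    def __le__(self, other):
        self._check_alphabet(other)
        return str(self) <= str(other)

    def __gt__(self, other):
        self._check_alphabet(other)
        return str(self) > str(other)

    def __ge__(self, other):
        self._check_alphabet(other)
        return str(self) >= str(other)

    def __hash__(self):
        # Keep hashing consistent with string-based equality.
        return hash(str(self))
```

With this sketch, ToySeq("ACGT", "DNA") == ToySeq("ACGT", "protein")
still returns True, as a string comparison would, but also
triggers the warning.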

This is my outline plan for the change:

For Biopython up to 1.53, the Seq class uses object equality:
seq1==seq2 acts as id(seq1)==id(seq2).

For Biopython 1.54 (and perhaps a few more releases),
the Seq classes will still use object equality but will trigger
a warning suggesting explicit use of id(seq1)==id(seq2)
or str(seq1)==str(seq2) as appropriate.
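
The 1.54 step might look something like the sketch below.
TransitionalSeq is a hypothetical stand-in, the warning text
is made up for the example, and FutureWarning is only one
possible choice of warning category.

```python
import warnings

_MESSAGE = (
    "Seq objects currently compare by object identity; this is "
    "expected to change to string comparison. Please use "
    "id(seq1)==id(seq2) or str(seq1)==str(seq2) explicitly."
)


class TransitionalSeq:
    """Hypothetical stand-in: old behaviour plus a warning."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def __eq__(self, other):
        # Same result as before (object identity), but warn the caller.
        warnings.warn(_MESSAGE, FutureWarning, stacklevel=2)
        return id(self) == id(other)

    def __ne__(self, other):
        warnings.warn(_MESSAGE, FutureWarning, stacklevel=2)
        return id(self) != id(other)

    # Defining __eq__ disables the default hash in Python 3, so
    # restore identity-based hashing to match the comparison.
    __hash__ = object.__hash__
```

Code that already uses str(seq1)==str(seq2) never reaches
__eq__ on the sequence objects, so it would see no warning.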

For Biopython 1.xx (maybe 1.55 or 1.56?) the Seq classes
will switch to using string equality (with an alphabet-aware
warning for comparing DNA to RNA etc), but will also trigger
a warning that this is a change from previous releases, and
suggest in the short term the continued explicit use of either
id(seq1)==id(seq2) for object identity or str(seq1)==str(seq2)
for string equality.

For Biopython 1.yy (maybe 1.57?) the Seq classes will
use string equality (with an alphabet-aware warning for
comparing DNA to RNA etc), without any warning about
this being a change from historic behaviour.

These warning messages could also point at a wiki page,
and we'd need a FAQ entry in the tutorial as well. The
aim of this slightly drawn out switch is to try and make
sure all users are aware of the change, even if they
only update their copy of Biopython every few releases.

Does that all sound sensible? If so, we should probably
have an announcement on the main mailing list, in case
there are any other views.

Other more complex options include a flag for switching
between the modes - but that complexity doesn't seem
such a good idea to me. All my own code and most of
the unit tests use str(seq1)==str(seq2) explicitly anyway.
The only exception is some of the genetic algorithm unit
tests which do seem to want explicit object identity.
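
For what it is worth, the two explicit forms give the same
answer whichever mode the == operator is in, which is why
code written that way should not need touching. A small
sketch, with PlainSeq as a hypothetical stand-in class that
relies on Python's default object comparison:

```python
class PlainSeq:
    """Hypothetical stand-in with default object comparison."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


seq1 = PlainSeq("ACGT")
seq2 = PlainSeq("ACGT")

# Explicit string comparison: the letters are the same.
same_letters = str(seq1) == str(seq2)

# Explicit identity comparison: these are two distinct objects.
same_object = id(seq1) == id(seq2)
```

Here same_letters is True and same_object is False, and
neither depends on how the class defines ==.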

Regards,

Peter