[Biopython-dev] Changing Seq equality

Peter biopython at maubp.freeserve.co.uk
Wed Nov 25 05:26:34 EST 2009


On Wed, Nov 25, 2009 at 8:45 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
>> On 24/11/2009 11:30, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
>> > The problem is if we'd like Seq("ACGT") to be equal to
>> > Seq("ACGT", generic_dna) then both must have the
>> > same hash. Then, if we also want Seq("ACGT") and
>> > Seq("ACGT", generic_protein) to be equal, they too must
>> > have the same hash. This means Seq("ACGT", generic_dna)
>> > and Seq("ACGT",generic_protein) would have the same
>> > hash, and therefore must evaluate as equal (!). The
>> > natural consequence of this chain of logic is we would
>> > then have Seq("ACGT") == Seq("ACGT", generic_dna)
>> > == Seq("ACGT",generic_protein) == Seq("ACGT",...).
>> > You reach the same point if we require the string
>> > "ACGT" equals Seq("ACGT", some_alphabet)
>
> Oh! I didn't know that! It's great to learn new Python things!
> I'm being naive here because I just have a shallow understanding
> of the problem, but here are my two cents.

It took me a while to understand this stuff - it's tricky
and I'm not 100% sure I have the details perfectly right.
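
As a rough illustration of the chain of logic quoted above, here is a
minimal sketch. ToySeq is a hypothetical stand-in, not the real Seq class,
and its alphabets are plain strings. It assumes Python 3 and assumes that
a missing alphabet is treated as a wildcard in comparisons. Strictly
speaking, a shared hash does not by itself force two objects to be equal;
the trouble is that equality stops being transitive, which dicts and sets
rely on.

```python
class ToySeq:
    """Hypothetical stand-in for Seq; alphabet is a plain string or None."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __eq__(self, other):
        if not isinstance(other, ToySeq):
            return NotImplemented
        if self.data != other.data:
            return False
        # Assumed rule: a missing (generic) alphabet matches anything.
        if self.alphabet is None or other.alphabet is None:
            return True
        return self.alphabet == other.alphabet

    def __hash__(self):
        # Equal objects must hash equally, so the alphabet cannot be used.
        return hash(self.data)


generic = ToySeq("ACGT")
dna = ToySeq("ACGT", "dna")
protein = ToySeq("ACGT", "protein")

print(generic == dna, generic == protein)  # True True
print(dna == protein)                      # False: equality is not transitive
print(hash(dna) == hash(protein))          # True: forced by the hash rule
```

The only way to restore transitivity under this rule is to make dna ==
protein as well, i.e. to ignore the alphabet altogether.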

> Would it be possible to generate the hashes and the __eq__ taking into
> account the base alphabet? For instance DNAAlphabet=0, RNAAlphabet=1 and
> ProteinAlphabet=2. So to check if two sequences are equal we would do
> something like:
> 'ACGT1' == 'ACGT2'

I'd wondered about that too - if we treated all DNA alphabets (generic,
IUPAC ambiguous etc) as one group, all RNA alphabets as another, and
all Protein as a third, then within those groups things are fine. But what
about all the other alphabets? In particular the generic (base) default
alphabet or the generic single letter alphabet? These are very
commonly used (e.g. parsing a FASTA file without giving a specific
alphabet), so it is only a partial solution that doesn't really work :(
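
A small sketch of the grouping idea, to show where it runs out. The
alphabet labels and the GROUP table below are hypothetical plain strings
and codes chosen for illustration (assuming Python 3), not Biopython
objects.

```python
# Hypothetical group codes following the proposal: DNA=0, RNA=1, protein=2.
GROUP = {
    "generic_dna": 0,
    "ambiguous_dna": 0,
    "generic_rna": 1,
    "generic_protein": 2,
}


def seq_key(data, alphabet):
    """Return the value used for both == and hash(), e.g. ("ACGT", 0)."""
    # Alphabets outside the three groups get None as their group code.
    return (data, GROUP.get(alphabet))


# Within a group things are fine: any two DNA alphabets compare equal.
print(seq_key("ACGT", "generic_dna") == seq_key("ACGT", "ambiguous_dna"))  # True
# Different groups stay different.
print(seq_key("ACGT", "generic_dna") == seq_key("ACGT", "generic_protein"))  # False
# The default alphabet belongs to no group, so the same letters read from
# a FASTA file without an alphabet do not match their DNA counterpart.
print(seq_key("ACGT", "generic") == seq_key("ACGT", "generic_dna"))  # False
```

Making the default alphabet match every group instead brings back the
non-transitive equality described earlier in the thread.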

Also, there is the issue of comparing a Seq object to a string. It would
be very nice to have string "ACGT" == Seq("ACGT", some_alphabet)
but that means we would also have to have hash("ACGT") ==
hash(Seq("ACGT", some_alphabet)), which as noted above would
mean Seq comparisons would have to ignore the alphabet. Which
is bad :(
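
The string case can be sketched the same way. StrLikeSeq is a
hypothetical class written for this illustration (assuming Python 3), not
the real Seq. Once it compares equal to a plain string, its hash has to
match the string's hash, and every alphabet collapses onto the same
dictionary key.

```python
class StrLikeSeq:
    """Hypothetical Seq-like class that compares equal to plain strings."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __eq__(self, other):
        if isinstance(other, str):
            return self.data == other
        if isinstance(other, StrLikeSeq):
            # Both sides equal the same string, so the alphabet
            # has to be ignored to keep equality transitive.
            return self.data == other.data
        return NotImplemented

    def __hash__(self):
        return hash(self.data)  # must match hash("ACGT")


dna = StrLikeSeq("ACGT", "dna")
protein = StrLikeSeq("ACGT", "protein")

counts = {"ACGT": 1}
counts[dna] = 2      # same hash and equal to the string key: overwrites it
counts[protein] = 3  # and again
print(len(counts), counts["ACGT"])  # 1 3
```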

Peter

