[Biopython-dev] Changing Seq equality

Jose Blanca jblanca at btc.upv.es
Wed Nov 25 11:20:53 UTC 2009


> > Would it be possible to generate the hashes and the __eq__ taking into
> > account the base alphabet? For instance DNAAlphabet=0, RNAAlphabet=1 and
> > ProteinAlphabet=2. So to check whether two sequences are equal we would
> > do something like: 'ACGT1' == 'ACGT2'
>
> I'd wondered about that too - if we treated all DNA alphabets (generic,
> IUPAC ambiguous etc) as one group, all RNA alphabets as another, and
> all Protein as a third, then within those groups things are fine. But what
> about all the other alphabets? In particular the generic (base) default
> alphabet or the generic single letter alphabet? These are very very
> commonly used (e.g. parsing a FASTA file without giving a specific
> alphabet). i.e. It is only a partial solution that doesn't really work :(
>
> Also, there is the issue of comparing a Seq object to a string. It would
> be very nice to have string "ACGT" == Seq("ACGT", some_alphabet)
> but that means we would also have to have hash("ACGT") ==
> hash(Seq("ACGT", some_alphabet)), which as noted above would
> mean Seq comparisons would have to ignore the alphabet. Which
> is bad :(

That's a tricky issue. I think the desired behaviour should be defined
first, and the implementation should follow from it. One possible solution
would be to treat the generic alphabet as different from the more specific
ones, and to treat a str as having the generic alphabet. It would be
something like:

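GenericAlphabet, DNAAlphabet, RNAAlphabet, ProteinAlphabet = 0, 1, 2, 3

def alphabet_code(seq):
    # a plain str is treated as having the generic alphabet
    if isinstance(seq, str):
        return GenericAlphabet
    # base_alphabet_code is hypothetical: maps seq.alphabet to a code above
    return base_alphabet_code(seq.alphabet)

def seqs_equal(seq1, seq2):
    return (str(seq1) + str(alphabet_code(seq1))
            == str(seq2) + str(alphabet_code(seq2)))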
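
To make the hashing side of this concrete, here is a minimal sketch of the
same rule as a toy class. ToySeq and the four group codes are made up for
illustration and are not Biopython's Seq or its alphabet classes; the sketch
also assumes Python 3 semantics, where != is derived from __eq__. The one
detail that matters is that a generic sequence uses the bare string as its
comparison key, because hash("ACGT") cannot be changed and equal objects
must have equal hashes.

```python
# Toy stand-in for a sequence class; NOT Biopython's real Seq.
GENERIC, DNA, RNA, PROTEIN = 0, 1, 2, 3  # hypothetical alphabet group codes


class ToySeq:
    def __init__(self, data, group=GENERIC):
        self.data = data
        self.group = group

    def __str__(self):
        return self.data

    def _key(self):
        # Generic sequences use the bare string as their key, so they can
        # compare and hash equal to a plain str.
        if self.group == GENERIC:
            return self.data
        # Specific groups carry the group code, so DNA != RNA != str.
        return (self.data, self.group)

    def __eq__(self, other):
        if isinstance(other, str):
            return self._key() == other
        if isinstance(other, ToySeq):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        # Built from the same key as __eq__, so equal objects hash equally.
        return hash(self._key())
```

With this, ToySeq("ACGT") and the string "ACGT" are equal and interchangeable
as dictionary keys, while ToySeq("ACGT", DNA) is equal to neither the plain
string nor a generic or RNA ToySeq with the same letters. Whether that last
consequence is acceptable is exactly the behaviour that would need to be
agreed on first.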

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


