[BioPython] How to test Sequence objects for equality?

Peter biopython at maubp.freeserve.co.uk
Sat Mar 29 16:05:31 UTC 2008


On Sat, Mar 29, 2008 at 12:38 PM, Tal Einat <taleinat at gmail.com> wrote:
> Hello,
>
>  I'm new to BioPython, but I've managed to stumble in my very first
>  steps. Could someone help explain this behavior?
>
>  >>> from Bio.Seq import Seq
>  >>> from Bio.Alphabet import IUPAC
>  >>> Seq('A', IUPAC.unambiguous_dna) == Seq('A', IUPAC.unambiguous_dna)
>  False

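Your two Seq objects compare as False because the Seq class does not
define its own equality test, so == falls back to Python's default,
which only checks whether both sides are the very same object.

Defining a real equality test is a little tricky, because Biopython
would have to decide sequence equality based on a combination of the
sequence and the alphabet.  For example, which of the following would
you say are equal: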

Seq('A', IUPAC.unambiguous_dna)
Seq('A', IUPAC.ambiguous_dna)
Seq('A', IUPAC.unambiguous_rna)
Seq('A', IUPAC.ambiguous_rna)
Seq('A', IUPAC.protein)
etc
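
To make the False result in the question concrete, here is a minimal
sketch of Python's default comparison.  ToySeq is a hypothetical
stand-in written for this illustration, not the real Bio.Seq.Seq class;
the only assumption carried over is that the class defines no equality
method of its own, which is what the False result suggests.

```python
# ToySeq is a hypothetical stand-in, NOT the real Bio.Seq.Seq class.
class ToySeq(object):
    """Holds a sequence string and an alphabet label; defines no __eq__."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet


first = ToySeq('A', 'unambiguous_dna')
second = ToySeq('A', 'unambiguous_dna')

# With no __eq__ defined, == only asks "are these the same object?"
print(first == second)   # False: two separate objects, identical contents
print(first == first)    # True: an object is always identical to itself
```

Any rule that did better than this would have to pick an answer for
every pairing in the list above, which is the tricky part.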

In practice you probably won't be trying to compare DNA to RNA, or to
proteins; all you care about is the sequence string itself.  So
convert each Seq to a plain string and compare those:

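from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

# Same letters, different alphabets
alpha = Seq('ACG', IUPAC.unambiguous_dna)
beta = Seq('ACG', IUPAC.ambiguous_dna)
# Different letters, same alphabet as beta
gamma = Seq('ACN', IUPAC.ambiguous_dna)

# Compare the plain sequence strings, ignoring the alphabets
print str(alpha) == str(beta)    # True
print str(beta) == str(gamma)    # False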

NOTE - If you are using an older version of Biopython, do this instead:

print alpha.tostring() == beta.tostring()
print beta.tostring() == gamma.tostring()
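
If you do want the alphabet taken into account some of the time, one
option is to wrap the string comparison in a small helper.  The sketch
below is only an illustration: same_sequence and check_alphabet are
hypothetical names, not part of Biopython, and ToySeq is again a
stand-in with plain strings as alphabet labels so the example runs
without Biopython installed.  How real alphabet objects compare with
== is an assumption you would need to check for your version.

```python
def same_sequence(a, b, check_alphabet=False):
    """Return True if a and b hold the same sequence string.

    Hypothetical helper, not Biopython API.  With check_alphabet=True
    the two objects must also carry equal alphabet attributes.
    """
    if str(a) != str(b):
        return False
    if check_alphabet:
        return a.alphabet == b.alphabet
    return True


# ToySeq is a hypothetical stand-in, NOT the real Bio.Seq.Seq class.
class ToySeq(object):
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data


alpha = ToySeq('ACG', 'unambiguous_dna')
beta = ToySeq('ACG', 'ambiguous_dna')
gamma = ToySeq('ACN', 'ambiguous_dna')

print(same_sequence(alpha, beta))                       # True
print(same_sequence(alpha, beta, check_alphabet=True))  # False
print(same_sequence(beta, gamma))                       # False
```

Keeping the alphabet check optional leaves the decision about what
"equal" means with the caller, rather than fixing one rule for
everybody.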

Peter


