[BioPython] How to test Sequence objects for equality?

Tal Einat taleinat at gmail.com
Sat Mar 29 10:06:59 EDT 2008


Iddo Friedberg wrote:
>
> On Sat, Mar 29, 2008 at 5:38 AM, Tal Einat <taleinat at gmail.com> wrote:
> >
> > Hello,
> >
> > I'm new to BioPython, but I've managed to stumble in my very first
> > steps. Could someone help explain this behavior?
> >
> > >>> from Bio.Seq import Seq
> > >>> from Bio.Alphabet import IUPAC
> > >>> Seq('A', IUPAC.unambiguous_dna) == Seq('A', IUPAC.unambiguous_dna)
> > False
> > >>>
> >
> > My current goal is to search for (possibly ambiguous) matching
> > sequences in an efficient manner, but I haven't found docs or a
> > tutorial which cover this.
>
> Seq types do not support a comparison function. The reason is that it is not
> very common to test two sequences for 100% identity. You can just
> extract the strings and compare.
>
>
> The more common case is sequence alignment, and Biopython does support that.
> You can use Bio.pairwise2 (documentation in the module source code, not in
> the cookbook). Or for multiple alignments you can call ClustalX externally
> (the tutorial / cookbook explains that).

Hello Iddo, thank you for the quick response!

Extracting the strings and comparing is good for exact matches, but I
also need to match sequences with ambiguities. Is there no such
function in BioPython?
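
To make concrete what is meant by matching with ambiguities, here is a
minimal sketch in plain Python. It does not use any BioPython function; the
table is the standard IUPAC nucleotide ambiguity code written out by hand,
and the name ambiguous_match is hypothetical. It assumes that two symbols
"match" when they share at least one possible base, and that sequences must
have equal length.

```python
# Standard IUPAC nucleotide ambiguity codes, mapped to the bases each
# symbol can stand for (written out by hand for this sketch).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Precompute a set per symbol so each position costs one set intersection.
BASE_SETS = {code: frozenset(bases) for code, bases in IUPAC_DNA.items()}


def ambiguous_match(seq1, seq2):
    """Return True if two equal-length strings are compatible at every position.

    Assumption for this sketch: two symbols are compatible when the sets of
    bases they can represent overlap. Unknown symbols raise KeyError.
    """
    if len(seq1) != len(seq2):
        return False
    return all(
        BASE_SETS[x] & BASE_SETS[y]
        for x, y in zip(seq1.upper(), seq2.upper())
    )
```

With this definition, ambiguous_match("ARGT", "AGGN") is true because R
covers G and N covers T, while ambiguous_match("R", "Y") is false because
purines and pyrimidines share no base.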

Unfortunately sequence alignment is not what I'm trying to do, so much
so that I can't think of a way to transform my problem into a sequence
alignment problem. I really do need to compare pairs of sequences one
by one, as efficiently as possible.

On a side note, I was surprised by having == return False for
identical sequences. To make BioPython less confusing, may I suggest
either disabling comparison of sequences or making such comparison do
the Right Thing?
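
The surprising result is what Python does for any class that defines no
comparison of its own: == falls back to object identity. The sketch below
shows this with a hypothetical stand-in class, PlainSeq, which is not the
real Seq class; it only assumes, as stated in the reply above, that no
comparison function is defined.

```python
class PlainSeq:
    """Hypothetical stand-in for a sequence class with no __eq__ defined."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


a = PlainSeq("A")
b = PlainSeq("A")

# Without __eq__, == compares object identity, so two distinct objects
# holding the same letters are not considered equal.
identity_equal = (a == b)

# Comparing the underlying strings gives the expected exact-match answer.
string_equal = (str(a) == str(b))
```

Here identity_equal is False and string_equal is True, which mirrors the
"extract the strings and compare" workaround for exact matches.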

- Tal

