[Biopython-dev] Alphabet bug in Bio.Motif and Bio.motifs

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 4 17:29:55 UTC 2013


Hi Bartek,

I'm hoping you or Michiel can investigate this issue,
http://www.biostars.org/p/73500/

I believe Ivan has correctly diagnosed a Biopython issue in the alphabet
handling of the motif class on this BioStars question, and he's given a
workaround. The problem code looks like this:

        if self.alphabet!=IUPAC.unambiguous_dna:
            raise ValueError("Wrong alphabet! Use only with DNA motifs")

First, assuming the test really is meant for IUPAC unambiguous DNA only,
the error message is misleading: it suggests that generic_dna or
IUPAC ambiguous DNA would be acceptable, but they are not.

The core problem here is that IUPAC.unambiguous_dna is just
one instance of the IUPACUnambiguousDNA() class. Other
instances should be equally acceptable, but they fail the equality test.
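
A minimal sketch of why the equality test fails. It assumes, as described above, that the alphabet classes do not define their own __eq__, so Python falls back to comparing object identity. The classes here are hypothetical stand-ins for the Bio.Alphabet ones, so the example runs without Biopython:

```python
# Hypothetical stand-ins for the Bio.Alphabet classes (illustrative only).
class Alphabet:
    letters = None


class DNAAlphabet(Alphabet):
    pass


class IUPACUnambiguousDNA(DNAAlphabet):
    letters = "GATC"


# Module-level instance, analogous to IUPAC.unambiguous_dna
unambiguous_dna = IUPACUnambiguousDNA()

# A second, equally valid instance created somewhere else
other = IUPACUnambiguousDNA()

# With no custom __eq__, == compares identity, so this is False and
# a "self.alphabet != IUPAC.unambiguous_dna" test would raise.
print(other == unambiguous_dna)                   # False
# Type and letter set are nevertheless identical.
print(isinstance(other, IUPACUnambiguousDNA))     # True
print(other.letters == unambiguous_dna.letters)   # True
```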

I have sometimes wondered if we could and should make some of
the Alphabet objects into singletons (only one instance allowed),
which might be one way to solve this issue.
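
A rough sketch of the singleton idea, using a hypothetical stand-in class rather than the real Bio.Alphabet code. Overriding __new__ makes every call to the constructor return the same object, so the existing identity-based comparison would succeed:

```python
class IUPACUnambiguousDNA:
    """Hypothetical singleton stand-in for the alphabet class."""

    letters = "GATC"

    def __new__(cls):
        # Look only in this exact class's namespace, so a subclass gets
        # its own single instance rather than inheriting the parent's.
        instance = cls.__dict__.get("_instance")
        if instance is None:
            instance = super().__new__(cls)
            cls._instance = instance
        return instance


unambiguous_dna = IUPACUnambiguousDNA()
other = IUPACUnambiguousDNA()

# Every construction returns the same object, so != no longer rejects it.
print(other is unambiguous_dna)  # True
print(other == unambiguous_dna)  # True
```

The trade-off is that such a class can no longer carry per-instance state, which may matter for alphabets that are built dynamically.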

Alternatively, perhaps all we need to do here is check whether the alphabet
is DNA and which letter set it uses? Is that the key point for the matrix
calculations etc.? e.g.

from Bio.Alphabet import IUPAC, _get_base_alphabet, DNAAlphabet

    # Then, inside the relevant Motif method:
    if not isinstance(_get_base_alphabet(self.alphabet), DNAAlphabet):
        raise ValueError("This only works for DNA motifs")
    if self.alphabet.letters != IUPAC.unambiguous_dna.letters:
        raise ValueError("Expected IUPAC.unambiguous_dna or similar")

(Untested, and these suggested error messages need some work)
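
To make the suggested check testable without Biopython, here is a self-contained sketch. All classes and helpers are hypothetical stand-ins: get_base_alphabet plays the role of the private _get_base_alphabet, assumed here to unwrap encoder layers (such as a gapped wrapper) via their .alphabet attribute, and Gapped is a simplified wrapper invented for illustration:

```python
class Alphabet:
    letters = None


class DNAAlphabet(Alphabet):
    pass


class IUPACUnambiguousDNA(DNAAlphabet):
    letters = "GATC"


class IUPACAmbiguousDNA(DNAAlphabet):
    letters = "GATCRYWSMKHBVDN"


class Gapped:
    """Hypothetical wrapper, loosely modelled on an alphabet encoder."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        base_letters = alphabet.letters
        self.letters = None if base_letters is None else base_letters + gap_char


def get_base_alphabet(alphabet):
    """Hypothetical helper: strip wrapper layers to reach the base alphabet."""
    while hasattr(alphabet, "alphabet"):
        alphabet = alphabet.alphabet
    return alphabet


def check_unambiguous_dna(alphabet):
    # Accept any DNA alphabet, regardless of which instance it is...
    if not isinstance(get_base_alphabet(alphabet), DNAAlphabet):
        raise ValueError("This only works for DNA motifs")
    # ...but only if it uses exactly the unambiguous letter set.
    if alphabet.letters != IUPACUnambiguousDNA.letters:
        raise ValueError("Expected IUPAC.unambiguous_dna or similar")


check_unambiguous_dna(IUPACUnambiguousDNA())  # a fresh instance passes
for bad in (IUPACAmbiguousDNA(), Alphabet()):
    try:
        check_unambiguous_dna(bad)
    except ValueError as err:
        print(err)
# Expected IUPAC.unambiguous_dna or similar
# This only works for DNA motifs
```

Note that a gapped unambiguous DNA alphabet would pass the first check but fail the second, because the gap character changes its letter set; whether that is desirable for the matrix calculations is part of the open question above.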

Regards,

Peter


