[Biopython-dev] Alphabet bug in Bio.Motif and Bio.motifs

Michiel de Hoon mjldehoon at yahoo.com
Wed Jun 5 02:28:28 UTC 2013


Hi Peter,

I have never quite understood why we need a separate class for each alphabet.
I would think that a single alphabet class (or maybe a DNA, an RNA, and a protein alphabet class) is sufficient, and that the specific alphabets are instances of this class.
Also, alphabets are essentially sets of letters, so an Alphabet class should inherit from set, allowing us to use its associated methods to compare alphabets to each other.
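
A minimal sketch of that idea, assuming a hypothetical Alphabet class built on frozenset (the class name, the molecule attribute, and the instances below are illustrative and are not the existing Bio.Alphabet API):

```python
class Alphabet(frozenset):
    """Hypothetical alphabet: an immutable set of letters plus a molecule type."""

    def __new__(cls, letters, molecule=None):
        # frozenset is immutable, so the letters must be supplied in __new__
        self = super().__new__(cls, letters)
        self.molecule = molecule  # e.g. "DNA", "RNA" or "protein"
        return self


# Specific alphabets are instances, not separate classes
unambiguous_dna = Alphabet("GATC", molecule="DNA")
ambiguous_dna = Alphabet("GATCRYWSMKHBVDN", molecule="DNA")
same_letters = Alphabet("ACGT", molecule="DNA")

# The inherited set methods compare alphabets by their letters
print(unambiguous_dna == same_letters)   # True
print(unambiguous_dna < ambiguous_dna)   # True, a proper subset
```

Note that the inherited equality only looks at the letters, so a real design would still have to decide whether the molecule type takes part in comparisons.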

Best,
-Michiel.




________________________________
 From: Peter Cock <p.j.a.cock at googlemail.com>
To: Bartek Wilczynski <barwil at gmail.com>; Michiel de Hoon <mjldehoon at yahoo.com> 
Cc: Biopython-Dev Mailing List <biopython-dev at biopython.org> 
Sent: Wednesday, June 5, 2013 2:29 AM
Subject: Alphabet bug in Bio.Motif and Bio.motifs
 

Hi Bartek,

I'm hoping you or Michiel can investigate this issue,
http://www.biostars.org/p/73500/

I believe Ivan has correctly diagnosed a Biopython issue in the alphabet
handling of the motif class on this BioStars question, and he's given a
workaround. The problem code looks like this:

        if self.alphabet!=IUPAC.unambiguous_dna:
            raise ValueError("Wrong alphabet! Use only with DNA motifs")

First, assuming the test is really for just IUPAC unambiguous DNA,
the error message is misleading - it sounds like using generic_dna
or IUPAC ambiguous DNA would be acceptable but it isn't.

The core problem here is that IUPAC.unambiguous_dna is just
one instance of the IUPACUnambiguousDNA() class, and other
instances should be equally acceptable but will fail the equality.
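
To illustrate, here is a minimal sketch using a stand-in class rather than the real Bio.Alphabet code. It assumes, as described above, that the alphabet class defines no equality method of its own, so Python falls back to comparing object identity:

```python
class IUPACUnambiguousDNA:
    """Stand-in for the Biopython class of the same name (illustration only)."""

    letters = "GATC"


# Plays the role of the module-level IUPAC.unambiguous_dna instance
unambiguous_dna = IUPACUnambiguousDNA()

# A second instance of the same class, e.g. one created by a user or a parser
other = IUPACUnambiguousDNA()

print(other != unambiguous_dna)                # True, so the check would raise
print(isinstance(other, IUPACUnambiguousDNA))  # True
print(other.letters == unambiguous_dna.letters)  # True
```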

I have sometimes wondered if we could and should make some of
the Alphabet objects into singletons (only one instance allowed),
which might be one way to solve this issue.
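
One possible shape for the singleton approach, again as a sketch with hypothetical stand-in classes (SingletonAlphabet is an invented name, not part of Biopython):

```python
class SingletonAlphabet:
    """Hypothetical base class that allows one instance per subclass."""

    _instances = {}

    def __new__(cls):
        # Create the instance on first use, then always hand back the same one
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]


class IUPACUnambiguousDNA(SingletonAlphabet):
    letters = "GATC"


class IUPACAmbiguousDNA(SingletonAlphabet):
    letters = "GATCRYWSMKHBVDN"


first = IUPACUnambiguousDNA()
second = IUPACUnambiguousDNA()

print(first is second)               # True, so != between them is False
print(first is IUPACAmbiguousDNA())  # False, other classes stay distinct
```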

Alternatively, perhaps all we need to do here is see if the alphabet
is DNA and which letter set it uses? Is that the key point for the matrix
calculations etc? e.g.

from Bio.Alphabet import IUPAC, DNAAlphabet, _get_base_alphabet

    # Accept any DNA alphabet instance, not one particular object
    if not isinstance(_get_base_alphabet(self.alphabet), DNAAlphabet):
        raise ValueError("This only works for DNA motifs")
    # Compare letter sets so equivalent alphabet instances also pass
    if self.alphabet.letters != IUPAC.unambiguous_dna.letters:
        raise ValueError("Expected IUPAC.unambiguous_dna or similar")

(Untested, and these suggested error messages need some work)

Regards,

Peter


