[Biopython-dev] Alphabet bug in Bio.Motif and Bio.motifs

Bartek Wilczynski barwil at gmail.com
Wed Jun 5 08:13:00 UTC 2013


I'm a bit out of the loop here, but to me it seems like a simple issue:

Why not change the problematic code:

    if self.alphabet != IUPAC.unambiguous_dna:
        raise ValueError("Wrong alphabet! Use only with DNA motifs")

into:

    if type(self.alphabet) != type(IUPAC.unambiguous_dna):
        raise ValueError("Wrong alphabet! Use only with DNA motifs")

and worry about fixing the Bio.Alphabet issues later (it does sound
reasonable to make sure that any alphabet instance is a singleton).
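
To illustrate the difference, here is a minimal sketch using stand-in
classes. FakeAlphabet and FakeUnambiguousDNA are hypothetical names, not the
real Bio.Alphabet code, and the sketch assumes the alphabet classes define
no __eq__ of their own, which is what the bug report suggests:

```python
# Stand-in classes for illustration only; NOT the real Bio.Alphabet code.
class FakeAlphabet(object):
    letters = None


class FakeUnambiguousDNA(FakeAlphabet):
    letters = "GATC"


# Plays the role of the module-level IUPAC.unambiguous_dna instance.
unambiguous_dna = FakeUnambiguousDNA()

# A second instance of the same class, created somewhere else.
other = FakeUnambiguousDNA()

# No __eq__ is defined, so != falls back to comparing object identity
# and reports the two instances as different.
instances_differ = other != unambiguous_dna

# Comparing the types accepts any instance of the same class.
types_differ = type(other) != type(unambiguous_dna)

print("instance check rejects it:", instances_differ)
print("type check rejects it:", types_differ)
```

With the instance comparison the second object is rejected, while the type
comparison lets it through.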

Best,
Bartek

On Wed, Jun 5, 2013 at 4:28 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> I have never quite understood why we need a separate class for each
> alphabet.
> I would think that a single alphabet class (or maybe a DNA, an RNA, and a
> protein alphabet class) is sufficient, and that the specific alphabets are
> instances of this class.
> Also, alphabets are essentially sets of letters, so an Alphabet class should
> inherit from set, allowing us to use its associated methods to compare
> alphabets to each other.
>
> Best,
> -Michiel.
>
>
> ________________________________
> From: Peter Cock <p.j.a.cock at googlemail.com>
> To: Bartek Wilczynski <barwil at gmail.com>; Michiel de Hoon
> <mjldehoon at yahoo.com>
> Cc: Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Sent: Wednesday, June 5, 2013 2:29 AM
> Subject: Alphabet bug in Bio.Motif and Bio.motifs
>
> Hi Bartek,
>
> I'm hoping you or Michiel can investigate this issue,
> http://www.biostars.org/p/73500/
>
> I believe Ivan has correctly diagnosed a Biopython issue in the alphabet
> handling of the motif class on this BioStars question, and he's given a
> workaround. The problem code looks like this:
>
>         if self.alphabet!=IUPAC.unambiguous_dna:
>             raise ValueError("Wrong alphabet! Use only with DNA motifs")
>
> First, assuming the test is really for just IUPAC unambiguous DNA,
> the error message is misleading - it sounds like using generic_dna
> or IUPAC ambiguous DNA would be acceptable but it isn't.
>
> The core problem here is that IUPAC.unambiguous_dna is just
> one instance of the IUPACUnambiguousDNA() class, and other
> instances should be equally acceptable but will fail the equality.
>
> I have sometimes wondered if we could and should make some of
> the Alphabet objects into singletons (only one instance allowed),
> which might be one way to solve this issue.
>
> Alternatively, perhaps all we need to do here is check whether the alphabet
> is DNA and which letter set it uses. Is that the key point for the matrix
> calculations etc.? e.g.
>
> from Bio.Alphabet import _get_base_alphabet, DNAAlphabet
>
>     if not isinstance(_get_base_alphabet(self.alphabet), DNAAlphabet):
>         raise ValueError("This only works for DNA motifs")
>     if self.alphabet.letters != IUPAC.unambiguous_dna.letters:
>         raise ValueError("Expected IUPAC.unambiguous_dna or similar")
>
> (Untested, and these suggested error messages need some work)
>
> Regards,
>
> Peter
>
>
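
For what it's worth, the set-based alphabet that Michiel describes in the
quoted thread could look roughly like the sketch below. SetAlphabet is a
hypothetical class that does not exist in Biopython, and the letter strings
are only there for illustration:

```python
class SetAlphabet(set):
    """Hypothetical alphabet that is simply a set of letters."""


dna = SetAlphabet("GATC")
dna_again = SetAlphabet("CTAG")  # same letters, separate instance
ambiguous_dna = SetAlphabet("GATCRYWSMKHBVDN")

# Equality compares the letters, so separate instances are equal.
same_letters = dna == dna_again

# Subset tests come for free from set.
is_subset = dna <= ambiguous_dna

# The ambiguous alphabet has extra letters, so it is not equal.
is_same_as_ambiguous = dna == ambiguous_dna

print(same_letters, is_subset, is_same_as_ambiguous)
```

Comparing by letters this way would also cover the letter-set check in
Peter's suggestion, without depending on which instance is passed in.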



-- 
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek


