[Biopython-dev] [Bug 2550] New: Alphabet problems when adding sequences

bugzilla-daemon at portal.open-bio.org
Sun Jul 27 11:30:37 EDT 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2550

           Summary: Alphabet problems when adding sequences
           Product: Biopython
           Version: 1.47
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


#Create three sequences as Seq objects,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))

#Now try adding them together...
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__
    elif other.alphabet.contains(self.alphabet):
  File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 95, in contains
    return other.gap_char == self.gap_char and \
AttributeError: DNAAlphabet instance has no attribute 'gap_char'

I would expect to get:
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))

A similar example, but using proteins:
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))

#Now try adding these together...
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__
    elif other.alphabet.contains(self.alphabet):
  File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 110, in contains
    return other.stop_symbol == self.stop_symbol and \
AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol'
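
Both tracebacks above fail inside contains(), which reads an attribute
(gap_char or stop_symbol) that a plain alphabet does not have. A minimal
sketch of one possible guard follows. The classes are simplified,
hypothetical stand-ins written for illustration, not the Bio.Alphabet code,
and treating a plain alphabet as contained in its wrapped form is an
assumption based on the result expected for a+b.

```python
class Alphabet:
    """Hypothetical stand-in for a plain alphabet (no gap or stop symbol)."""

    def contains(self, other):
        # A plain alphabet only contains alphabets of its own class.
        return isinstance(other, self.__class__)


class DNAAlphabet(Alphabet):
    pass


class ProteinAlphabet(Alphabet):
    pass


class Gapped:
    """Hypothetical stand-in for an alphabet wrapped with a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def contains(self, other):
        # Guard (assumption): an ungapped alphabet is accepted whenever the
        # wrapped base alphabet contains it, instead of raising AttributeError.
        if not hasattr(other, "gap_char"):
            return self.alphabet.contains(other)
        return (other.gap_char == self.gap_char
                and self.alphabet.contains(other.alphabet))


class HasStopCodon:
    """Hypothetical stand-in for an alphabet wrapped with a stop symbol."""

    def __init__(self, alphabet, stop_symbol="*"):
        self.alphabet = alphabet
        self.stop_symbol = stop_symbol

    def contains(self, other):
        # Same guard as in Gapped, applied to stop_symbol.
        if not hasattr(other, "stop_symbol"):
            return self.alphabet.contains(other)
        return (other.stop_symbol == self.stop_symbol
                and self.alphabet.contains(other.alphabet))


gapped_dna = Gapped(DNAAlphabet(), "-")
print(gapped_dna.contains(DNAAlphabet()))               # True
print(gapped_dna.contains(Gapped(DNAAlphabet(), ".")))  # False
```

With a guard of this shape, the alphabet check returns True or False for
mixed plain and wrapped alphabets, so an addition either succeeds or reaches
the existing TypeError branch.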


Here is an example of a more reasonable failure:
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 80, in __add__
    raise TypeError, ("incompatable alphabets", str(self.alphabet),
TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')", "Gapped(IUPACUnambiguousDNA(), '.')")

I am OK with this failing with a TypeError.  However, one might argue that
falling back to a generic DNA alphabet with no declared gap character would
be desirable:
Seq("AC-TGAC.TG", DNAAlphabet())
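
The fallback described above could be sketched roughly as follows. This is
only an illustration under stated assumptions: base_alphabet and
fallback_alphabet are hypothetical helpers, and the classes are simplified
stand-ins rather than the Bio.Alphabet code.

```python
class DNAAlphabet:
    """Hypothetical stand-in for the generic DNA alphabet."""

    def __repr__(self):
        return "DNAAlphabet()"


class IUPACUnambiguousDNA(DNAAlphabet):
    """Hypothetical stand-in for a more specific DNA alphabet."""

    def __repr__(self):
        return "IUPACUnambiguousDNA()"


class ProteinAlphabet:
    """Hypothetical stand-in for the generic protein alphabet."""

    def __repr__(self):
        return "ProteinAlphabet()"


class Gapped:
    """Hypothetical stand-in for an alphabet wrapped with a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def __repr__(self):
        return "Gapped(%r, %r)" % (self.alphabet, self.gap_char)


def base_alphabet(alphabet):
    """Strip any Gapped wrappers and return the underlying alphabet."""
    while isinstance(alphabet, Gapped):
        alphabet = alphabet.alphabet
    return alphabet


def fallback_alphabet(left, right):
    """Return a generic DNA alphabet when both sides are DNA-based.

    The gap character is deliberately dropped, since the two sides may
    disagree on it. Anything that is not DNA on both sides still raises
    TypeError.
    """
    if (isinstance(base_alphabet(left), DNAAlphabet)
            and isinstance(base_alphabet(right), DNAAlphabet)):
        return DNAAlphabet()
    raise TypeError("incompatible alphabets: %r and %r" % (left, right))


c = Gapped(IUPACUnambiguousDNA(), "-")
d = Gapped(IUPACUnambiguousDNA(), ".")
print(fallback_alphabet(c, d))  # DNAAlphabet()
```

The trade-off is that the result no longer records which characters are
gaps, which is why raising TypeError may still be the safer behaviour.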

