[Biopython-dev] [Bug 2550] New: Alphabet problems when adding sequences
bugzilla-daemon at portal.open-bio.org
Sun Jul 27 11:30:37 EDT 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
Summary: Alphabet problems when adding sequences
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
#Create three sequences as Seq objects,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
#Now try adding them together...
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__
    elif other.alphabet.contains(self.alphabet):
  File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 95, in contains
    return other.gap_char == self.gap_char and \
AttributeError: DNAAlphabet instance has no attribute 'gap_char'
I would expect to get:
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
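To illustrate where the AttributeError comes from, here is a minimal sketch using made-up stand-in classes (ToyDNA and ToyGapped are hypothetical names, not the real Bio.Alphabet classes). The unguarded method has the same shape as the failing line in the traceback; the guarded variant is only one possible fix, and it assumes that a gapped alphabet should be treated as containing its ungapped base.

```python
class ToyDNA:
    """Hypothetical stand-in for an ungapped alphabet (not Biopython's class)."""

    def contains(self, other):
        # An ungapped toy alphabet only contains other ungapped toy alphabets.
        return isinstance(other, ToyDNA)


class ToyGapped:
    """Hypothetical stand-in for a gapped wrapper around a base alphabet."""

    def __init__(self, base, gap_char="-"):
        self.base = base
        self.gap_char = gap_char

    def contains_unguarded(self, other):
        # Same shape as the failing line in the traceback: it assumes
        # that `other` always carries a gap_char attribute.
        return other.gap_char == self.gap_char and self.base.contains(other.base)

    def contains_guarded(self, other):
        # Look the attribute up defensively instead of assuming it exists.
        other_gap = getattr(other, "gap_char", None)
        if other_gap is None:
            # Assumption: a gapped alphabet contains its ungapped base.
            return self.base.contains(other)
        return other_gap == self.gap_char and self.base.contains(other.base)


gapped = ToyGapped(ToyDNA(), "-")
plain = ToyDNA()

try:
    gapped.contains_unguarded(plain)
    unguarded_failed = False
except AttributeError:
    # The ungapped alphabet has no gap_char, as in the report above.
    unguarded_failed = True

guarded_result = gapped.contains_guarded(plain)
```

The same pattern would apply to the stop_symbol lookup in the protein example, although whether "contains" should succeed in that case is a design decision for the real module.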
A similar example, but using proteins:
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
#Now try adding these together...
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__
    elif other.alphabet.contains(self.alphabet):
  File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 110, in contains
    return other.stop_symbol == self.stop_symbol and \
AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol'
Here is an example of a more reasonable failure:
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 80, in __add__
    raise TypeError, ("incompatable alphabets", str(self.alphabet),
TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')", "Gapped(IUPACUnambiguousDNA(), '.')")
I am OK with this failing with a TypeError. However, one might argue that
falling back to a generic DNA alphabet with no declared gap character would
be desirable:
Seq("AC-TGAC.TG", DNAAlphabet())
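The two policies discussed above (raise a TypeError on conflicting gap characters, or fall back to a generic alphabet with no gap character) can be sketched as follows. This is only an illustration under stated assumptions: ToyAlphabet, combine and the strict flag are hypothetical names, not part of Biopython, and the sketch ignores the difference between generic and IUPAC base alphabets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyAlphabet:
    """Hypothetical stand-in: a base alphabet name plus an optional gap character."""
    base: str                        # e.g. "DNA"
    gap_char: Optional[str] = None   # None means no gap character is declared


def combine(left, right, strict=True):
    """Pick an alphabet for left + right (illustrative policy only)."""
    if left.base != right.base:
        raise TypeError("incompatible alphabets: %r and %r" % (left, right))
    if left.gap_char == right.gap_char:
        return left
    # Ungapped plus gapped: keep the gapped alphabet.
    if left.gap_char is None:
        return right
    if right.gap_char is None:
        return left
    # Both are gapped, but with different gap characters.
    if strict:
        raise TypeError("incompatible alphabets: %r and %r" % (left, right))
    # Lenient policy: fall back to the base alphabet with no gap declared.
    return ToyAlphabet(left.base)


dna = ToyAlphabet("DNA")
dash = ToyAlphabet("DNA", "-")
dot = ToyAlphabet("DNA", ".")

ungapped_plus_gapped = combine(dna, dash)
lenient_fallback = combine(dash, dot, strict=False)
```

With strict=True the last case raises a TypeError, which matches the current behaviour for c+d; with strict=False it gives the generic fallback instead.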