[Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
bugzilla-daemon at portal.open-bio.org
Sun Jul 27 19:06:22 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 15:06 EST -------
With the patch, repeating the example in my comment 0,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+c
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
That is, all of the above additions now work.
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*'))
These work too.
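The fallback to the more generic alphabet seen in these results can be sketched roughly as below. This is only an illustration and not the code in the patch: the classes GenericAlphabet, GenericProtein, IupacProtein and GenericDNA and the function consensus_base are hypothetical stand-ins, and the sketch assumes that a more specific alphabet is modelled as a subclass of the generic one.

```python
# Hypothetical stand-in classes; they are NOT the real Bio.Alphabet classes.
class GenericAlphabet(object):
    pass

class GenericProtein(GenericAlphabet):
    pass

class IupacProtein(GenericProtein):
    pass

class GenericDNA(GenericAlphabet):
    pass


def consensus_base(alphabets):
    """Return the most generic of the given (unwrapped) alphabets.

    Falls back to GenericAlphabet() when two alphabets are unrelated.
    Returns None for an empty list.
    """
    common = None
    for alpha in alphabets:
        if common is None:
            common = alpha
        elif isinstance(alpha, type(common)):
            # alpha is the same as, or more specific than, common:
            # keep the more generic one we already have.
            pass
        elif isinstance(common, type(alpha)):
            # alpha is more generic than common: switch to it.
            common = alpha
        else:
            # Unrelated alphabets (e.g. protein and DNA).
            return GenericAlphabet()
    return common


print(type(consensus_base([GenericProtein(), IupacProtein()])).__name__)
print(type(consensus_base([IupacProtein(), GenericDNA()])).__name__)
```

Under these assumptions, mixing a generic and an IUPAC protein alphabet gives the generic protein alphabet, which mirrors the p+q result above, while unrelated alphabets drop to the fully generic one.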
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Bio/Seq.py", line 78, in __add__
    a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
  File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 199, in _consensus_alphabet
    raise ValueError("More than one gap character present")
ValueError: More than one gap character present
The error message has changed (and is more explicit), but I think this is a
real failure case.
Then based on the example in my comment 1,
>>> p = Seq("PKL-PAK", Alphabet.Gapped(Alphabet.generic_protein,"-"))
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> p+q
Seq('PKL-PAKADKS*', HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*'))
This works now too.
One final example of a valid failure:
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> r = Seq("SRFG@", Alphabet.HasStopCodon(Alphabet.generic_protein,"@"))
>>> q+r
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "Bio/Seq.py", line 78, in __add__
    a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
  File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 208, in _consensus_alphabet
    raise ValueError("More than one stop symbol present")
ValueError: More than one stop symbol present
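For anyone reviewing the behaviour without the patch to hand, the gap and stop symbol handling shown in the examples above could be sketched along the following lines. This is an assumption-laden illustration, not the patched Bio.Alphabet code: BaseAlphabet, Gapped, HasStopCodon, unwrap and consensus_wrappers are all hypothetical stand-ins defined here, and the base alphabet is passed in by the caller instead of being worked out.

```python
# Hypothetical stand-ins defined for this sketch; NOT the real Bio.Alphabet.
class BaseAlphabet(object):
    def __repr__(self):
        return "BaseAlphabet()"

class Gapped(object):
    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char
    def __repr__(self):
        return "Gapped(%r, %r)" % (self.alphabet, self.gap_char)

class HasStopCodon(object):
    def __init__(self, alphabet, stop_symbol="*"):
        self.alphabet = alphabet
        self.stop_symbol = stop_symbol
    def __repr__(self):
        return "HasStopCodon(%r, %r)" % (self.alphabet, self.stop_symbol)


def unwrap(alphabet):
    """Peel off wrappers, returning (base, gap_chars, stop_symbols)."""
    gaps, stops = set(), set()
    while isinstance(alphabet, (Gapped, HasStopCodon)):
        if isinstance(alphabet, Gapped):
            gaps.add(alphabet.gap_char)
        else:
            stops.add(alphabet.stop_symbol)
        alphabet = alphabet.alphabet
    return alphabet, gaps, stops


def consensus_wrappers(alphabets, base):
    """Re-wrap base with the single gap/stop symbol used by alphabets.

    Raises ValueError if the alphabets disagree on either symbol.
    """
    gaps, stops = set(), set()
    for alpha in alphabets:
        _, g, s = unwrap(alpha)
        gaps |= g
        stops |= s
    if len(gaps) > 1:
        raise ValueError("More than one gap character present")
    if len(stops) > 1:
        raise ValueError("More than one stop symbol present")
    result = base
    if gaps:
        result = Gapped(result, gaps.pop())
    if stops:
        result = HasStopCodon(result, stops.pop())
    return result


base = BaseAlphabet()
print(consensus_wrappers([Gapped(base, "-"), HasStopCodon(base, "*")], base))
```

In this sketch a gapped alphabet combined with a stop-codon alphabet yields a stop-codon wrapper around a gapped alphabet, matching the nesting seen in the p+q result above, and conflicting gap or stop symbols are rejected outright with a ValueError.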
I'd be grateful if anyone could test this, or comment on the code. While
adding private functions to Bio.Alphabet is a reasonable short-term solution
(and means we can change arguments and names without breaking people's
scripts!), some of this functionality might be best exposed publicly.