[Biopython-dev] Sequence object allows non-alphabet characters

Peter Cock p.j.a.cock at googlemail.com
Tue Dec 20 15:27:00 UTC 2011


On Tue, Dec 20, 2011 at 2:48 PM, Markus Piotrowski
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> Eric Talevich <eric.talevich <at> gmail.com> writes:
>
>> As another alternative, you could add a method Seq.validate() which must be
>> called separately. Then you'd have a way to trigger validation even after
>> directly setting seq.data or .alphabet.
>>
>> -E
>>
>
> There is a function _verify_alphabet(sequence) in the package Alphabet,
> which does exactly this.

That starts with an underscore, so it is a private API.

> However, the example given in the API documentation doesn't
> work for me:
>
>>>> from Bio.Seq import Seq
>>>> from Bio.Alphabet import IUPAC
>>>> my_seq = Seq ("MKQHK", IUPAC.protein)
>>>> _verify_alphabet(my_seq)
>
> Traceback (most recent call last):
>  File "<pyshell#6>", line 1, in <module>
>    _verify_alphabet(my_seq)
> NameError: name '_verify_alphabet' is not defined

You didn't import the function, hence the NameError.

>>>> from Bio import Alphabet
>>>> Alphabet._verify_alphabet(my_seq)
> True
>
> Still, I would prefer to have the sequence checked against the
> chosen alphabet during initialization, maybe as an option:
> Seq(sequence[, alphabet, verify])

Yes, there are certainly advantages to having the alphabet
validation happen during Seq object creation.
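
As a rough illustration of the two ideas in this thread (a verify
option at construction time, plus a separate validate() method), here
is a minimal sketch. It does not use Biopython at all: CheckedSeq, its
letters and verify arguments, and PROTEIN_LETTERS are hypothetical
names made up for this example, and this is not how Seq is implemented.

```python
# Hypothetical sketch only: CheckedSeq is NOT Biopython's Seq class.

# Assumption: one-letter codes of the 20 standard amino acids.
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


class CheckedSeq:
    """Toy sequence object with optional alphabet validation."""

    def __init__(self, data, letters=None, verify=False):
        self.data = data
        self.letters = letters  # None means "no alphabet, accept anything"
        if verify:
            # Optional check at creation time, as proposed in the thread.
            self.validate()

    def validate(self):
        """Raise ValueError if data contains letters outside the alphabet."""
        if self.letters is None:
            return
        bad = sorted(set(self.data) - set(self.letters))
        if bad:
            raise ValueError(
                "letters not in alphabet: %s" % ", ".join(bad)
            )


ok = CheckedSeq("MKQHK", PROTEIN_LETTERS, verify=True)  # passes silently

lazy = CheckedSeq("MKQ1K", PROTEIN_LETTERS)  # no check at creation
try:
    lazy.validate()  # explicit check, e.g. after changing .data
except ValueError as err:
    print("validation failed:", err)
```

With verify left at False the constructor accepts anything, so existing
code would keep working, and validate() can still be called later, for
example after assigning to .data directly.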

Peter



