[Biopython] Hiding alphabets
Michiel de Hoon
mjldehoon at yahoo.com
Sat May 26 14:50:58 UTC 2018
Dear all,
In Biopython, Seq objects show both their sequence content and the alphabet associated with them.
For example, the first example in our Biopython Tutorial & Cookbook starts as follows:
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT', Alphabet())
I don't think we need to show the alphabet here. It takes up screen space, and oftentimes it's uninformative (as in the example above); the other examples in the same section of the tutorial show SingleLetterAlphabet and IUPACAmbiguousDNA. Even in the latter case, I don't think users need to be reminded every time that they are dealing with DNA.
Perhaps more importantly, this is very confusing for new users. I would say that alphabets are of minor importance in Biopython overall. Some might say that they should be abolished altogether. But if we start off our tutorial by showing Alphabet, IUPAC.unambiguous_dna, SingleLetterAlphabet etc., then a reasonable question from students would be what they are and why we use them. I don't have a good answer to that question. In addition, the design of the Alphabet class is problematic.
Shall we change the __repr__ method of Seq objects to show the sequence only? I.e. the example above would show
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT')
Then the section on alphabets in the Tutorial can move to the end of the chapter, for people who actually want to use Alphabets.
For each sequence object, the alphabet would still be accessible as an attribute of the Seq object:
>>> my_seq.alphabet
Alphabet()
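As a rough illustration of the idea (not the actual Biopython code), the proposed behaviour could be sketched with two toy classes. The Alphabet and Seq classes below are hypothetical stand-ins written only for this example; the point is just that __repr__ omits the alphabet while the attribute stays available.

```python
class Alphabet:
    """Hypothetical stand-in for an alphabet class (illustration only)."""

    def __repr__(self):
        return f"{self.__class__.__name__}()"


class Seq:
    """Toy stand-in for a sequence class; not the real Bio.Seq.Seq."""

    def __init__(self, data, alphabet=None):
        self._data = data
        # Fall back to a generic alphabet when none is given.
        self.alphabet = alphabet if alphabet is not None else Alphabet()

    def __str__(self):
        return self._data

    def __repr__(self):
        # Show the sequence only; the alphabet is deliberately left out.
        return f"{self.__class__.__name__}({self._data!r})"


my_seq = Seq("AGTACACTGGT")
print(repr(my_seq))           # Seq('AGTACACTGGT')
print(repr(my_seq.alphabet))  # Alphabet()
```

With this kind of change, code that inspects my_seq.alphabet keeps working; only the text shown at the interactive prompt differs.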
Best,
-Michiel