[Biopython] Fw: Hiding alphabets

Peter Cock p.j.a.cock at googlemail.com
Tue Jun 5 11:19:10 UTC 2018


Dear Biopythoneers,

I know that Michiel has long expressed a wish to get rid of
the current alphabet system in Biopython, and I agree that
the historic design is overly complicated and often gets in
the way - but we don't have any concrete proposals to replace
it. Part of the problem here is coming up with a replacement
with the least painful transition - and, being practical, the fewer
people use the alphabets, the less trouble any changes would
cause.

The proposal here would de-emphasise the use of alphabets,
reflecting the fact that for the vast majority of scripts and
code you can just ignore them.

There are still corner cases - for example, for some of the
SeqIO output filetypes we currently need to use the Seq's
alphabet to label the sequence type (RNA, DNA, protein).
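
To make that corner case concrete, here is a rough sketch of the kind
of check a writer has to do. The classes below are simplified
stand-ins loosely modelled on Bio.Alphabet, and molecule_type_label
is a hypothetical helper, not the actual SeqIO code:

```python
# Simplified stand-ins for an alphabet class hierarchy (illustration only).
class Alphabet:
    """Generic alphabet: says nothing about the molecule type."""


class SingleLetterAlphabet(Alphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


def molecule_type_label(alphabet):
    """Hypothetical helper: pick the label a writer would put in its output."""
    if isinstance(alphabet, DNAAlphabet):
        return "DNA"
    if isinstance(alphabet, RNAAlphabet):
        return "RNA"
    if isinstance(alphabet, ProteinAlphabet):
        return "protein"
    # A generic Alphabet() carries no type information, so the writer
    # cannot choose a label and has to give up (or guess).
    raise ValueError("Need a DNA, RNA or protein alphabet")


print(molecule_type_label(DNAAlphabet()))  # DNA
```

Because the check uses isinstance, any more specific subclass (say, an
ambiguous DNA alphabet) would still be labelled "DNA", while a bare
Alphabet() as in the tutorial example gives the writer nothing to go on.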

Still, overall I can see it being quite practical to downplay
the alphabet objects in our user-facing documentation,
and hiding them in the Seq objects' __repr__ helps there.
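
As a minimal sketch of what hiding the alphabet in __repr__ means,
using a hypothetical toy Seq class rather than the real Bio.Seq.Seq
implementation:

```python
class Alphabet:
    """Toy stand-in for a generic alphabet object."""

    def __repr__(self):
        return "Alphabet()"


class Seq:
    """Hypothetical minimal sequence class, not Bio.Seq.Seq itself."""

    def __init__(self, data, alphabet=None):
        self._data = str(data)
        # The alphabet is still stored and reachable as an attribute...
        self.alphabet = alphabet if alphabet is not None else Alphabet()

    def __str__(self):
        return self._data

    def __repr__(self):
        # ...but is left out of the repr, so the interactive
        # display shows only the sequence content.
        return f"{self.__class__.__name__}({self._data!r})"


my_seq = Seq("AGTACACTGGT")
print(repr(my_seq))           # Seq('AGTACACTGGT')
print(repr(my_seq.alphabet))  # Alphabet()
```

Using self.__class__.__name__ rather than a hard-coded "Seq" means a
subclass would show its own name in the repr without overriding it.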

Is this a case where, in the words of the Zen of Python, practicality
wins out over being explicit about what a sequence object
contains?

"Explicit is better than implicit.
...
Although practicality beats purity."

https://www.python.org/dev/peps/pep-0020/

Thoughts and comments welcome here or on the issue,
https://github.com/biopython/biopython/issues/1674

Peter

On Sun, Jun 3, 2018 at 1:02 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> I have opened an issue here:
> https://github.com/biopython/biopython/issues/1674
> in case anybody has any comments or suggestions.
>
> Best,
> -Michiel
>
>
>
> On Saturday, May 26, 2018 11:50 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
>
>
> Dear all,
>
> In Biopython, Seq objects show both their sequence content and the alphabet
> associated with them.
> For example, the first example in our Biopython Tutorial & Cookbook starts
> as follows:
>
>>>> from Bio.Seq import Seq
>>>> my_seq = Seq("AGTACACTGGT")
>>>> my_seq
> Seq('AGTACACTGGT', Alphabet())
>
> I don't think we need to show the alphabet here. It takes up screen space,
> and oftentimes it's uninformative (as in the example above). The other
> examples in the same section of the tutorial show SingleLetterAlphabet and
> IUPACAmbiguousDNA, but even in the latter case, I don't think users need to
> be reminded every time that they are dealing with DNA.
>
> Perhaps more importantly, this is very confusing for new users. I would say
> that alphabets are of minor importance in Biopython overall. Some might say
> that they should be abolished altogether. But if we start off our tutorial
> by showing Alphabet, IUPAC.unambiguous_dna, SingleLetterAlphabet etc., then
> a reasonable question from students would be what they are and why we use
> them. I don't have a good answer to that question.
> In addition, the design of the Alphabet class is problematic.
>
> Shall we change the __repr__ function of Seq objects to show the sequence
> only? I.e. the example above would show
>
>>>> from Bio.Seq import Seq
>>>> my_seq = Seq("AGTACACTGGT")
>>>> my_seq
> Seq('AGTACACTGGT')
>
> Then the section on alphabets in the Tutorial can move to the end of the
> chapter, for people who actually want to use Alphabets.
>
> For each sequence object, the alphabet would still be accessible as the
> attribute to the Seq object:
>
>>>> my_seq.alphabet
> Alphabet()
>
>
> Best,
> -Michiel
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython

