[Biopython] Biopython Enhancement Proposal (BEP): Alphabets

Michiel de Hoon mjldehoon at yahoo.com
Sat Aug 4 02:04:28 UTC 2018


Dear all,
While sequence objects in Biopython have an associated alphabet, the purpose of alphabets in Biopython is currently not well defined. I can imagine these three interpretations of their purpose:
   - To define how the sequence data is stored internally in a Seq object (i.e. what kind of objects are in seq.data);
   - To define conceptually what the Seq object contains (e.g. this is a protein, or this is DNA, or this is DNA with or without methylation);
   - To define how a Seq object should be presented to the user (e.g. as a single-letter string, a three-letter string, or something else).
(and there may be others that I have overlooked).
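
To make the distinction between these three interpretations concrete, here is a minimal sketch in plain Python. It does not use Biopython's Seq or Alphabet classes; SketchAlphabet, SketchSeq, THREE_LETTER and the display() method are all hypothetical names invented for illustration, and the three-letter table is deliberately partial.

```python
from dataclasses import dataclass

# Hypothetical lookup used only for the "presentation" interpretation.
# Deliberately partial: just three amino acids.
THREE_LETTER = {"M": "Met", "K": "Lys", "V": "Val"}


@dataclass(frozen=True)
class SketchAlphabet:
    """Hypothetical alphabet; not part of Biopython."""
    letters: str        # interpretation 1: which symbols may be stored
    molecule_type: str  # interpretation 2: what the sequence is conceptually

    def validate(self, data: str) -> None:
        # Reject any stored symbol that the alphabet does not list.
        bad = set(data) - set(self.letters)
        if bad:
            raise ValueError(f"letters not in alphabet: {sorted(bad)}")


@dataclass(frozen=True)
class SketchSeq:
    """Hypothetical sequence; not Biopython's Seq."""
    data: str
    alphabet: SketchAlphabet

    def __post_init__(self):
        self.alphabet.validate(self.data)

    def display(self, style: str = "one-letter") -> str:
        # Interpretation 3: how the sequence is presented to the user.
        if style == "three-letter":
            if self.alphabet.molecule_type != "protein":
                raise ValueError("three-letter display needs a protein")
            return "".join(THREE_LETTER[c] for c in self.data)
        return self.data


protein = SketchAlphabet(letters="MKV", molecule_type="protein")
seq = SketchSeq("MKV", protein)
print(seq.display())                # one-letter presentation
print(seq.display("three-letter"))  # three-letter presentation
```

In this sketch a single alphabet object ends up carrying all three roles at once, which is part of what a proposal would need to untangle.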

To justify having alphabets as a part of Biopython, their purpose should be clearly defined.
Because of the complexity of alphabets and their use in Biopython, we felt that it may be a good idea to have a discussion modeled on Python's PEPs (Python Enhancement Proposals) to define the purpose of alphabets and their technical implementation in Biopython. This would mean that somebody who is in favor of having alphabets in Biopython would work out a proposal with all the details to allow developers and users to think through the implications.

Here you can find a description of PEPs and what should go in them: 
https://www.python.org/dev/peps/pep-0001/
Not all of it is applicable to Biopython, but it may serve as a general guideline.
The Alphabet BEP (Biopython Enhancement Proposal) could be hosted on the Biopython website so that everybody can follow the discussion.

Since alphabets have been under discussion for more than 10 years, we are thinking of putting a time limit on the proposal (e.g., until January 1st, 2020), meaning that if no agreement on the proposal is reached by then, alphabets would be removed from Biopython. This would give people who are in favor of alphabets time to make their case, while guaranteeing that a conclusion will be reached (either a well-defined and usable alphabet, or no alphabet) within the next ~1.5 years.


Any volunteers? Seq objects and therefore their alphabets are a key feature of Biopython, and working through a BEP can give you the opportunity to help design a major part of Biopython.


Best,
-Michiel