[BioPython] Rethinking Seq objects

Michiel Jan Laurens de Hoon mdehoon at ims.u-tokyo.ac.jp
Tue May 3 02:45:00 EDT 2005

Hi everybody,

Recently, there was a discussion on biopython-dev about changes to the Seq and 
MutableSeq classes. I'd like to ask whether any of the proposed changes would 
cause you problems.

The current proposal is:

1) Make Seq objects mutable, and get rid of MutableSeq. The Seq class and the 
MutableSeq class basically describe the same thing, except that one is read-only 
and the other one is not. If desired, we can add a readonly flag to the class to 
indicate whether an object is mutable. (Given that, e.g., Numerical Python arrays 
don't have such a flag, my feeling is that it is not really needed for Seq 
objects either.) For performance reasons, the new Seq class will be implemented 
in C.
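To make the idea concrete, here is a minimal pure-Python sketch of a single mutable class with an optional read-only flag. The class name `SketchSeq` and the `readonly` keyword are hypothetical illustrations, not existing Biopython API, and the proposal itself plans a C implementation rather than this one.

```python
class SketchSeq:
    """Hypothetical mutable sequence with an optional read-only flag."""

    def __init__(self, data, readonly=False):
        # Store residues in a list so item and slice assignment are cheap.
        self._data = list(data)
        self.readonly = readonly

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return "".join(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice is a new object with the same mutability setting.
            return SketchSeq(self._data[index], readonly=self.readonly)
        return self._data[index]

    def __setitem__(self, index, value):
        # Refuse in-place changes only when the flag is set.
        if self.readonly:
            raise TypeError("this sequence is read-only")
        if isinstance(index, slice):
            self._data[index] = list(value)
        else:
            self._data[index] = value
```

With this design, `s = SketchSeq("ATCG"); s[0] = "G"` leaves `str(s) == "GTCG"`, while the same assignment on `SketchSeq("ATCG", readonly=True)` raises `TypeError`. One class with a flag covers both of today's use cases.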

2) By default, a Seq object doesn't assume a particular alphabet, the same as 
the current behavior:
>>> from Bio.Seq import *
>>> Seq('ATCG')
Seq('ATCG', Alphabet())
However, if the user specifies the alphabet explicitly, input to the sequence 
will be checked for consistency with that alphabet. So
>>> from Bio.Seq import *
>>> from Bio.Alphabet import IUPAC
>>> my_alpha = IUPAC.unambiguous_dna
>>> s = Seq('ATCG', my_alpha)
>>> s[:3] = "XYZ"
will raise an error, since X, Y, and Z are not unambiguous DNA letters.
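The consistency check could work roughly as follows. This is a sketch only: the class `AlphabetSeq` and its `letters` parameter are hypothetical, and the string `"GATC"` stands in for the letters of `IUPAC.unambiguous_dna`. When no alphabet is given, nothing is checked, mirroring the generic default.

```python
class AlphabetSeq:
    """Hypothetical mutable sequence that validates input against an alphabet.

    letters=None means no alphabet was specified, so any input is accepted.
    """

    def __init__(self, data, letters=None):
        self.letters = None if letters is None else frozenset(letters)
        self._check(data)
        self._data = list(data)

    def _check(self, chars):
        # Only enforce consistency when an alphabet was given explicitly.
        if self.letters is None:
            return
        bad = sorted(set(chars) - self.letters)
        if bad:
            raise ValueError("letters not in alphabet: %s" % "".join(bad))

    def __setitem__(self, index, value):
        # Validate before modifying, so a failed assignment leaves data intact.
        self._check(value)
        if isinstance(index, slice):
            self._data[index] = list(value)
        else:
            self._data[index] = value

    def __str__(self):
        return "".join(self._data)
```

Here `AlphabetSeq("ATCG", letters="GATC")[:3] = "XYZ"` raises `ValueError` and the sequence stays `ATCG`, whereas the same assignment on `AlphabetSeq("ATCG")` succeeds. Checking before writing is what keeps the object in a valid state.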

3) Make Seq objects understand circular genomes. Many bacterial genomes are 
circular. For a circular sequence of length 4000000, it would be nice if we 
could take the slice [-1000:1000], or equivalently [3999000:4001000], to get 
the 2000 bases spanning the origin.
Circularity will likely be specified through an optional keyword (perhaps 
"topology") when creating the Seq object, with corresponding set_topology and 
get_topology methods.
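A rough sketch of how wrap-around slicing might behave, assuming a `topology` keyword that takes `"linear"` or `"circular"` (the exact values, and the class name `CircularSeq`, are hypothetical). For a circular sequence, every coordinate is reduced modulo the length, so a slice running past either end wraps around; a linear sequence keeps ordinary Python slicing.

```python
class CircularSeq:
    """Hypothetical sequence with a 'linear' or 'circular' topology."""

    def __init__(self, data, topology="linear"):
        self._data = str(data)
        self.set_topology(topology)

    def set_topology(self, topology):
        if topology not in ("linear", "circular"):
            raise ValueError("topology must be 'linear' or 'circular'")
        self._topology = topology

    def get_topology(self):
        return self._topology

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        n = len(self._data)
        if self._topology == "linear":
            # Ordinary Python indexing and slicing.
            return self._data[index]
        if isinstance(index, int):
            return self._data[index % n]
        if index.step not in (None, 1):
            raise NotImplementedError("only unit-step circular slices here")
        start = 0 if index.start is None else index.start
        stop = n if index.stop is None else index.stop
        # Reduce each position modulo n, so [-2:2] and [n-2:n+2]
        # both return the region spanning the origin.
        return "".join(self._data[i % n] for i in range(start, stop))
```

On the toy circular sequence `CircularSeq("ABCDEFGH", topology="circular")`, both `[-2:2]` and `[6:10]` give `"GHAB"`. After `set_topology("linear")`, the same slices give `""` and `"GH"`, as for a plain string. A C implementation would copy at most two contiguous pieces instead of walking base by base.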

4) Perhaps it would be a good idea to add transcribe and translate methods to 
the Seq class. Currently, to translate a DNA sequence, we have to do
>>> from Bio.Seq import Seq
>>> from Bio import Translate
>>> from Bio.Alphabet import IUPAC
>>> my_alpha = IUPAC.unambiguous_dna
>>> my_seq = Seq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', my_alpha)
>>> standard_translator = Translate.unambiguous_dna_by_id[1]
>>> standard_translator.translate(my_seq)
Seq('AIVMGR*KGAR', IUPACProtein())
which is too much typing for my taste.
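With methods on the class, the same job could shrink to one call. Below is a minimal sketch, assuming unambiguous DNA and the standard genetic code (NCBI table 1) only. The class `DNASeq` and the names `STANDARD_TABLE` and `translate(table=...)` are hypothetical, and the methods return plain strings where the proposed methods would presumably return Seq objects. The sequence `GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA` translates to `AIVMGR*KGAR`, the protein in the output above.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))


class DNASeq:
    """Hypothetical DNA sequence with transcribe/translate methods."""

    def __init__(self, data):
        self.data = str(data).upper()

    def transcribe(self):
        # DNA -> mRNA (coding strand): replace thymine with uracil.
        return self.data.replace("T", "U")

    def translate(self, table=STANDARD_TABLE):
        # Read codon by codon; a trailing partial codon is ignored and
        # stop codons appear as '*'. Ambiguous codons raise KeyError.
        usable = len(self.data) - len(self.data) % 3
        return "".join(table[self.data[i:i + 3]] for i in range(0, usable, 3))
```

Then `DNASeq("GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA").translate()` returns `"AIVMGR*KGAR"`. Passing the codon table as an optional argument keeps the common case short while still allowing other genetic codes.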

Questions/comments/suggestions are welcome. None of this has actually been coded 
yet, so it's all still open to discussion.


Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
