[Biopython-dev] sequence class proposal

Peter biopython at maubp.freeserve.co.uk
Thu May 22 15:47:58 UTC 2008


On Thu, May 22, 2008 at 8:30 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Dear Biopython developers,
> I've been using Python and Biopython for some time now and I would like to
> talk with you about the sequence classes in Biopython. I have had some issues
> using the SeqRecord and Alignment classes, and I have been discussing and
> implementing with two students (Victor Sanchez and Pablo Martinez) a proposal
> for a new sequence class. We would like to present this implementation as a
> contribution to the discussion about the design of the sequence classes in
> Biopython, and we're eager to receive your comments.

If I understood your terminology correctly, "qualities" is a list of
scores, one for each letter in the sequence.  I see this as a special
case of a more general situation where you have per-letter annotation
information.  Examples include the secondary structure or residue
coordinates of a protein sequence.  Secondary structures, for example,
are very often stored in files as a simple string whose length matches
the length of the sequence.  Also related are sub-features like
domains or promoter sites, which span a range of residues.
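To make the idea concrete, here is a minimal sketch of such a class.  The
name AnnotatedSeq and the add_letter_annotation method are purely
hypothetical, not existing Biopython API.  Each per-letter annotation
(a list of quality scores, or a secondary structure string) is stored
under a key and must have exactly one entry per letter:

```python
from typing import Dict, Optional, Sequence


class AnnotatedSeq:
    """Hypothetical sequence with per-letter annotations (not a Biopython class)."""

    def __init__(self, seq: str,
                 letter_annotations: Optional[Dict[str, Sequence]] = None):
        self.seq = seq
        self.letter_annotations: Dict[str, Sequence] = {}
        for key, values in (letter_annotations or {}).items():
            self.add_letter_annotation(key, values)

    def add_letter_annotation(self, key: str, values: Sequence) -> None:
        # Every per-letter annotation must have exactly one entry per residue.
        if len(values) != len(self.seq):
            raise ValueError(
                f"{key!r} has {len(values)} entries, sequence has {len(self.seq)}"
            )
        self.letter_annotations[key] = values

    def __len__(self) -> int:
        return len(self.seq)


# A read with one quality score per base.
read = AnnotatedSeq("ACGTAC", {"qualities": [40, 38, 35, 30, 22, 10]})
# A protein with secondary structure stored as a string of the same length.
prot = AnnotatedSeq("MKTAYI")
prot.add_letter_annotation("secondary_structure", "CHHHEC")
```

Checking the length up front is what keeps later operations like slicing
simple, since every annotation can then be treated exactly like the
sequence itself.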

So I would agree with you that an enhanced class would be useful,
where the per-letter annotations were respected in slicing, reversing,
etc.  Handling sub-features when slicing is less straightforward.
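A rough sketch of why the two cases differ, using plain functions so it
stands alone.  The representation is an assumption for illustration:
features are hypothetical (start, end, label) tuples with 0-based,
end-exclusive coordinates, and slicing assumes 0 <= start <= end <=
len(seq) (no negative indices or steps).  Per-letter annotations can
simply be sliced or reversed alongside the letters; features need a
policy, and the one chosen here (drop anything crossing the slice
boundary) is only one of several:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature representation: (start, end, label), 0-based, end exclusive.
Feature = Tuple[int, int, str]


def slice_record(seq: str, letter_annotations: Dict[str, Sequence],
                 features: List[Feature], start: int, end: int):
    """Slice seq[start:end], carrying per-letter annotations and features along."""
    new_seq = seq[start:end]
    # Per-letter annotations are easy: slice them exactly like the sequence.
    new_ann = {k: v[start:end] for k, v in letter_annotations.items()}
    # Features are harder: keep only those wholly inside the slice, shifted to
    # the new coordinates.  Features overlapping a boundary are dropped.
    new_feats = [(s - start, e - start, label) for s, e, label in features
                 if start <= s and e <= end]
    return new_seq, new_ann, new_feats


def reverse_record(seq: str, letter_annotations: Dict[str, Sequence],
                   features: List[Feature]):
    """Reverse the letters, reversing per-letter annotations and remapping features."""
    n = len(seq)
    new_ann = {k: v[::-1] for k, v in letter_annotations.items()}
    # A feature covering [s, e) covers [n - e, n - s) after reversal.
    new_feats = [(n - e, n - s, label) for s, e, label in features]
    return seq[::-1], new_ann, new_feats
```

Note this is a plain reverse; a reverse complement for nucleotides would
also complement the letters, and strand information on features would
need flipping too.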

The current SeqRecord and Seq classes separate the sequence annotation
from the sequence letters themselves, making this sort of integration
difficult.  Making the SeqRecord a direct subclass of the Seq class
has previously been suggested and would pave the way for this sort of
operation.

See Bug 2351 where some of these ideas have been floated...
http://bugzilla.open-bio.org/show_bug.cgi?id=2351

There are a lot of things that would need to be discussed.  For
example, how would you handle the per-sequence annotation (e.g. record
identifiers) when adding two "rich" sequences?  I've been content with
making small steps for now, with backwards compatibility always in
mind.
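To illustrate the kind of decision involved, here is one possible
policy, sketched with a hypothetical RichSeq class (not Biopython API):
keep the record id only when both operands agree, and keep only those
per-letter annotations present on both sides, since anything else would
no longer have one entry per letter.  Other policies (e.g. raising an
error on mismatched ids) are equally defensible:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RichSeq:
    """Hypothetical 'rich' sequence: letters, a record id, and per-letter annotations."""
    seq: str
    id: str = "<unknown id>"
    letter_annotations: Dict[str, list] = field(default_factory=dict)

    def __add__(self, other: "RichSeq") -> "RichSeq":
        # Per-sequence annotation: keep the id only when both operands agree,
        # otherwise fall back to a placeholder.
        new_id = self.id if self.id == other.id else "<unknown id>"
        # Per-letter annotation: only keys present on BOTH sides can be
        # concatenated and still have one entry per letter; others are dropped.
        shared = self.letter_annotations.keys() & other.letter_annotations.keys()
        new_ann = {k: self.letter_annotations[k] + other.letter_annotations[k]
                   for k in shared}
        return RichSeq(self.seq + other.seq, new_id, new_ann)
```

For example, adding two fragments of "read1", one carrying a "mask"
annotation the other lacks, would give "read1" with only the
concatenated qualities.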

On another note, I'm also thinking about the need for an annotated
sequence alignment object, where there are similar concerns.

Also, have you discussed the alphabet objects?

> Do you think it is a good idea to attach these classes to Bugzilla? Should we
> open a new bug, or is there one already open for this sequence class debate?

Your proposals do seem very broad, so have a look at Bug 2351 first,
but perhaps start a new enhancement bug and attach the code there.

Peter


