[Biopython-dev] sequence class proposal

Mon Jun 2 14:14:40 UTC 2008

In reply to Jose, I (Peter) wrote:
>> One of your points seemed to be that the SeqRecord couldn't have a
>> __getitem__ and methods like reverse, complement, etc.  I don't see
>> why it couldn't have these.  Perhaps rather than introducing a whole
>> new class, enhancing the SeqRecord would be a better avenue.

I've filed Bug 2507 to try to show what I had in mind for the
__getitem__ method.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

Adding further methods for (reverse) complement etc. could be done in
much the same way.
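
As a rough illustration of the idea (not the patch attached to the bug),
here is a minimal sketch using toy stand-in classes rather than the real
Bio.Seq and Bio.SeqRecord code. The class bodies and the choice to copy
id/name/description onto the sliced record are assumptions made for the
example.

```python
class Seq:
    """Toy stand-in for a sequence object: wraps a string of letters."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return self.data

    def __getitem__(self, index):
        result = self.data[index]
        # A slice yields a new Seq; an integer index yields a single letter.
        return Seq(result) if isinstance(index, slice) else result


class SeqRecord:
    """Toy stand-in for a record: HAS a Seq plus some annotation."""

    def __init__(self, seq, id="<unknown id>", name="<unknown name>",
                 description=""):
        self.seq = seq
        self.id = id
        self.name = name
        self.description = description

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the sequence and carry the annotation over (an assumption;
            # a real implementation would have to decide what to keep).
            return SeqRecord(self.seq[index], id=self.id, name=self.name,
                             description=self.description)
        # Integer index: delegate to the sequence and return one letter.
        return self.seq[index]


record = SeqRecord(Seq("ACGTTGCA"), id="demo1", description="toy record")
sub = record[2:5]
```

Here sub is a new record whose sequence is the sliced region, while
record[0] simply returns the first letter.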

Returning to extending Biopython to support per-letter-annotation, I
can see two options:

Right now, the SeqRecord object HAS a Seq object.  If we create a new
RichSeq which subclasses the Seq object to provide
per-letter-annotation, then you could use a SeqRecord where the .seq
property is in fact a RichSeq object.  The SeqRecord class doesn't
need to have any changes made for this to work (assuming the RichSeq
provides the same API as the Seq object).
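
To make this first option concrete, here is a minimal sketch with toy
classes. RichSeq does not exist yet, so its constructor and the
letter_annotation attribute name are purely hypothetical; the point is
only that the record class needs no changes.

```python
class Seq:
    """Toy stand-in for a sequence object."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return self.data

    def __getitem__(self, index):
        result = self.data[index]
        return Seq(result) if isinstance(index, slice) else result


class RichSeq(Seq):
    """Hypothetical Seq subclass carrying one annotation value per letter."""

    def __init__(self, data, letter_annotation):
        if len(letter_annotation) != len(data):
            raise ValueError("need exactly one annotation value per letter")
        super().__init__(data)
        self.letter_annotation = list(letter_annotation)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the letters and their annotation together.
            return RichSeq(self.data[index], self.letter_annotation[index])
        return self.data[index]


class SeqRecord:
    """Unchanged record class: it just HAS something Seq-like in .seq."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id


record = SeqRecord(RichSeq("ACGT", [30, 31, 12, 40]), id="read1")
trimmed = record.seq[1:3]
```

Slicing record.seq keeps the letters and their annotation in step, and
any code that only expects the plain Seq API still works on the record.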

Alternatively, if we make the SeqRecord a subclass of the Seq object, then I would
suggest either RichSeq subclassing SeqRecord subclassing Seq, or
perhaps SeqRecord subclassing RichSeq subclassing Seq.  It depends on
if you think the id/name/description/dbxrefs/etc properties would be
useful in common use cases of the RichSeq object.

It's not going to be possible for all three classes to have the same
__init__ parameters without breaking existing scripts (and only
supporting the lowest common denominator).
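
A small sketch of the problem, again with toy classes: the signatures
below are simplified, and the way the subclass unpacks its first argument
is an assumption made for illustration. If SeqRecord became a Seq
subclass but kept its current call style, the second positional argument
would mean different things in the two classes.

```python
class Seq:
    """Toy Seq: the second positional argument is the alphabet."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data


class SeqRecord(Seq):
    """Toy SeqRecord as a Seq subclass that keeps the existing call style.

    The first argument is still a Seq object and the second is the id,
    so existing scripts keep working, but the signature no longer
    matches the parent class.
    """

    def __init__(self, seq, id="<unknown id>", name="<unknown name>",
                 description=""):
        # Unpack the Seq we were given into the parent constructor.
        super().__init__(str(seq), getattr(seq, "alphabet", None))
        self.id = id
        self.name = name
        self.description = description


plain = Seq("ACGT", "dna")        # second argument is the alphabet
record = SeqRecord(plain, "demo1")  # second argument is the id
```

Making both constructors take the same parameters would mean changing
what one of these existing calls does.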

Peter