[Biopython-dev] Fwd: Re: sequence class proposal

Peter biopython at maubp.freeserve.co.uk
Thu Jun 5 09:17:00 UTC 2008


This is in reply to Jose's comment 3 on bug 2507, which was quite broad.
http://bugzilla.open-bio.org/show_bug.cgi?id=2507#c3

> I have coded a sequence class that fulfils the requirements that I
> would like to see. It's very similar to SeqRecord, but it is not compatible
> with it. It has no seq property, although that can be solved. The problem
> with SeqRecord is that it is not possible to create a class with an __init__
> compatible with Seq and SeqRecord at the same time.

Even if one day the SeqRecord is a subclass of the Seq object, there
is no requirement that it have the same __init__ arguments.  In fact,
they have to be different because for a SeqRecord you should also supply an
identifier (and potentially a name, description and other annotation).
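
As a rough illustration of that point, here is a minimal sketch using toy
stand-in classes (these are not the real Bio.Seq.Seq or
Bio.SeqRecord.SeqRecord, and the argument names are only assumptions).
The subclass simply extends the parent's __init__ with its own required
identifier:

```python
class Seq:
    """Toy stand-in for a sequence class (not the real Bio.Seq.Seq)."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)


class SeqRecord(Seq):
    """Hypothetical record that subclasses Seq but needs extra arguments."""

    def __init__(self, data, id, name="", description="", alphabet=None):
        # Let the parent class handle the sequence data and alphabet.
        super().__init__(data, alphabet)
        # The record-specific arguments have no counterpart in Seq.__init__.
        self.id = id
        self.name = name
        self.description = description


record = SeqRecord("ACGT", id="example_1", description="toy record")
print(isinstance(record, Seq), len(record), record.id)
```

Code that only relies on the Seq behaviour (str, len and so on) keeps
working with the subclass, even though the two constructors differ.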

> This proposed class is just a draft, it needs more work but I would like to
> receive comments about it.  It inherits from MutableSeq so it should be
> named MutableRichSeq, but it seems that I'm too lazy to type such a long name,
> I promise to change the name in a later version and to create a RichSeq
> with Seq as parent.

I agree with you here that when getting a single letter (amino acid or
nucleotide) from a sequence with per-letter-annotation, e.g.
my_sequence[5], it would be very nice to have the
per-letter-annotation (such as the quality) included.  This does mean the
object returned can't just be a single-character string.  However,
because the current Seq and MutableSeq classes return a simple string,
returning anything other than a subclass of a string risks breaking other
people's code.  So, I would conclude that Seq needs to subclass a
string BEFORE we start including support for per-letter-annotation.
Ideally we would have alphabet-aware versions of all the string
functions before we made this change (see Bug 2351).
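
To make the string subclass idea concrete, here is a small sketch.  The
class names AnnotatedLetter and AnnotatedSeq are hypothetical and are not
part of Biopython or of Jose's attachment; the point is only that a
subclass of str can carry annotation while still behaving like the plain
one-letter string existing code expects:

```python
class AnnotatedLetter(str):
    """Hypothetical single letter that also carries per-letter annotation."""

    def __new__(cls, letter, **annotations):
        if len(letter) != 1:
            raise ValueError("expected a single letter")
        # str is immutable, so the value must be set in __new__.
        obj = super().__new__(cls, letter)
        obj.annotations = dict(annotations)
        return obj


class AnnotatedSeq:
    """Hypothetical sequence with one quality value per letter."""

    def __init__(self, data, quality):
        if len(data) != len(quality):
            raise ValueError("need one quality value per letter")
        self._data = data
        self._quality = list(quality)

    def __getitem__(self, index):
        # Only integer indices are handled in this sketch (no slicing).
        return AnnotatedLetter(self._data[index], quality=self._quality[index])


letter = AnnotatedSeq("ACGT", [30, 31, 12, 40])[2]
print(letter == "G", letter.annotations["quality"])
```

Old code comparing my_sequence[2] with "G" still works, and new code can
look at the annotations.  Note that ordinary string methods such as
lower() return a plain str here, so the annotation is lost at that point.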

> Besides RichSeq there are two other classes in the attachment, RichFeature
> and BioRange, but I will comment on those in another post.

Your BioRange and BioFeature classes seem somewhat similar to the
current SeqFeature class with its locations (and sub features).

> I think that it is quite important to convert Seq and MutableSeq to new-style classes,
> what do you think about that? With the new classes we can use properties.

I have been thinking about deprecating the Seq.data property (and also
MutableSeq.data).  The data string (or array) should really be a
private implementation detail, perhaps Seq._data, following the
leading-underscore convention for private attributes.  We can then add
a property to keep Seq.data available (perhaps with a deprecation warning).
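
Something along these lines is what I have in mind.  This is only a
sketch on a toy class, not the real Bio.Seq.Seq, and the wording of the
warning is made up.  It is written for Python 3, where every class is
new-style; on Python 2 the class would have to inherit from object for
the property to work:

```python
import warnings


class Seq:
    """Toy stand-in for Bio.Seq.Seq with a private data attribute."""

    def __init__(self, data):
        self._data = data  # private implementation detail

    def __str__(self):
        return self._data

    @property
    def data(self):
        """Deprecated read access to the underlying string."""
        warnings.warn(
            "Seq.data is deprecated; use str(my_seq) instead.",
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this one
        )
        return self._data


my_seq = Seq("ACGT")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = my_seq.data  # still works, but triggers the warning
print(value, caught[0].category.__name__)
```

Existing code reading my_seq.data keeps working during the transition,
while str(my_seq) gives the same string without any warning.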

Peter


