[Biopython-dev] [Biopython] Subclassing Seq and SeqRecord
Peter
biopython at maubp.freeserve.co.uk
Tue Nov 24 15:52:25 UTC 2009
On Tue, Nov 24, 2009 at 2:58 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> What I mean is this:
> http://github.com/JoseBlanca/biopython/commit/d4c87365f614de2d69d800dc63d0cc25087d96dc
>
> I would like to replace the Seq() and SeqRecord() calls with self.__class__.
> There are already places in Seq where self.__class__ is used,
> but not everywhere. I think that this would be a
> reasonable change.
I hadn't realised the add methods also used __class__; I thought it
was just __repr__. Good point.
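To make the difference concrete, here is a minimal sketch using toy classes. It is not Biopython's actual Seq code, and the subclass `MySeq` and its `gc_count` method are hypothetical. A method that hard-codes `Seq(...)` always returns the base class, whereas one that builds its result with `self.__class__(...)` returns whatever subclass it was called on:

```python
class Seq:
    """Toy stand-in for Bio.Seq.Seq that just wraps a string."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def upper(self):
        # Hard-coded class: even a subclass gets a plain Seq back.
        return Seq(self._data.upper())

    def __add__(self, other):
        # self.__class__: a subclass gets an instance of its own class back.
        return self.__class__(self._data + str(other))


class MySeq(Seq):
    """Hypothetical subclass that only adds one general method."""

    def gc_count(self):
        return sum(self._data.upper().count(base) for base in "GC")


s = MySeq("acgt")
print(type(s.upper()).__name__)  # Seq
print(type(s + "gg").__name__)   # MySeq
print(s.gc_count())              # 2
```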
Thinking about use-cases, sometimes a subclass will want the
methods to return Seq objects, sometimes the same class.
The UnknownSeq, too, can sometimes return another UnknownSeq,
but often must return a Seq object.
The BioSQL DBSeq, on the other hand, always returns a Seq
object from all its methods. The fact that the Seq __add__ and
__radd__ methods use __class__ caused a bug where adding
DBSeq objects didn't work.
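The DBSeq problem can be reproduced with toy classes. In this sketch `LazySeq` is a hypothetical stand-in for a database-backed subclass; it is not the real BioSQL DBSeq, whose constructor differs. Because its constructor takes a database handle and coordinates rather than sequence text, an inherited `__add__` that calls `self.__class__(text)` raises a `TypeError`. The equally hypothetical `SafeLazySeq` shows the cautious fix of overriding `__add__` to return a plain `Seq`:

```python
class Seq:
    """Toy stand-in for Bio.Seq.Seq."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def __add__(self, other):
        # Assumes every subclass can be rebuilt from sequence text alone.
        return self.__class__(self._data + str(other))


class LazySeq(Seq):
    """Hypothetical database-backed subclass with a different constructor."""

    def __init__(self, db, start, end):
        self._db, self._start, self._end = db, start, end
        super().__init__(db[start:end])


class SafeLazySeq(LazySeq):
    def __add__(self, other):
        # Cautious approach: always hand back a plain Seq.
        return Seq(str(self) + str(other))


db = "ACGTACGTAC"  # stand-in for sequence text held in a database

try:
    LazySeq(db, 0, 4) + "GG"
    failed = False
except TypeError as err:
    # LazySeq("ACGTGG") is called with start and end missing.
    failed = True
    print("adding failed:", err)

joined = SafeLazySeq(db, 0, 4) + "GG"
print(type(joined).__name__, joined)  # Seq ACGTGG
```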
A hypothetical CircularSeq could return CircularSeq objects for
some cases (e.g. upper, lower, and perhaps transcribe,
back_transcribe and reverse_complement), Seq objects in other
cases (e.g. slicing) while in other cases it may depend on the data
(e.g. translation).
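The CircularSeq idea can be sketched in the same toy style. Everything here is an assumption for illustration: a DNA-only toy `Seq`, and a hypothetical `CircularSeq` that does not exist in Biopython. Case changes and the reverse complement keep the circular type, slicing returns a linear `Seq`, and translation is left out because its appropriate return type may depend on the data:

```python
class Seq:
    """Toy DNA-only stand-in for Bio.Seq.Seq."""

    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._data[index]  # a single letter, as a string
        return Seq(self._data[index])

    def upper(self):
        return Seq(self._data.upper())

    def reverse_complement(self):
        return Seq(self._data.translate(self._COMPLEMENT)[::-1])


class CircularSeq(Seq):
    """Hypothetical circular sequence, e.g. a plasmid."""

    def upper(self):
        # Changing case does not change the topology.
        return CircularSeq(self._data.upper())

    def reverse_complement(self):
        # The other strand of a circle is also a circle.
        return CircularSeq(str(super().reverse_complement()))

    def __getitem__(self, index):
        # A slice is a linear fragment, so return a plain Seq. As an extra
        # assumption, a slice with 0 <= stop < start wraps past the origin.
        if (isinstance(index, slice) and index.step is None
                and index.start is not None and index.stop is not None
                and 0 <= index.stop < index.start):
            return Seq(self._data[index.start:] + self._data[:index.stop])
        return super().__getitem__(index)


plasmid = CircularSeq("aattgc")
print(type(plasmid.upper()).__name__, plasmid.upper())  # CircularSeq AATTGC
print(type(plasmid.reverse_complement()).__name__)      # CircularSeq
print(type(plasmid[4:2]).__name__, plasmid[4:2])        # Seq gcaa
```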
Essentially, for a Seq subclass you may need to look at each
method in turn and decide which is most appropriate. So which
is the most sensible default behaviour for the Seq object?
The cautious "return a Seq" approach will be robust (and makes
sense for the existing Biopython subclasses, DBSeq and the
UnknownSeq), but makes changing this in the subclass harder
(as Jose has found).
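One way to keep the cautious default while making it easier to change is to route every method through a single overridable factory method. This is sketched here as an assumption, not as anything in Biopython, and the hook name `_new` is hypothetical. The base class keeps the robust "return a Seq" behaviour, so a DBSeq-like subclass with an unusual constructor still works, while a subclass that wants its own class back overrides one method instead of every one:

```python
class Seq:
    """Toy stand-in for Bio.Seq.Seq with one factory hook."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def _new(self, data):
        # Hypothetical hook; the cautious default returns a plain Seq.
        return Seq(data)

    def upper(self):
        return self._new(self._data.upper())

    def lower(self):
        return self._new(self._data.lower())

    def __add__(self, other):
        return self._new(self._data + str(other))


class DBLikeSeq(Seq):
    """Unusual constructor; relies on the default, so it is never rebuilt."""

    def __init__(self, db, start, end):
        super().__init__(db[start:end])


class MySeq(Seq):
    """Opts in to getting its own class back from every method."""

    def _new(self, data):
        return self.__class__(data)


print(type(DBLikeSeq("ACGTAC", 0, 4) + "GG").__name__)  # Seq
print(type(MySeq("acgt").upper()).__name__)            # MySeq
```

The trade-off is one extra indirection per method; in exchange, a subclass with a conventional constructor opts in with a two-line override.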
What does your Seq subclass aim to do? Add one or two general
methods to enhance the Seq object - or model something a little
different?
If anyone else has written Seq (or SeqRecord) subclasses, it
would be very helpful to hear about them. After all, the change
Jose is proposing may break your code ;)
>> > def __add__(self, seq2):
>> >     '''It returns a new object with both seq and qual joined '''
>> >     #per letter annotations
>> >     new_seq = self.__class__(name = self.name + '+' + seq2.name,
>> >                              id = self.id + '+' + seq2.id,
>> >                              seq = self.seq + seq2.seq)
>> >     #the letter annotations, including quality
>> >     for name, annot in self.letter_annotations.items():
>> >         if name in seq2.letter_annotations:
>> >             new_seq.letter_annotations[name] = annot + \
>> >                                      seq2.letter_annotations[name]
>> >     return new_seq
>>
>> This bit is much less clear to me - you completely ignore any
>> features. Was it written before I added the __add__ method
>> to the original SeqRecord (expected to be in Biopython 1.53)?
>
> Yes, it was added much earlier. I'll remove it as soon as the
> Biopython SeqRecord has one.
OK - your code makes more sense now. The Biopython trunk
does now have an __add__ method (which I expect to be in
Biopython 1.53).
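Peter's objection that the quoted __add__ ignores features can be made concrete. When two records are joined, the first record's features keep their coordinates, but the second record's features must be shifted right by the length of the first sequence. The sketch below uses a hypothetical toy `Record` class (plain-string sequences, features as `(start, end, type)` tuples rather than Biopython's SeqFeature objects) and is not the SeqRecord __add__ in the Biopython trunk. It keeps the quoted code's rule that only letter annotations present in both records survive:

```python
class Record:
    """Toy stand-in for SeqRecord: plain-string seq, features as
    (start, end, type) tuples, letter annotations as per-letter lists."""

    def __init__(self, seq, id="<unknown id>", name="<unknown name>",
                 features=None, letter_annotations=None):
        self.seq = seq
        self.id = id
        self.name = name
        self.features = list(features or [])
        self.letter_annotations = dict(letter_annotations or {})

    def __add__(self, other):
        new = self.__class__(seq=self.seq + other.seq,
                             id=self.id + "+" + other.id,
                             name=self.name + "+" + other.name)
        # Features from self keep their coordinates...
        new.features.extend(self.features)
        # ...while features from other move right by len(self.seq).
        offset = len(self.seq)
        new.features.extend((start + offset, end + offset, ftype)
                            for start, end, ftype in other.features)
        # As in the quoted code, only annotations present in both survive.
        for key, values in self.letter_annotations.items():
            if key in other.letter_annotations:
                new.letter_annotations[key] = (
                    values + other.letter_annotations[key])
        return new


left = Record("ACGT", id="a", name="a", features=[(0, 2, "motif")],
              letter_annotations={"phred_quality": [30, 30, 20, 20],
                                  "mask": [1, 1, 0, 0]})
right = Record("GG", id="b", name="b", features=[(0, 2, "motif")],
               letter_annotations={"phred_quality": [40, 40]})
joined = left + right
print(joined.seq, joined.id)      # ACGTGG a+b
print(joined.features)            # [(0, 2, 'motif'), (4, 6, 'motif')]
print(joined.letter_annotations)  # {'phred_quality': [30, 30, 20, 20, 40, 40]}
```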
Peter