[Biopython-dev] SeqFeature start/end and making positions act like ints

Eric Talevich eric.talevich at gmail.com
Fri Sep 16 20:33:19 UTC 2011


On Fri, Sep 16, 2011 at 12:31 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi all,
>
> We've previously discussed adding start/end properties
> to the SeqFeature returning integers - which would be
> useful but inconsistent with the FeatureLocation which
> returns Position objects:
>
> https://redmine.open-bio.org/issues/2818
>
> After an interesting discussion with Leighton, I spent
> the afternoon making (most of the) Position objects
> subclass int - so that they can be used like integers
> (with the fuzzy information retained but generally
> ignored except for writing the features out again).
>
> This means we can have SeqFeature start/end
> properties which, like those of the FeatureLocation,
> return position objects - and they are actually easy
> to use (except for some very extreme cases).
> For example, you can use them to slice a sequence.
>
> The code is on a branch here:
> https://github.com/peterjc/biopython/tree/int_pos
>
> It is almost 100% backwards compatible. Some
> of the arguments for creating a fuzzy position
> (and their __repr__) have changed, and some
> of their attributes, but we feel this is unlikely to
> actually affect anyone. We rather suspect only
> the SeqIO parsers actually create or use the
> fuzzy objects in the first place!
>
> In terms of usability I think this is a worthwhile
> improvement. The new class hierarchy is a bit
> more complex though - and I have not looked
> at the performance implications at all.
>
> Would anyone like to review this please?
>
>
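
To illustrate the idea in the quoted message (this is not the code on Peter's branch), here is a minimal sketch of a position that subclasses int. The names FuzzyPosition and ToyFeature, and the fuzziness attribute holding "<", ">" or "" for exact, are all hypothetical. Because the object really is an int, it works directly in arithmetic and slicing, while the fuzzy marker survives for writing the feature back out.

```python
class FuzzyPosition(int):
    """Hypothetical sketch: a sequence position that is also a plain int.

    The extra ``fuzziness`` marker ("<", ">" or "" for exact) is kept for
    output, but int arithmetic, comparison and slicing simply ignore it.
    """

    def __new__(cls, position, fuzziness=""):
        obj = super().__new__(cls, position)
        obj.fuzziness = fuzziness
        return obj

    def __repr__(self):
        return "%s(%i, %r)" % (self.__class__.__name__, int(self), self.fuzziness)

    def __str__(self):
        # GenBank-style text such as "<2"; exact positions print as digits
        return "%s%i" % (self.fuzziness, int(self))


class ToyFeature:
    """Hypothetical feature whose start/end properties return positions."""

    def __init__(self, start, end):
        self._start = start
        self._end = end

    @property
    def start(self):
        return self._start

    @property
    def end(self):
        return self._end


seq = "ACGTACGTAC"
feature = ToyFeature(FuzzyPosition(2, "<"), FuzzyPosition(6))
print(seq[feature.start:feature.end])  # slicing uses the int values: GTAC
print(feature.start + 1)               # arithmetic returns a plain int: 3
print(str(feature.start), repr(feature.start))
```

The trade-off is visible in equality too: FuzzyPosition(2, "<") == 2 is true, so the fuzzy information is retained but generally ignored, as described above.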

Here's another way to do it, maybe: modify Seq.Seq.__getitem__ to also
check whether it has been given a SeqFeature and, if so, handle the joins
there. The fuzziness could be handled in there too, or via the new .start
and .end properties.

Outline:

    def __getitem__(self, index):
        """Returns a subsequence of single letter, use my_seq[index]."""
        if isinstance(index, int):
            #Return a single letter as a string
            return self._data[index]
        elif isinstance(index, SeqFeature):
            # NEW -- handle start/end/join voodoo safely:
            # if there's a join, extract the subsequences and then
            # concatenate them
            return the_result
        else:
            #Return the (sub)sequence as another Seq object
            return Seq(self._data[index], self.alphabet)
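
A self-contained sketch of the outline above, runnable outside Biopython. ToySeq and ToyFeature are hypothetical stand-ins for Seq and SeqFeature, and a join is modelled simply as a list of (start, end) parts; strand and fuzziness are deliberately left out.

```python
class ToyFeature:
    """Hypothetical stand-in for SeqFeature: a list of (start, end) parts.

    A simple feature has one part; a join(...) has several.
    """

    def __init__(self, parts):
        self.parts = list(parts)


class ToySeq:
    """Hypothetical stand-in for Seq, storing its letters in a string."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return self._data

    def __getitem__(self, index):
        """Return a letter, a sub-sequence, or the region a feature covers."""
        if isinstance(index, int):
            # Return a single letter as a string
            return self._data[index]
        elif isinstance(index, ToyFeature):
            # Extract each part of a (possibly joined) feature and
            # concatenate them; strand is ignored in this sketch
            return ToySeq("".join(self._data[start:end]
                                  for start, end in index.parts))
        else:
            # Return the (sub)sequence as another sequence object
            return ToySeq(self._data[index])


s = ToySeq("AAACCCGGGTTT")
joined = ToyFeature([(0, 3), (6, 9)])  # like join(1..3,7..9) in GenBank
print(s[joined])  # AAAGGG
```

One consequence of this design is that the sequence class has to know about the feature class; the alternative is to put the same loop on the feature side, as a method that takes the parent sequence.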


Think that would work?

-Eric


