[Biopython-dev] SeqFeature start/end and making positions act like ints

Wed Sep 28 14:29:06 UTC 2011

On Mon, Sep 19, 2011 at 10:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sat, Sep 17, 2011 at 8:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Sat, Sep 17, 2011 at 2:44 PM, Eric Talevich wrote:
>>> The new start/end properties you implemented
>>> look good to me, and I doubt there would be a serious hit
>>> to performance -- plus, code that doesn't need these shortcuts
>>> doesn't have to use them.
>>
>> Good. I've realised I need to double check the integer
>> methods (equals, sorting, hashes etc), but they should
>> be fine.
>
> Thinking about this more, the current _shift method of
> the position objects (used in SeqRecord slicing) would
> make sense as the __add__ method, thus:
>
> BeforePosition(5) + 10 --> BeforePosition(15)
>
> rather than currently:
>
> BeforePosition(5)._shift(10) --> BeforePosition(15)
>
> However, perhaps that is just making work for ourselves,
> we'd have to implement code for all the mixture cases, e.g.
>
> BeforePosition(5) + AfterPosition(10) --> UncertainPosition(15)

I went with the practical option: for all the maths operations
(addition, subtraction, etc.) you just get the basic int behaviour,
so the result is a plain integer. Much simpler!
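
To illustrate what "basic int behaviour" means in practice, here is a
minimal sketch. It does not use the real Biopython classes:
BeforePositionSketch is a hypothetical stand-in that assumes the
position is a subclass of int, and its _shift method only mimics the
idea of the existing one.

```python
class BeforePositionSketch(int):
    """Hypothetical stand-in for a fuzzy 'before' position, e.g. <5."""

    def __repr__(self):
        return "%s(%i)" % (self.__class__.__name__, int(self))

    def __str__(self):
        return "<%i" % int(self)

    def _shift(self, offset):
        # An explicit shift keeps the fuzzy type (the slicing use case).
        return self.__class__(int(self) + offset)


pos = BeforePositionSketch(5)
total = pos + 10          # inherited int.__add__, so a plain int comes back
shifted = pos._shift(10)  # still a BeforePositionSketch
```

Equality, hashing and sorting are inherited from int in this sketch,
so pos == 5 holds and such positions sort alongside ordinary integers.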

Having done a bit of testing to reassure myself there was
no unexpected performance regression, I have committed this
to the trunk (as a single commit - it seemed cleaner to me):

https://github.com/biopython/biopython/commit/c52e986a3da571a5793b00958c5bbcde1d581526

Note I have not included the SeqFeature start/end proxy
methods. There is a reason for this related to the other
location changes I've been playing with. I've been thinking
it makes more sense for the start/end of a join etc to give
the lowest start and the highest end of the sub-locations.
In general that means no change to the current situation,
but it does matter for origin spanning & out-of-order splicing.
The min/max-like behaviour seems more useful (for both
visualisation and bounds checking).
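
As a rough sketch of the min/max idea (not the code in any branch):
the helper name overall_bounds and the use of plain (start, end)
tuples are assumptions made for illustration, with Python-style
0-based, end-exclusive coordinates.

```python
def overall_bounds(parts):
    """Return (lowest start, highest end) over (start, end) pairs."""
    starts = [start for start, end in parts]
    ends = [end for start, end in parts]
    return min(starts), max(ends)


# Something like join(90..100,1..10) on a 100 bp circular sequence.
origin_spanning = [(89, 100), (0, 10)]
bounds = overall_bounds(origin_spanning)

# Taking the first part's start and the last part's end instead
# would give a start greater than the end for this feature.
naive = (origin_spanning[0][0], origin_spanning[-1][1])
```

For an ordinary in-order join both approaches agree, which is why
this is no change in the general case.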

This branch is now defunct, and I may delete it at some point:
https://github.com/peterjc/biopython/tree/int_pos

>>> These will be handy for writing code that visualizes
>>> SeqFeatures, too.
>>
>> Well, slightly easier - I have some more dramatic changes to
>> the SeqFeature and FeatureLocation objects planned, but I'm
>> still playing with this.
>
> One of the key changes (which can be done without
> really changing the API) is to move the database &
> accession and the strand from the SeqFeature to the
> FeatureLocation. These are intimately connected with
> the location, as much as the start/end.

I think these changes can be applied to the trunk for
the next release.
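
One way to move the strand without really changing the API is to
leave a property on the feature that forwards to the location. The
sketch below shows only that pattern: LocationSketch and FeatureSketch
are hypothetical classes, not the real SeqFeature or FeatureLocation.

```python
class LocationSketch:
    """Hypothetical location that owns start, end and strand."""

    def __init__(self, start, end, strand=None):
        self.start = start
        self.end = end
        self.strand = strand


class FeatureSketch:
    """Hypothetical feature that forwards strand to its location."""

    def __init__(self, location, type="misc_feature"):
        self.location = location
        self.type = type

    @property
    def strand(self):
        # Old-style access (feature.strand) still works.
        return self.location.strand

    @strand.setter
    def strand(self, value):
        # Writes go through to the location as well.
        self.location.strand = value


feature = FeatureSketch(LocationSketch(10, 20, strand=-1), type="gene")
old_style = feature.strand   # read via the feature
feature.strand = +1          # write via the feature
```

Existing code that reads or sets the strand on the feature keeps
working, while the value itself lives on the location.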

> This is one of the things I've been working on here:
> https://github.com/peterjc/biopython/commits/f_loc
>
> The other key change on that experimental branch
> is moving away from sub_features for join locations
> (etc). Here I was trying a new CompoundLocation
> object, but am still wondering if this should be done
> in the SeqFeature or FeatureLocation object instead
> (or if SeqFeature should subclass FeatureLocation).

I'm still thinking about this, but I haven't written any more
code for it recently. I'll return to this issue later.

Peter