[Biopython-dev] SeqFeature start/end and making positions act like ints
Eric Talevich
eric.talevich at gmail.com
Fri Sep 16 20:33:19 UTC 2011
On Fri, Sep 16, 2011 at 12:31 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> We've previously discussed adding start/end properties
> to the SeqFeature returning integers - which would be
> useful but inconsistent with the FeatureLocation which
> returns Position objects:
>
> https://redmine.open-bio.org/issues/2818
>
> After an interesting discussion with Leighton, I spent
> the afternoon making (most of the) Position objects
> subclass int - so that they can be used like integers
> (with the fuzzy information retained but generally
> ignored except for writing the features out again).
>
> This means we can have SeqFeature start/end
> properties which, like those of the FeatureLocation,
> return position objects - and they are actually easy
> to use (except for some very extreme cases),
> e.g. you can use them to slice a sequence.
>
> The code is on a branch here:
> https://github.com/peterjc/biopython/tree/int_pos
>
> It is almost 100% backwards compatible. Some
> of the arguments for creating a fuzzy position
> (and their __repr__) have changed, and some
> of their attributes, but we feel this is unlikely to
> actually affect anyone. We rather suspect only
> the SeqIO parsers actually create or use the
> fuzzy objects in the first place!
>
> In terms of usability I think this is a worthwhile
> improvement. The new class hierarchy is a bit
> more complex though - and I have not looked
> at the performance implications at all.
>
> Would anyone like to review this please?
>
>
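For reference, the int-subclass idea can be sketched roughly as follows. This is only an illustration: FuzzyBefore is a hypothetical name made up here, not the code on the branch, and it assumes the fuzziness only needs to show up when the position is written out as a string.

```python
class FuzzyBefore(int):
    """Hypothetical fuzzy position: 'starts somewhere before this point'."""

    def __new__(cls, position):
        # int is immutable, so the value has to be set in __new__
        return int.__new__(cls, position)

    def __repr__(self):
        return "%s(%d)" % (self.__class__.__name__, int(self))

    def __str__(self):
        # The fuzzy marker only appears when the position is written out
        return "<%d" % int(self)


sequence = "ACGTACGTACGT"
start = FuzzyBefore(3)
end = 8

# Slicing and arithmetic treat the object as the plain integer 3
fragment = sequence[start:end]
next_position = start + 1
```

Note that arithmetic on such an object gives back a plain int, so the fuzzy information is dropped as soon as you compute with it.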
Here's another way to do it, maybe -- modify Seq.Seq.__getitem__ to also
check if it's been given a SeqFeature, and if so, handle the joins there.
The handling of fuzziness could happen in here or use the new .start and
.end properties.
Outline:

    def __getitem__(self, index):
        """Returns a subsequence or single letter, use my_seq[index]."""
        if isinstance(index, int):
            # Return a single letter as a string
            return self._data[index]
        elif isinstance(index, SeqFeature):
            # NEW -- handle start/end/join voodoo safely
            # if there's a join, extract the subsequences and then
            # concatenate them
            return the_result
        else:
            # Return the (sub)sequence as another Seq object
            return Seq(self._data[index], self.alphabet)

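To make the outline concrete, here is a toy version that runs on its own. ToySeq and ToyFeature are hypothetical stand-ins invented for this sketch, not the real Seq and SeqFeature classes; it assumes a feature is just a list of (start, end) parts and it ignores strand and fuzziness entirely.

```python
class ToyFeature(object):
    """Hypothetical stand-in for SeqFeature: a list of (start, end) parts."""

    def __init__(self, parts):
        # More than one part means the feature is a join
        self.parts = list(parts)


class ToySeq(object):
    """Hypothetical stand-in for Seq that wraps a plain string."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def __getitem__(self, index):
        if isinstance(index, int):
            # Return a single letter as a string
            return self._data[index]
        elif isinstance(index, ToyFeature):
            # Slice out each part and concatenate them, which covers joins
            pieces = [self._data[start:end] for start, end in index.parts]
            return ToySeq("".join(pieces))
        else:
            # Return the (sub)sequence as another ToySeq object
            return ToySeq(self._data[index])


seq = ToySeq("ACGTACGTACGT")
simple = ToyFeature([(2, 6)])
joined = ToyFeature([(0, 3), (8, 12)])

simple_result = str(seq[simple])
joined_result = str(seq[joined])
```

With this, seq[joined] gives the two parts glued together, and plain integer or slice indexing behaves as before.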
Think that would work?
-Eric