[Biopython-dev] Moving strand & db ref from SeqFeature to FeatureLocation

Peter Cock p.j.a.cock at googlemail.com
Mon Oct 10 21:47:03 UTC 2011


This was on the "SeqFeature start/end and making positions act
like ints" thread last month:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009183.html

On Mon, Sep 19, 2011 at 10:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Well, slightly easier - I have some more dramatic changes to
>> the SeqFeature and FeatureLocation objects planned, but I'm
>> still playing with this.
>
> One of the key changes (which can be done without
> really changing the API) is to move the database &
> accession and the strand from the SeqFeature to the
> FeatureLocation. These are intimately connected with
> the location, as much as the start/end.
>
> This is one of the things I've been working on here:
> https://github.com/peterjc/biopython/commits/f_loc
>
> The other key change on that experimental branch
> is moving away from sub_features for join locations
> (etc). Here I was trying a new CompoundLocation
> object, but am still wondering if this should be done
> in the SeqFeature or FeatureLocation object instead
> (or if SeqFeature should subclass FeatureLocation).
>
> Peter

That branch needed manual merge conflict resolution
against the integer-subclassing position changes that
landed on the trunk. I have started that here:

https://github.com/peterjc/biopython/tree/f_loc2

Would someone like to review that please?

It moves the strand, ref and ref_db properties from
the SeqFeature object to the FeatureLocation object,
implementing read/write proxy methods for backward
compatibility.
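
To illustrate the proxy idea, here is a minimal sketch. It
is not the code on the branch: the two classes are
simplified, hypothetical stand-ins, and only strand and ref
are shown (the database reference would follow the same
pattern).

```python
# Hypothetical, simplified stand-ins for illustration only;
# this is not the actual Biopython implementation.
class FeatureLocation:
    """Holds the start/end plus the strand and reference."""

    def __init__(self, start, end, strand=None, ref=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref


class SeqFeature:
    """Keeps the old attribute names, but stores nothing itself."""

    def __init__(self, location=None, type=""):
        self.location = location
        self.type = type

    @property
    def strand(self):
        # Read proxy: the value lives on the location object.
        return self.location.strand

    @strand.setter
    def strand(self, value):
        # Write proxy: forward the assignment to the location.
        self.location.strand = value

    @property
    def ref(self):
        return self.location.ref

    @ref.setter
    def ref(self, value):
        self.location.ref = value


loc = FeatureLocation(5, 10, strand=-1)
feature = SeqFeature(loc, type="CDS")
feature.strand = +1   # old-style code keeps working
feature.ref = "XYZ"   # placeholder accession, made up for this example
```

Old code that reads or writes feature.strand still works,
but there is only one copy of the value, on the location.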

Other than the commit which changes the __str__
method (the fine details of which I am happy to tweak
after discussion), this should be almost 100% backward
compatible:

https://github.com/peterjc/biopython/commit/fed003821d0d223a7b3042ccc3bdf8442348f043

The one break I am aware of is that you can no longer
create a SeqFeature with an empty location and then set
the strand or db refs before setting the location object
(which is what the GenBank parser was doing).
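
A rough sketch of that break, again using hypothetical
simplified classes rather than the real ones. Raising
ValueError here is an assumption made for the example; the
point is only that the setter has nowhere to store the
value until a location exists.

```python
# Hypothetical, simplified stand-ins for illustration only;
# this is not the actual Biopython implementation.
class FeatureLocation:
    def __init__(self, start, end, strand=None):
        self.start = start
        self.end = end
        self.strand = strand


class SeqFeature:
    def __init__(self, location=None):
        self.location = location

    @property
    def strand(self):
        # With no location there is no strand to report.
        return None if self.location is None else self.location.strand

    @strand.setter
    def strand(self, value):
        if self.location is None:
            # Assumed behaviour for this sketch: refuse loudly
            # rather than silently dropping the value.
            raise ValueError("Can't set strand without a location.")
        self.location.strand = value


# Old pattern (strand first, location later): now fails.
feature = SeqFeature()
try:
    feature.strand = -1
    old_pattern_worked = True
except ValueError:
    old_pattern_worked = False

# New pattern: set the location first, then the strand.
feature.location = FeatureLocation(10, 20)
feature.strand = -1
```

Code following the old pattern just needs to create the
location object before assigning the strand.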

The motivation is that the strand and the (optional)
database reference to which the location start/end apply
are both essential parts of the location information, and
I feel they never should have been attached to the
SeqFeature directly.

Furthermore, this separation is a useful step towards
reworking the current use of the SeqFeature's sub_features
list for multi-part locations (e.g. joins in GenBank/EMBL).
More on this later.

Thanks,

Peter


