[Biopython-dev] Replacing SeqFeature sub_features with compound feature locations

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 23 16:02:45 UTC 2012

On Mon, Jul 23, 2012 at 2:05 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
> Thanks for working through the sub_feature issue and coming up with this
> proposal. I'm 100% on board with converting over to something more
> general and this looks like a great approach.
> A couple of quick thoughts:
> - Would it be possible to have a back-compatible 'sub_features' that
>   reconstituted features based on the compound location? This could help
>   us avoid breaking scripts that use sub_features, even if we no longer
>   fill those in going forward.

When you say 'use', do you mean populate and modify? Use in the
read-only sense is already covered, in that any Biopython code
generating complex SeqFeature objects would (in the short term)
populate both the sub_features AND the new compound location.
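
As a rough sketch of the "populate both" idea (plain Python, not actual
Biopython code; SimpleLocation, CompoundLoc, Feature and
make_join_feature are hypothetical stand-ins invented for illustration):

```python
# Hypothetical stand-in classes, NOT the real Bio.SeqFeature API.
class SimpleLocation:
    def __init__(self, start, end, strand=None):
        self.start = start
        self.end = end
        self.strand = strand


class CompoundLoc:
    def __init__(self, parts, operator="join"):
        self.parts = list(parts)
        self.operator = operator


class Feature:
    def __init__(self, type, location, sub_features=None):
        self.type = type
        self.location = location
        self.sub_features = list(sub_features or [])


def make_join_feature(feature_type, spans, strand=1):
    """Build a feature carrying both the new and the legacy representation."""
    # New style: one compound location made of simple parts.
    parts = [SimpleLocation(s, e, strand) for s, e in spans]
    # Legacy style: one child feature per part, each with its own location
    # object, so nothing is shared between the two views.
    legacy = [Feature(feature_type, SimpleLocation(s, e, strand))
              for s, e in spans]
    return Feature(feature_type, CompoundLoc(parts), sub_features=legacy)


cds = make_join_feature("CDS", [(0, 10), (20, 30)])
```

Both views are filled once at construction time; nothing here keeps them
in sync afterwards, so read-only scripts keep working but edits to one
view are not reflected in the other.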

Things get very hairy if we want to support edits to the sub_features
also automatically updating the new compound location (and vice
versa). So I don't want to do that.

> - How do you envision storing GFF feature hierarchies? The location
>   object is more lightweight with only position and strand information.

Only in the simple cases. In addition to single line GFF features, you
have joins expressed by multiple GFF lines with a common ID. Also,
it seems quite possible that GFF3 will add a new tag entry to describe
fuzzy locations in future, see e.g. this thread
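
To illustrate the multi-line case, here is a minimal sketch that groups
GFF3 lines sharing an ID attribute into one list of parts, which could
then back a single compound location. The function name group_gff_parts
and the sample records are made up for this example, and the attribute
parsing is deliberately naive (no URL-unescaping, no multi-value tags):

```python
from collections import defaultdict


def group_gff_parts(lines):
    """Map each GFF3 ID to a list of (start, end, strand) tuples.

    Coordinates are kept as found in the file (1-based, inclusive).
    """
    parts = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        # Naive key=value parsing of the attributes column.
        attrs = dict(item.split("=", 1)
                     for item in cols[8].split(";") if "=" in item)
        feature_id = attrs.get("ID")
        if feature_id is not None:
            parts[feature_id].append((start, end, strand))
    return dict(parts)


# Invented sample data: one CDS split over two lines with a common ID.
gff_lines = [
    "##gff-version 3",
    "\t".join(["chr1", ".", "CDS", "100", "201", ".", "+", "0",
               "ID=cds1;Parent=mRNA1"]),
    "\t".join(["chr1", ".", "CDS", "300", "400", ".", "+", "0",
               "ID=cds1;Parent=mRNA1"]),
    "\t".join(["chr1", ".", "gene", "50", "500", ".", "+", ".",
               "ID=gene1"]),
]
grouped = group_gff_parts(gff_lines)
```

Here grouped["cds1"] holds two parts (a join), while grouped["gene1"]
holds a single part (the simple case).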

>   Nested child GFF features would have key/value pairs associated with
>   them as well. Would we want to use sub_features (or some new nested
>   structure) for these?

Absolutely a new nested structure - reusing sub_features would
just cause too much confusion. This might be done with a parent
attribute and/or a children list - perhaps with weak references for
the parent links, so that parent/child reference cycles don't get in
the way of freeing memory.
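
A minimal sketch of the parent/children idea using the standard library
weakref module. The Node class and its attribute names are hypothetical,
not an existing or planned Biopython API:

```python
import weakref


class Node:
    """Hypothetical nested feature with its own key/value qualifiers."""

    def __init__(self, name, qualifiers=None):
        self.name = name
        self.qualifiers = dict(qualifiers or {})
        self.children = []   # strong references: a parent owns its children
        self._parent = None  # weak reference back to the parent, if any

    @property
    def parent(self):
        # Dereference the weak reference; gives None if there is no parent
        # or the parent has already been freed.
        return self._parent() if self._parent is not None else None

    def add_child(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)


gene = Node("gene1")
mrna = Node("mRNA1", {"Parent": "gene1"})
gene.add_child(mrna)
```

Because the child only holds a weak reference to its parent, the pair
does not form a strong reference cycle; the trade-off is that
mrna.parent can become None once nothing else keeps the parent alive.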

