[Biopython-dev] SeqFeature and FeatureLocation objects (was Bio.GFF)

Peter Cock p.j.a.cock at googlemail.com
Tue Apr 21 13:05:23 UTC 2009


On Tue, Apr 21, 2009 at 1:44 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> You must agree that SeqFeature and FeatureLocation objects are not
>> very lightweight.  I understood that one of your goals with Bio.GFF
>> and map/reduce is to handle massive files, so surely it makes sense to
>> use a simple object structure here?
>
> Unless you are thinking of having an object representation as being too
> heavy, the non-light part of SeqFeature is all the FeatureLocation
> fuzziness.

Fair point.

> I would be for a SeqFeatureLite class that is API compatible with
> SeqFeature (with the new start/end attributes) and does not support
> fuzzy locations. This would handle GFF understandably, be lightweight,
> and allow access to BioSQL and SeqIO. How does this sound?

I have also been thinking about how I would (re)design the SeqFeature
and FeatureLocation objects.  In particular I would want to put the
strand in the same object as the location, along with any
join-locations.  I would still want to cope with fuzzy locations, but
make the non-fuzzy approximations more prominent.  Also, I really
don't like the way joins are currently stored as more SeqFeatures in
the sub_features list (this also more or less rules out using
sub_features for child/parent nesting, which might be nice for GFF
files).
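
As a rough illustration of that idea only: the sketch below keeps the
strand and the join parts together in one small object, with plain
integer start/end as the non-fuzzy view.  The class name
SketchLocation and its attributes are hypothetical and are not part of
Biopython; coordinates are assumed to be 0-based and end-exclusive,
Python slice style.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchLocation:
    """Hypothetical location object: strand and join parts in one place."""

    # (start, end) pairs, 0-based and end-exclusive; one pair means no join.
    parts: Tuple[Tuple[int, int], ...]
    # +1 for the forward strand, -1 for the reverse strand.
    strand: int = 1

    @property
    def start(self) -> int:
        """Left-most position covered by any part."""
        return min(s for s, _ in self.parts)

    @property
    def end(self) -> int:
        """Right-most (exclusive) position covered by any part."""
        return max(e for _, e in self.parts)

    @property
    def is_join(self) -> bool:
        """True when the location is made of more than one part."""
        return len(self.parts) > 1


# A two-part join on the reverse strand, with no sub_features involved.
loc = SketchLocation(parts=((10, 20), (30, 45)), strand=-1)
print(loc.start, loc.end, loc.is_join)
```

With something like this the sub_features list would be left free for
genuine parent/child nesting.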

The prime use case to keep in mind is taking a feature location (even
a join), and using this to extract that region of nucleotides from the
parent sequence (i.e. a Seq object or a SeqRecord object, as both can
now be sliced).
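
A minimal sketch of that use case, using a plain Python string in
place of a Seq object.  The function name extract_region is
hypothetical, and the conventions are assumptions: parts are 0-based,
end-exclusive (start, end) pairs listed in ascending order on the
forward strand, and a reverse-strand feature is handled by joining the
parts first and then reverse complementing the result.

```python
# Complement table for unambiguous DNA letters only (an assumption of
# this sketch; ambiguity codes are left unchanged by str.translate).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(parent, parts, strand=1):
    """Return the nucleotides described by a (possibly joined) location.

    parent -- the parent sequence as a string
    parts  -- iterable of (start, end) pairs, 0-based and end-exclusive
    strand -- +1 for forward, -1 for reverse
    """
    # Slice out each part and concatenate them in the order given.
    region = "".join(parent[start:end] for start, end in parts)
    if strand == -1:
        # Reverse complement the joined region as a whole.
        region = region.translate(_COMPLEMENT)[::-1]
    return region


parent = "AAACCCGGGTTT"
print(extract_region(parent, [(0, 3), (6, 9)]))             # AAAGGG
print(extract_region(parent, [(0, 3), (6, 9)], strand=-1))  # CCCTTT
```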

Peter



