[Biopython-dev] SeqFeature and FeatureLocation objects (was Bio.GFF)
Peter Cock
p.j.a.cock at googlemail.com
Tue Apr 21 13:05:23 UTC 2009
On Tue, Apr 21, 2009 at 1:44 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> You must agree that SeqFeature and FeatureLocation objects are not
>> very lightweight. I understood that one of your goals with Bio.GFF
>> and map/reduce is to handle massive files, so surely it makes sense to
>> use a simple object structure here?
>
> Unless you are thinking of having an object representation as being too
> heavy, the non-light part of SeqFeature is all the FeatureLocation
> fuzziness.
Fair point.
> I would be for a SeqFeatureLite class that is API compatible with
> SeqFeature (with the new start/end attributes) and does not support
> fuzzy locations. This would handle GFF understandably, be lightweight,
> and allow access to BioSQL and SeqIO. How does this sound?
I have also been thinking about how I would (re)design the SeqFeature
and FeatureLocation objects. In particular I would want to put the
strand as part of the same object as the location, and also any
join-locations. I would still want to cope with fuzzy locations, but
make the non-fuzzy approximations more prominent. Also, I really
don't like the way joins are currently stored as additional
SeqFeatures in the sub_features list (this also blocks using
sub_features for the child/parent nesting that might be nice for GFF
files).
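
To make the redesign idea above a bit more concrete, here is a rough
sketch of a single location object that carries the strand and any join
parts together. The class name SimpleLocation, its attributes, and the
0-based half-open coordinates are all assumptions for illustration, not
an existing or proposed Biopython API. Fuzziness is reduced to a flag
here, so start and end are always plain integers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleLocation:
    """Hypothetical location holding strand and join parts in one object."""

    # Each part is a (start, end) pair, assumed 0-based and half-open,
    # like Python slices. A non-join location has exactly one part.
    parts: Tuple[Tuple[int, int], ...]
    strand: int = 1  # +1 for forward, -1 for reverse (assumed convention)
    fuzzy: bool = False  # True if the integers only approximate a fuzzy location

    @property
    def start(self) -> int:
        # Leftmost coordinate over all parts (non-fuzzy approximation).
        return min(start for start, _ in self.parts)

    @property
    def end(self) -> int:
        # Rightmost coordinate over all parts (non-fuzzy approximation).
        return max(end for _, end in self.parts)

    def __len__(self) -> int:
        # Total number of positions covered, summed over the parts.
        return sum(end - start for start, end in self.parts)


loc = SimpleLocation(parts=((10, 20), (30, 45)), strand=-1)
print(loc.start, loc.end, len(loc))
```

With the join held inside the location like this, sub_features would
be free for child/parent nesting.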
The prime use case to keep in mind is taking a feature location (even
a join), and using this to extract that region of nucleotides from the
parent sequence (i.e. a Seq object or a SeqRecord object, as now both
can be sliced).
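
As a minimal sketch of that use case, the function below slices each
part of a (possibly joined) location out of the parent and concatenates
the pieces. It uses a plain string as a stand-in for a Seq object, and
the names extract and reverse_complement are hypothetical helpers, not
existing Biopython functions. It assumes 0-based half-open parts given
in ascending order, and that a reverse-strand join means "join the
parts, then reverse complement the whole thing".

```python
# Translation table for unambiguous DNA letters only (an assumption;
# IUPAC ambiguity codes are not handled in this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(parent, parts, strand=1):
    """Extract the region described by parts from the parent sequence.

    parent -- a sliceable sequence, here a plain string
    parts  -- list of (start, end) pairs, 0-based and half-open
    strand -- +1 for forward, -1 for reverse
    """
    # Slice each part from the parent and join the pieces in order.
    joined = "".join(parent[start:end] for start, end in parts)
    if strand == -1:
        return reverse_complement(joined)
    return joined


parent = "AAACCCGGGTTT"
print(extract(parent, [(0, 3), (6, 9)]))             # forward join
print(extract(parent, [(0, 3), (6, 9)], strand=-1))  # reverse-strand join
```

The same slicing approach should carry over to anything that supports
slicing and concatenation, which is why having both Seq and SeqRecord
sliceable matters here.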
Peter