[Biopython-dev] Bio.GFF and Brad's code

Brad Chapman chapmanb at 50mail.com
Tue Apr 21 12:44:49 UTC 2009


Hi Peter;
[...fuzzy handling...]

> Right - and with the above correction that SeqFeature.start and end
> would be proxies for SeqFeature.location.nofuzzy_start and
> SeqFeature.location.nofuzzy_end, you would get plain integers, and
> this should cover most use cases.  At least for non-Eukaryotes ;)

Yes, that was my proposal. Thanks for fleshing it out and for the
patch.
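
To make the proxy idea concrete, here is a rough sketch. DemoLocation
and DemoFeature are hypothetical stand-ins written for this example, not
the real Bio.SeqFeature classes; only the nofuzzy_start and nofuzzy_end
attribute names come from the discussion above.

```python
# Hypothetical stand-ins for illustration; not the actual Bio.SeqFeature code.
class DemoLocation:
    """Holds plain-integer positions under the nofuzzy_* names."""

    def __init__(self, start, end):
        self.nofuzzy_start = int(start)
        self.nofuzzy_end = int(end)


class DemoFeature:
    """Feature whose start/end are read-only proxies for its location."""

    def __init__(self, location, type=""):
        self.location = location
        self.type = type

    @property
    def start(self):
        # Proxy: delegate to the location, always a plain integer here.
        return self.location.nofuzzy_start

    @property
    def end(self):
        return self.location.nofuzzy_end


feature = DemoFeature(DemoLocation(10, 42), type="gene")
print(feature.start, feature.end)
```

Because start and end are properties, they stay in sync with the
location object and cannot drift out of date.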

> > Does solving the start/end problem as described above help bridge the
> > gap between SeqFeatures and the custom representation? Are there other
> > usability issues you found? I would prefer to expose one data structure
> > and think SeqFeature can handle the data well. They scale to nested
> > cases, and will be familiar to those using features in SeqIO or BioSQL.
> 
> You must agree that SeqFeature and FeatureLocation objects are not
> very lightweight.  I understood that one of your goals with Bio.GFF
> and map/reduce is to handle massive files, so surely it makes sense to
> use a simple object structure here?

Unless you consider any object representation too heavy, the heavy
part of SeqFeature is all in the FeatureLocation fuzziness.

I would be in favor of a SeqFeatureLite class that is API compatible
with SeqFeature (with the new start/end attributes) and does not
support fuzzy locations. This would handle GFF well, be lightweight,
and allow access to BioSQL and SeqIO. How does this sound?
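
A minimal sketch of what I have in mind, assuming plain integer
positions and a small fixed set of attributes. Nothing here exists in
Biopython yet: SeqFeatureLite and LiteLocation are hypothetical, and the
attribute set (type, strand, qualifiers) is my guess at what an
API-compatible class would need.

```python
from collections import namedtuple

# Hypothetical minimal location: plain integers only, no fuzzy positions.
LiteLocation = namedtuple("LiteLocation", ["nofuzzy_start", "nofuzzy_end"])


class SeqFeatureLite:
    """Hypothetical lightweight feature with integer start/end."""

    # __slots__ avoids a per-instance __dict__, which helps when holding
    # very many features in memory.
    __slots__ = ("start", "end", "type", "strand", "qualifiers")

    def __init__(self, start, end, type="", strand=None, qualifiers=None):
        self.start = int(start)
        self.end = int(end)
        self.type = type
        self.strand = strand
        self.qualifiers = qualifiers if qualifiers is not None else {}

    @property
    def location(self):
        # Built on demand so code written against feature.location
        # and its nofuzzy_* attributes still works.
        return LiteLocation(self.start, self.end)


exon = SeqFeatureLite(100, 250, type="exon", strand=1,
                      qualifiers={"ID": ["exon1"]})
print(exon.start, exon.end, exon.location.nofuzzy_end)
```

Storing start and end directly and deriving location from them is the
reverse of the proxy arrangement in SeqFeature, which keeps the common
case cheap.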

Brad


