[Biopython] "raw" genbank locations?

Brad Chapman chapmanb at 50mail.com
Thu Mar 10 11:06:48 UTC 2011


Peter;

> > do you think this would be useful to
> > expose as a function of a SeqFeature directly, so you could do
> > feature.insdc_string() or something similar?
> 
> A couple of people have asked for this, and since adding SeqIO
> output in GenBank/EMBL format (the code you refer to in InsdcIO)
> this would be very possible... the issue holding me back is the
> annoying special case(s) that require knowing the parent sequence's
> length. The problem is that currently the SeqFeature doesn't
> have this information - it doesn't have any link back to a parent
> SeqRecord (and indeed it doesn't even have to be created in
> the context of a SeqRecord).
> 
> Perhaps we can handle the case of between features N^1 on
> circular sequences of length N differently, maybe with a dedicated
> SeqFeature location class which would tell us it was at the origin?
> Then we'd be able to avoid the need to know the parent length.

This is a great idea; it makes sense to treat this as a special case,
since that's what it is. Another simple option would be to put the
function on the SeqRecord class and call it as
rec.insdc_feature_string(feature); this places the responsibility for
knowing the parent's length back on the library user.
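To make the two options concrete, here is a minimal Python sketch. Everything in it is hypothetical: SimpleLocation, OriginBetweenLocation, Record, insdc_string and the method name insdc_feature_string are illustration-only names, not Biopython's API. It handles only the simplest cases (no strand, no joins, no fuzzy positions) and assumes 0-based half-open coordinates, with a between position stored as a zero-length slice (12^13 as 12:12, and N^1 on a circular sequence of length N as N:N). Note that a dedicated origin class still has to carry N itself in order to print N^1.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SimpleLocation:
    """Hypothetical stand-in for a feature location: 0-based, half-open [start, end)."""
    start: int
    end: int


@dataclass(frozen=True)
class OriginBetweenLocation:
    """Hypothetical dedicated class for the between position at the origin
    of a circular sequence (written N^1). It stores N so that no parent
    record is needed to format it."""
    seq_length: int


Location = Union[SimpleLocation, OriginBetweenLocation]


def insdc_string(location: Location, parent_length: Optional[int] = None) -> str:
    """Format a location as an INSDC-style string (strand and joins ignored).

    parent_length is only consulted for a zero-length slice at the very end
    of the sequence, i.e. the N^1 special case.
    """
    if isinstance(location, OriginBetweenLocation):
        # Option 1: the dedicated class already knows N.
        return f"{location.seq_length}^1"
    start, end = location.start, location.end
    if start == end:
        # Zero-length slice: the point between two bases, e.g. 12:12 -> 12^13.
        if parent_length is not None and end == parent_length:
            # N:N on a sequence of length N wraps round to N^1.
            return f"{parent_length}^1"
        # Without parent_length, N:N falls through here and gives N^(N+1),
        # which is exactly the ambiguity discussed above.
        return f"{end}^{end + 1}"
    if end - start == 1:
        # A single base, reported as its 1-based position.
        return str(end)
    # An ordinary range in 1-based inclusive coordinates.
    return f"{start + 1}..{end}"


@dataclass
class Record:
    """Hypothetical minimal record that knows its own sequence length."""
    length: int

    def insdc_feature_string(self, location: Location) -> str:
        # Option 2: the record supplies the parent length on the caller's behalf.
        return insdc_string(location, parent_length=self.length)


if __name__ == "__main__":
    rec = Record(length=100)
    print(rec.insdc_feature_string(SimpleLocation(0, 10)))    # 1..10
    print(rec.insdc_feature_string(SimpleLocation(100, 100)))  # 100^1
    print(insdc_string(OriginBetweenLocation(100)))           # 100^1
```

The dedicated-class route keeps the feature self-describing, while the record-level helper leaves ordinary locations untouched and simply passes the length in when it is known.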

> P.S. If we ever add a CircularSeq object (see the other thread), then
> SeqFeature locations spanning the origin might need reworking
> too.

Makes sense. We can get the 99% of standard cases working now and
then re-circle back on this once someone gets up the guts to tackle
CircularSeq.

Brad
