[Biopython] "raw" genbank locations?
Brad Chapman
chapmanb at 50mail.com
Thu Mar 10 11:06:48 UTC 2011
Peter;
> > do you think this would be useful to
> > expose as a function of a SeqFeature directly, so you could do
> > feature.insdc_string() or something similar?
>
> A couple of people have asked for this, and since adding SeqIO
> output in GenBank/EMBL format (the code you refer to in InsdcIO)
> this would be very possible... the issue holding me back is the
> annoying special case(s) that require knowing the parent sequence's
> length. The problem is that currently the SeqFeature doesn't
> have this information - it doesn't have any link back to a parent
> SeqRecord (and indeed it doesn't even have to be created in
> the context of a SeqRecord).
>
> Perhaps we can handle the case of between features N^1 on
> circular sequences of length N differently, maybe with a dedicated
> SeqFeature location class which would tell us it was at the origin?
> Then we'd be able to avoid the need to know the parent length.
This is a great idea; it makes sense to treat this as a special case,
since that's what it is. Another simple option would be to put the
function on the SeqRecord class and call it as
rec.insdc_feature_string(feature). That puts the responsibility for
knowing the parent sequence back on the library user.
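
To make the record-level idea concrete, here is a rough sketch of how it
might look. Feature and Record are hypothetical stand-ins for SeqFeature and
SeqRecord, not the real Biopython classes, and insdc_feature_string is only
the name proposed above, not an existing method. It ignores strand, fuzzy
positions and sub-features, and assumes 0-based, end-exclusive coordinates
with a between-position stored as a zero-length location.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for SeqFeature (0-based start, exclusive end)."""
    start: int
    end: int


@dataclass
class Record:
    """Hypothetical stand-in for SeqRecord; only the sequence length is used."""
    seq: str

    def insdc_feature_string(self, feature):
        """Return an INSDC-style location string for a feature on this record."""
        length = len(self.seq)
        if feature.start == feature.end:
            # Zero-length location: the point between two bases.
            if feature.end == length:
                # Special case: between the last and first base of a
                # circular sequence of length N, written as N^1.
                return f"{length}^1"
            return f"{feature.end}^{feature.end + 1}"
        if feature.start + 1 == feature.end:
            # Single base: INSDC uses just the 1-based position.
            return str(feature.end)
        # Ordinary range: convert the 0-based start to 1-based.
        return f"{feature.start + 1}..{feature.end}"


rec = Record("ACGTACGTAC")
print(rec.insdc_feature_string(Feature(0, 3)))
print(rec.insdc_feature_string(Feature(4, 4)))
print(rec.insdc_feature_string(Feature(10, 10)))
```

Because the record supplies its own length, the feature never needs a link
back to its parent; the caller just has to pass it to the right record.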
> P.S. If we ever add a CircularSeq object (see other thread) then
> SeqFeature locations spanning the origin might need reworking
> too.
Makes sense. We can get the 99% of standard cases working now and
then circle back to this once someone gets up the guts to tackle
CircularSeq.
Brad