[Biopython] "raw" genbank locations?

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 10 11:52:48 UTC 2011


On Thu, Mar 10, 2011 at 11:06 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
>
>> > do you think this would be useful to
>> > expose as a function of a SeqFeature directly, so you could do
>> > feature.insdc_string() or something similar?
>>
>> A couple of people have asked for this, and since adding SeqIO
>> output in GenBank/EMBL format (the code you refer to in InsdcIO)
>> this would be very possible... the issue holding me back is the
>> annoying special case(s) that require knowing the parent sequence's
>> length. The problem is that currently the SeqFeature doesn't
>> have this information - it doesn't have any link back to a parent
>> SeqRecord (and indeed it doesn't even have to be created in
>> the context of a SeqRecord).
>>
>> Perhaps we can handle the case of "between" features like N^1 on
>> circular sequences of length N differently, maybe with a dedicated
>> SeqFeature location class which would tell us it was at the origin?
>> Then we'd be able to avoid the need to know the parent length.
>
> This is a great idea; makes sense to treat this as a special case
> since that's what it is.

It is probably the most elegant solution without a big refactor.
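
To make the idea concrete, here is a rough sketch of what such a dedicated location class might look like. The class names (BetweenLocation, OriginBetweenLocation) and the insdc_string() method are hypothetical, made up for illustration; they are not existing Biopython classes. The point is only that the origin case carries its own position, so nothing has to be looked up on a parent record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetweenLocation:
    """Hypothetical: the point between bases `position` and `position + 1`.

    Positions here are 1-based, as written in GenBank/EMBL files.
    """

    position: int

    def insdc_string(self) -> str:
        # Ordinary case, e.g. 12^13
        return f"{self.position}^{self.position + 1}"


@dataclass(frozen=True)
class OriginBetweenLocation:
    """Hypothetical: the point between the last and first base of a
    circular sequence, i.e. the N^1 special case."""

    last_base: int  # N, the 1-based position of the final base

    def insdc_string(self) -> str:
        # The object itself knows it sits at the origin, so the
        # parent sequence length does not need to be passed in.
        return f"{self.last_base}^1"


print(BetweenLocation(12).insdc_string())
print(OriginBetweenLocation(5386).insdc_string())
```

A parser would have to create the origin variant when it reads N^1, which is where the length is known anyway.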

> Another simple way would be to put the
> function on the SeqRecord class and call it with:
> rec.insdc_feature_string(feature); this places the responsibility of
> knowing the parent back on the library user.

Yes, that would be simple. But don't we sometimes want to use
'orphan' SeqFeature objects (without a SeqRecord parent)?
I'm thinking here about GFF3 files and the like.
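
For comparison, a minimal sketch of the approach where the caller supplies the parent length. The function name and signature are assumptions for illustration only (not the actual InsdcIO code), and it ignores strand, fuzzy positions and sub-features. It takes Python-style 0-based, end-exclusive coordinates:

```python
from typing import Optional


def insdc_location_string(start: int, end: int,
                          parent_length: Optional[int] = None) -> str:
    """Hypothetical helper: format a simple location in INSDC style.

    `start` and `end` are 0-based, end-exclusive (Python slice style).
    `parent_length` is optional so that orphan features can be handled.
    """
    if start == end:
        # Zero-length slice, meaning the point between two letters.
        if parent_length is not None and end == parent_length:
            # Between the last and first base of a circular sequence.
            return f"{parent_length}^1"
        return f"{end}^{end + 1}"
    if start + 1 == end:
        # A single base is written as just its 1-based position.
        return str(end)
    return f"{start + 1}..{end}"


print(insdc_location_string(0, 10))       # simple range
print(insdc_location_string(10, 10, 10))  # origin, length known
print(insdc_location_string(10, 10))      # orphan, length unknown
```

The last call shows the catch with orphan features: without the parent length, a between position at the origin comes out as 10^11 rather than 10^1.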

>> P.S. If we ever add a CircularSeq object (see other thread) then
>> SeqFeature locations spanning the origin might need reworking
>> too.
>
> Makes sense. We can get the 99% of standard cases working now and
> then re-circle back on this once someone gets up the guts to tackle
> CircularSeq.

:)

Peter


