[Biopython-dev] Getting nucleotide sequence for GenBank features

Peter biopython at maubp.freeserve.co.uk
Wed Oct 28 12:50:55 UTC 2009


On Wed, Oct 28, 2009 at 12:07 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I think this should be part of Biopython proper (with unit tests etc), and
> would like to discuss where to put it. My ideas include:
>
> (1) Method of the SeqFeature object taking the parent sequence (as a
> string, Seq, ...?) as a required argument. Would return an object of the
> same type as the parent sequence passed in.
>
> (2) Separate function, perhaps in Bio.SeqUtils taking the parent
> sequence (as a string, Seq, ...?) and a SeqFeature object. Would
> return an object of the same type as the parent sequence passed in.
>
> (3) Method of the Seq object taking a SeqFeature, returning a Seq.
> [A drawback is Bio.Seq currently does not depend on Bio.SeqFeature]
>
> (4) Method of the SeqRecord object taking a SeqFeature. Could
> return a SeqRecord using annotation from the SeqFeature. Complex.
>
> Any other ideas?
>
> We could even offer more than one of these approaches, but ideally
> there should be one obvious way for the end user to do this. My
> question is, which is most intuitive? I quite like idea (1).
>
> In terms of code complexity, I expect (1), (2) and (3) to be about the
> same. Building a SeqRecord in (4) is trickier.
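
As a rough illustration of approach (2), here is a minimal sketch of a stand-alone function, assuming a plain string parent sequence and a simple, non-joined location. SimpleFeature and extract_feature are hypothetical names invented for this example; they are not Biopython classes or functions, and only unambiguous DNA letters are complemented.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a feature with a simple location;
# this is NOT Biopython's SeqFeature class.
@dataclass
class SimpleFeature:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # +1 for forward, -1 for reverse


# Complement table for unambiguous DNA only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(parent: str, feature: SimpleFeature) -> str:
    """Return the feature's nucleotides, as the same type as the parent (str)."""
    sub = parent[feature.start:feature.end]
    if feature.strand == -1:
        # Reverse complement for features on the reverse strand.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

Approach (1) would look much the same, with the function body moved into a method on the feature object and the parent sequence passed as its argument.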

Actually, thinking about this over lunch, for many of the use cases
we do want to turn a SeqFeature into a SeqRecord - either for the
nucleotides, or in some cases their translation. And if we are doing
this, doing something sensible with the SeqFeature annotation
(qualifiers) seems generally useful. This could still be done with
approaches (1) and (2) as well as (4).
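
To make the "feature to record" idea concrete, here is a minimal sketch under stated assumptions: SimpleFeature, SimpleRecord and feature_to_record are hypothetical stand-ins made up for this example (not Biopython's classes), qualifiers are assumed to be a dict mapping keys to lists of strings, and only forward-strand, non-joined locations are handled. Using "locus_tag" for the id and "product" for the description is just one possible choice of "something sensible".

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins; these are NOT Biopython's SeqFeature/SeqRecord.
@dataclass
class SimpleFeature:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    qualifiers: dict = field(default_factory=dict)


@dataclass
class SimpleRecord:
    seq: str
    id: str = "<unknown id>"
    description: str = ""
    annotations: dict = field(default_factory=dict)


def feature_to_record(parent: str, feature: SimpleFeature) -> SimpleRecord:
    """Build a record for the feature, carrying its qualifiers along."""
    quals = feature.qualifiers
    return SimpleRecord(
        seq=parent[feature.start:feature.end],
        # Assumed convention: qualifier values are lists of strings.
        id=quals.get("locus_tag", ["<unknown id>"])[0],
        description=quals.get("product", [""])[0],
        # Copy the dict so the record does not share it with the feature.
        annotations=dict(quals),
    )
```

A translated record could be built the same way by translating the extracted nucleotides before constructing the record.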

Peter


