[Biopython] gff3 problem

Peter Cock p.j.a.cock at googlemail.com
Wed May 18 20:42:02 UTC 2011


On Fri, Apr 8, 2011 at 1:10 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Leighton and Peter;
>
>> > Just to further complicate matters, the symbol convention for GFF3 differs
>> > from Biopython in terms of the categories it defines:
>> > + is positive strand
>> > - is negative strand
>> > . is not stranded (i.e. strand not relevant)
>> > ? is strand relevant, but not known
>> > http://www.sequenceontology.org/gff3.shtml
>
> Yes, although this strikes me a bit like fuzzy features in terms of
> usefulness.
>
>> > The latter two are distinct, but not distinguished by convention in
>> > Biopython:
>> > The obvious (to me) mapping of the four allowed Biopython symbols to the
>> > GFF3 convention is:
>> > +1 -> +
>> > -1 -> -
>> > None -> .
>> > 0 -> ?
>> > because 'None' is semantically close to 'has no strand information of
>> > consequence', and 0 is the mean of +1 and -1 ;)
>
> That's fine by me. Right now both '?' and '.' are converted to None
> so I lose the subtle distinction GFF is introducing:
>
> strand_map = {'+' : 1, '-' : -1, '?' : None, None: None}
>
> If everyone agrees on that coding it's no problem to swap it over.
> Brad

So was the consensus that we should reword the Bio.SeqFeature
docstring to say the four valid values for strand are (with GFF3
equivalents in brackets):

+1 = Forward (+ in GFF3)
-1 = Reverse (- in GFF3)
0 = Not stranded (. in GFF3)
None = Unknown (? in GFF3)
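
A minimal sketch of that proposed mapping in Python, in case it helps
the discussion. The names GFF3_TO_STRAND, STRAND_TO_GFF3,
strand_from_gff3 and strand_to_gff3 are hypothetical and are not part
of Biopython or of Brad's GFF parser:

```python
# Hypothetical helpers illustrating the proposed strand convention.
# None of these names exist in Biopython or in the GFF parser.

# GFF3 strand symbol -> proposed SeqFeature strand value
GFF3_TO_STRAND = {
    "+": 1,     # forward strand
    "-": -1,    # reverse strand
    ".": 0,     # not stranded (strand not relevant)
    "?": None,  # strand relevant, but unknown
}

# Reverse lookup: proposed strand value -> GFF3 symbol
STRAND_TO_GFF3 = {strand: symbol for symbol, strand in GFF3_TO_STRAND.items()}


def strand_from_gff3(symbol):
    """Convert a GFF3 strand column value to the proposed strand value."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("Unexpected GFF3 strand symbol: %r" % (symbol,))


def strand_to_gff3(strand):
    """Convert a proposed strand value (+1, -1, 0 or None) to a GFF3 symbol."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("Unexpected strand value: %r" % (strand,))
```

With this coding the '.' versus '?' distinction survives a round trip,
which the current strand_map quoted above does not preserve.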

And should features on a protein sequence then have strand 0?

Peter


