[Biopython] gff3 problem
Peter Cock
p.j.a.cock at googlemail.com
Tue May 24 07:26:25 EDT 2011
On Fri, May 20, 2011 at 12:27 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, May 20, 2011 at 12:15 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Peter;
>>
>> [SeqFeature support for not-stranded elements]
>>> So was the consensus that we should reword the Bio.SeqFeature
>>> docstring to say the four valid values for strand are (with GFF3
>>> equivalents in brackets):
>>>
>>> +1 = Forward (+ in GFF3)
>>> -1 = Reverse (- in GFF3)
>>> 0 = Not stranded (. in GFF3)
>>> None = Unknown (? in GFF3)
>>>
>>> And should features on a protein sequence then have strand 0?
>>
>> That sounds great. I can make the corresponding change to the
>> GFF library. Let me know if there are any other roadblocks to
>> integrating that. Thanks much,
>> Brad
Going over this afresh now, I realise that in my email of 20 May I had
mixed up Leighton's original suggestion. The two special cases (0 and
None) are a bit of a pain:
http://lists.open-bio.org/pipermail/biopython/2011-April/007194.html
Back in April, Leighton wrote:
> The obvious (to me) mapping of the four allowed Biopython symbols to the
> GFF3 convention is:
> +1 -> +
> -1 -> -
> None -> .
> 0 -> ?
> because 'None' is semantically close to 'has no strand information of
> consequence', and 0 is the mean of +1 and -1 ;)
> Cheers,
> L.
i.e.
+1 = Forward (+ in GFF3)
-1 = Reverse (- in GFF3)
0 = Stranded but unknown (? in GFF3)
None = Not stranded (. in GFF3)
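As a rough illustration, the corrected four-way mapping above could be written as a pair of lookup tables with helper functions. This is a minimal sketch, assuming strand values are plain Python ints or None. The names `to_gff3_strand` and `from_gff3_strand` are hypothetical and are not part of Biopython's API.

```python
# Hypothetical helpers sketching the strand convention above;
# these are NOT part of Biopython's API.
BIOPYTHON_TO_GFF3 = {
    +1: "+",    # forward
    -1: "-",    # reverse
    0: "?",     # stranded, but strand unknown
    None: ".",  # not stranded (e.g. protein features)
}

# Inverse mapping, for parsing the GFF3 strand column (column 7).
GFF3_TO_BIOPYTHON = {symbol: strand for strand, symbol in BIOPYTHON_TO_GFF3.items()}


def to_gff3_strand(strand):
    """Return the GFF3 strand symbol for a Biopython-style strand value."""
    try:
        return BIOPYTHON_TO_GFF3[strand]
    except KeyError:
        raise ValueError(f"Invalid strand value: {strand!r}") from None


def from_gff3_strand(symbol):
    """Return the Biopython-style strand value for a GFF3 strand symbol."""
    try:
        return GFF3_TO_BIOPYTHON[symbol]
    except KeyError:
        raise ValueError(f"Invalid GFF3 strand symbol: {symbol!r}") from None
```

Passing +1, -1, 0 or None through both functions in turn returns the original value. Any other input raises ValueError.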
SeqFeature docstring updated:
https://github.com/biopython/biopython/commit/ea64c74758dccfc7e6c0940e31a214293ecc59d3
This way protein features should have strand None (which is what the
current GenBank/EMBL parser does anyway).
Note that the SeqFeature default is strand=None which is still OK.
Mixed strand isn't needed in the GFF3 model, but we already use
None for this. Perhaps it should be 0 rather than None under this model?
Peter