[Biopython] gff3 problem

Peter Cock p.j.a.cock at googlemail.com
Fri May 20 11:27:04 UTC 2011


On Fri, May 20, 2011 at 12:15 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
>
> [SeqFeature support for not-stranded elements]
>> So was the consensus that we should reword the Bio.SeqFeature
>> docstring to say the four valid values for strand are (with GFF3
>> equivalents in brackets):
>>
>> +1 = Forward (+ in GFF3)
>> -1 = Reverse (- in GFF3)
>> 0 = Not stranded (. in GFF3)
>> None = Unknown (? in GFF3)
>>
>> And should features on a protein sequence then have strand 0?
>
> That sounds great. I can make the corresponding change to the GFF
> library. Let me know if there are any other roadblocks to
> integrating that. Thanks much,
> Brad

I've remembered a corner case: mixed strand features. One example is
the Arabidopsis thaliana chloroplast complete genome (AP000423 in
EMBL, NC_000932 in GenBank, one of our unit test files), which has a
gene with location join(complement(69611..69724),139856..140650).

Clearly the child features have well-defined strands (-1 and +1
respectively), but the parent feature (the join) is mixed strand.
Currently our GenBank parser uses None for this. So maybe:

+1 = Forward (+ in GFF3)
-1 = Reverse (- in GFF3)
0 = Not stranded (. in GFF3)
None = Mixed or unknown (? in GFF3)
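
To make the proposal concrete, here is a rough sketch of the mapping in
plain Python. The names GFF3_STRAND, strand_to_gff3 and parent_strand are
made up for illustration; they are not existing functions in
Bio.SeqFeature or the GFF library, and the rule for the parent strand is
just one reading of the scheme above.

```python
# Hypothetical helpers illustrating the proposed strand scheme.
# None of these names exist in Biopython or the GFF library.

# Proposed strand value -> GFF3 strand column.
GFF3_STRAND = {
    +1: "+",    # forward
    -1: "-",    # reverse
    0: ".",     # not stranded
    None: "?",  # mixed or unknown
}


def strand_to_gff3(strand):
    """Return the GFF3 strand symbol for a proposed strand value."""
    try:
        return GFF3_STRAND[strand]
    except KeyError:
        raise ValueError("Invalid strand: %r" % (strand,))


def parent_strand(child_strands):
    """Return the strand for a parent feature given its children.

    If every child shares one strand the parent takes it; otherwise
    (mixed strands, or no children at all) the parent gets None.
    """
    unique = set(child_strands)
    if len(unique) == 1:
        return unique.pop()
    return None


# The join(complement(69611..69724),139856..140650) case: the first
# part is on the reverse strand, the second on the forward strand.
mixed = parent_strand([-1, +1])
print(mixed, strand_to_gff3(mixed))
```

With this reading, a join whose parts all lie on one strand keeps that
strand, and only a genuinely mixed join falls back to None.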

Peter


