[Biopython] gff3 problem

Chris Fields cjfields at illinois.edu
Fri May 20 13:24:30 UTC 2011


On May 20, 2011, at 6:27 AM, Peter Cock wrote:

> On Fri, May 20, 2011 at 12:15 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Peter;
>> 
>> [SeqFeature support for not-stranded elements]
>>> So was the consensus that we should reword the Bio.SeqFeature
>>> docstring to say the four valid values for strand are (with GFF3
>>> equivalents in brackets):
>>> 
>>> +1 = Forward (+ in GFF3)
>>> -1 = Reverse (- in GFF3)
>>> 0 = Not stranded (. in GFF3)
>>> None = Unknown (? in GFF3)
>>> 
>>> And should features on a protein sequence then have strand 0?
>> 
>> That sounds great. I can make the corresponding change to the GFF
>> library. Let me know if there are any other roadblocks to
>> integrating that. Thanks much,
>> Brad
> 
> I've remembered a corner case: mixed-strand features, e.g. the
> Arabidopsis thaliana chloroplast complete genome, AP000423
> in EMBL, NC_000932 in GenBank (one of our unit test files).
> e.g. gene with join(complement(69611..69724),139856..140650)
> 
> Clearly the child features have well defined strands (+1 and -1).
> The parent feature (the join) is mixed strand. Currently our
> GenBank parser uses None for this. So maybe:
> 
> +1 = Forward (+ in GFF3)
> -1 = Reverse (- in GFF3)
> 0 = Not stranded (. in GFF3)
> None = Mixed or unknown (? in GFF3)
> 
> Peter

That's essentially what BioPerl does for 'split' locations (actually, I think the strand is just undef, which would translate to '?' in GFF3).
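
For illustration, here is a minimal Python sketch of the mapping proposed above. It is not Biopython's or BioPerl's actual code; the names STRAND_TO_GFF3, strand_to_gff3 and parent_strand are hypothetical, and collapsing a mixed-strand parent to None simply follows Peter's suggestion.

```python
# Hypothetical helpers; the mapping mirrors the proposal in this thread.
STRAND_TO_GFF3 = {
    1: "+",     # forward
    -1: "-",    # reverse
    0: ".",     # not stranded
    None: "?",  # mixed or unknown
}


def strand_to_gff3(strand):
    """Return the GFF3 strand character for a strand value."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("strand must be +1, -1, 0 or None, got %r" % (strand,))


def parent_strand(child_strands):
    """Return the strand of a parent feature given its children's strands.

    If every child shares one strand, the parent gets that strand;
    otherwise (mixed strands, or no children) the parent gets None.
    """
    unique = set(child_strands)
    if len(unique) == 1:
        return unique.pop()
    return None
```

With the join(complement(69611..69724),139856..140650) example, the children have strands -1 and +1, so parent_strand([-1, 1]) gives None and strand_to_gff3(None) gives '?'.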

chris