[Biopython] gff3 problem
Peter Cock
p.j.a.cock at googlemail.com
Fri Apr 8 09:54:05 UTC 2011
On Fri, Apr 8, 2011 at 10:46 AM, Leighton Pritchard wrote:
> Hi,
> Just to further complicate matters, the symbol convention for GFF3 differs
> from Biopython in terms of the categories it defines:
> + is positive strand
> - is negative strand
> . is not stranded (i.e. strand not relevant)
> ? is strand relevant, but not known
> http://www.sequenceontology.org/gff3.shtml
> The latter two are distinct, but not distinguished by convention in
> Biopython:
> """
> o strand - A value specifying on which strand (of a DNA sequence, for
> instance) the feature deals with. 1 indicates the plus strand, -1
> indicates the minus strand, 0 indicates both strands, and None indicates
> that strand doesn't apply (ie. for proteins) or is not known.
> """
> (http://www.biopython.org/DIST/docs/api/Bio.SeqFeature-pysrc.html)
> Biopython lacks a symbol or convention for representation of "strand
> relevant, but not known". The 0 and None classifications are, at least
> partly, redundant because there are (as a rule) only two strands, and if a
> feature covers both strands (class 0) then the question of strandedness is
> irrelevant (class None). That feature's strand could then happily be
> described by either 0 or None.
Indeed.
> The obvious (to me) mapping of the four allowed Biopython symbols to the
> GFF3 convention is:
> +1 -> +
> -1 -> -
> None -> .
> 0 -> ?
> because 'None' is semantically close to 'has no strand information of
> consequence', and 0 is the mean of +1 and -1 ;)
> Cheers,
> L.
And we can maintain the stated convention in the docstring that
features on a protein sequence have None as their strand.
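
For illustration, Leighton's proposed mapping could be sketched roughly
as follows. The names STRAND_TO_GFF3, strand_to_gff3 and gff3_to_strand
are hypothetical and are not part of Biopython or of Brad's GFF code;
this only shows the four-way convention discussed above.

```python
# Hypothetical sketch of the proposed mapping between Biopython's
# SeqFeature strand values and the GFF3 strand column symbols.
# None of these names exist in Biopython.
STRAND_TO_GFF3 = {
    1: "+",     # plus strand
    -1: "-",    # minus strand
    None: ".",  # not stranded (e.g. features on a protein)
    0: "?",     # strand relevant, but not known
}

# Reverse lookup, built from the table above.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 strand symbol for a Biopython-style strand value."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("Unexpected strand value: %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the Biopython-style strand value for a GFF3 strand symbol."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("Unexpected GFF3 strand symbol: %r" % (symbol,))
```

With this, strand_to_gff3(None) gives "." and gff3_to_strand("?") gives
0, so all four values round-trip.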
Other than GFF (which isn't in Biopython yet), I don't think we have
any feature code that really cares about this - GenBank/EMBL files
don't make this distinction at all.
So, as part of integrating Brad's GFF code into Biopython, we
can tighten up the rather loose strand convention in the docstring
for the SeqFeature.
Peter