[Biopython] Fwd: gff3 problem

Leighton Pritchard Leighton.Pritchard at hutton.ac.uk
Fri Apr 8 10:14:17 UTC 2011


Re-sent due to email address change (and subsequent bounce)

Begin forwarded message:

Date: 8 April 2011 10:46:49 GMT+01:00
To: Peter Cock <p.j.a.cock at googlemail.com>
Cc: Michal <mictadlo at gmail.com>, <biopython at lists.open-bio.org>
Subject: Re: [Biopython] gff3 problem

Hi,

Just to further complicate matters, the symbol convention for GFF3 differs from Biopython in terms of the categories it defines:

+ is positive strand
- is negative strand
. is not stranded (i.e. strand not relevant)
? is strand relevant, but not known

http://www.sequenceontology.org/gff3.shtml

The latter two are distinct, but not distinguished by convention in Biopython:

"""
o strand - A value specifying on which strand (of a DNA sequence, for
instance) the feature deals with. 1 indicates the plus strand, -1
indicates the minus strand, 0 indicates both strands, and None indicates
that strand doesn't apply (ie. for proteins) or is not known.
"""

(http://www.biopython.org/DIST/docs/api/Bio.SeqFeature-pysrc.html)

Biopython lacks a symbol or convention for representation of "strand relevant, but not known".  The 0 and None classifications are, at least partly, redundant because there are (as a rule) only two strands, and if a feature covers both strands (class 0) then the question of strandedness is irrelevant (class None).  That feature's strand could then happily be described by either 0 or None.

The obvious (to me) mapping of the four allowed Biopython symbols to the GFF3 convention is:

+1 -> +
-1 -> -
None -> .
0 -> ?

because 'None' is semantically close to 'has no strand information of consequence', and 0 is the mean of +1 and -1 ;)
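
As a minimal sketch, that mapping could be written in Python as below. The names BIOPYTHON_TO_GFF3, strand_to_gff3 and gff3_to_strand are hypothetical and not part of Biopython, and the 0 -> ? pairing is only the convention suggested here, not anything Biopython itself defines.

```python
# Proposed mapping from Biopython strand values to GFF3 strand symbols.
# Assumption: 0 is taken to mean "strand relevant, but not known" (GFF3 '?'),
# which is the suggestion in this message, not an official Biopython rule.
BIOPYTHON_TO_GFF3 = {1: "+", -1: "-", None: ".", 0: "?"}

# Reverse lookup, built from the same table so the two stay consistent.
GFF3_TO_BIOPYTHON = {symbol: strand for strand, symbol in BIOPYTHON_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 strand symbol for a Biopython-style strand value."""
    try:
        return BIOPYTHON_TO_GFF3[strand]
    except KeyError:
        raise ValueError("Unexpected strand value: %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the Biopython-style strand value for a GFF3 strand symbol."""
    try:
        return GFF3_TO_BIOPYTHON[symbol]
    except KeyError:
        raise ValueError("Unexpected GFF3 strand symbol: %r" % (symbol,))


if __name__ == "__main__":
    for value in (1, -1, None, 0):
        print(repr(value), "->", strand_to_gff3(value))
```

Keeping a single table and deriving the reverse lookup from it means the round trip (value to symbol and back) cannot drift out of step if the convention is later changed.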

Cheers,

L.

On Friday, 8 April 2011, at 09:49, Peter Cock wrote:

On Fri, Apr 8, 2011 at 7:35 AM, Michal <mictadlo at gmail.com> wrote:

How could I get also the strand position?

Every SeqFeature should have a strand attribute, which
will be +1 or -1 where there is a strand. GFF features
can also be strandless, not applicable, or unknown, in which
case the SeqFeature strand should be 0 or None.

Peter

--
Dr Leighton Pritchard MRSC
DG31, Plant Pathology Programme, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827




