[SO-devel] Re: [Bioperl-l] GFF3 preliminary

Jim Kent jim_kent at pacbell.net
Thu Feb 20 08:56:19 EST 2003


One advantage of explicit phase is that it lets you annotate
genomes with sequence gaps.  We went without explicit exon-by-exon
phase information in the UCSC database, and it has at times
made life difficult.

----- Original Message -----
From: "Richard Durbin" <rd at sanger.ac.uk>
To: <song-devel at lists.sourceforge.net>
Cc: "Ian Korf" <ik1 at sanger.ac.uk>; <gff-list at sanger.ac.uk>;
<michele at sanger.ac.uk>; <bioperl-l at bioperl.org>
Sent: Thursday, February 20, 2003 2:11 AM
Subject: Re: [SO-devel] Re: [Bioperl-l] GFF3 preliminary


> This is specifically about phase.  I have other comments on the full
> document, which is approaching convergence I think.  Sorry for the delay
> in responding.
>
> I can't understand why the definition of phase in the current spec is
> unclear.  It says:
>
>    <frame>  One of '0', '1', '2' or '.'. '0' indicates that the
>    specified region is in frame, i.e. that its first base
>    corresponds to the first base of a codon. '1' indicates that
>    there is one extra base, i.e. that the second base of the
>    region corresponds to the first base of a codon, and '2'
>    means that the third base of the region is the first base of
>    a codon. If the strand is '-', then the first base of the
>    region is the value of <end>, because the corresponding
>    coding region will run from <end> to <start> on the reverse
>    strand. As with <strand>, if the frame is not relevant then
>    set <frame> to '.'. It has been pointed out that "phase"
>    might be a better descriptor than "frame" for this field.
>    Version 2 change: This field is left empty '.' for RNA and
>    protein features.
>
> (Yes, I know we called it frame when it should be phase.  I completely
> support GFF3 calling this field "phase".)
>
> Anyway, we (primarily David Haussler and I) specifically addressed the
> reverse strand, and it is the opposite of what Ian says.  The phase is
> strand symmetric - it is always about the 5' end.  I guess we should
> have said it that way.  Anyway, please feel free to rewrite.  I think
> it is very important to keep the phase column.  It is relevant for
> similarities and partial genewise matches etc. as well as full coding
> sequences.  I don't support the view that the phase of a GFF line should
> be calculated implicitly from the presence of coding_start features in
> other lines that may or may not be properly linked with this one.  Lines
> should be as independent as possible.
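Read Richard's way, phase is always counted from the 5' end. It is the number of bases to skip forward from <start> on the '+' strand, or backward from <end> on the '-' strand, before the first complete codon begins. Here is a minimal Python sketch of that reading. It assumes 1-based inclusive GFF coordinates, and the function name `first_codon_base` is hypothetical:

```python
def first_codon_base(start: int, end: int, strand: str, phase: int) -> int:
    """Return the 1-based coordinate of the first base of the first complete
    codon in a coding segment, counting the phase from the segment's 5' end.
    Hypothetical helper illustrating the strand-symmetric reading of phase."""
    if start > end:
        raise ValueError("GFF requires start <= end")
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if strand == "+":
        # On the forward strand the 5' end is <start>; skip `phase` bases rightwards.
        return start + phase
    if strand == "-":
        # On the reverse strand the 5' end is <end>; skip `phase` bases leftwards.
        return end - phase
    raise ValueError("phase is only meaningful on the '+' or '-' strand")
```

For example, `first_codon_base(100, 200, "+", 1)` gives 101 and `first_codon_base(100, 200, "-", 1)` gives 199. The same phase value describes the same situation at the 5' end on either strand.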
>
> So I strongly disagree with Lincoln's suggestion.  GTF as formalised by
> Michael Brent et al. only got away with omitting it because it was used
> for a restricted set of purposes.
>
> Richard
>
> Lincoln Stein wrote:
> > How about mandating a "." in the phase column and insisting that the open
> > reading frame model be expressed in terms of a transcript, a set of exons, a
> > coding start and a coding end?  This way the phase can be calculated
> > correctly while loading the GFF file (by those people who think in terms of
> > phase), or even added by a generic GFF processing script.
> >
> > Lincoln
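Lincoln's proposal amounts to deriving each coding segment's phase at load time from the exons and the coding bounds. Below is a minimal Python sketch of such a loader step. It assumes non-overlapping 1-based exons and a CDS that begins on a complete codon, and the name `cds_phases` is hypothetical. The function needs the whole exon set at once, which is the cross-line dependency Richard objects to above. It also assumes the coordinates account for every coding base, so an unsized gap inside an exon would throw it off.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (start, end, phase), 1-based inclusive


def cds_phases(exons: List[Tuple[int, int]], strand: str,
               cds_start: int, cds_end: int) -> List[Segment]:
    """Clip exons to [cds_start, cds_end] and give each coding piece a phase,
    walking the transcript 5' to 3'. Hypothetical sketch; assumes the CDS
    starts with a complete codon and the exons do not overlap."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Order exons 5' -> 3' along the transcript.
    ordered = sorted(exons, reverse=(strand == "-"))
    segments: List[Segment] = []
    phase = 0  # the first coding base starts a codon
    for ex_start, ex_end in ordered:
        s, e = max(ex_start, cds_start), min(ex_end, cds_end)
        if s > e:
            continue  # exon lies entirely in UTR
        segments.append((s, e, phase))
        length = e - s + 1
        # Bases of an unfinished codon left at this piece's 3' end
        # (Python's % keeps this non-negative even if length < phase).
        leftover = (length - phase) % 3
        # The next piece must first supply the bases that complete that codon.
        phase = (3 - leftover) % 3
    return segments
```

For a '+' strand transcript with exons (1, 10) and (21, 30) coding throughout, this yields phases 0 and 2. The first exon leaves one base of a codon hanging, so the second exon must skip two bases.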
> >
> >
> > On Wednesday 19 February 2003 02:49 pm, Ian Korf wrote:
> >
> >>Historically, the definition for phase has been well-defined but usually
> >>not followed. One person's frame is another person's phase, and in the end
> >>people just give up and put a "." there. The current definition does not
> >>resolve this problem, and people will continue to be mystified.
> >>
> >>The problem may be that the definition is not strand-symmetric. A phase 1
> >>exon on the plus strand indicates that the 5' end has a partial codon,
> >>but says nothing about the 3' end. On the minus strand, a phase 1 exon
> >>indicates that the 3' end is partial but says nothing about the 5' end.
> >>The opposite end and the frame can easily be worked out from the
> >>coordinates. But maybe it would be better to make phase and frame explicit.
> >>How about representing them as a triplet of phase;frame;phase? This may be
> >>redundant, but it's clearer. For example, "0;1;1" indicates that the first
> >>base is at the start of a complete codon, it is in frame 1, and there is 1
> >>base of a partial codon at the end. Here, start and end refer to the
> >>absolute coordinates, but I'd be just as happy to make them 5' and 3'.
> >>
> >>-Ian
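Ian's triplet can be derived from a single phase value plus the coordinates. The Python sketch below uses the 5'/3' orientation Ian offers as an alternative. The thread never defines "frame", so this sketch assumes frame means the position of the first complete codon's first base modulo 3 (0-based). That convention, and the name `phase_triplet`, are hypothetical.

```python
def phase_triplet(start: int, end: int, strand: str, phase: int) -> str:
    """Build a "5'phase;frame;3'partial" string for a coding segment.
    Hypothetical sketch: 'frame' is ASSUMED to be (first codon base - 1) % 3."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    length = end - start + 1
    # Bases of an incomplete codon left at the 3' end after the complete codons.
    trailing = (length - phase) % 3
    # First base of the first complete codon, counted from the 5' end.
    first_codon = start + phase if strand == "+" else end - phase
    frame = (first_codon - 1) % 3  # assumed convention, not from the thread
    return f"{phase};{frame};{trailing}"
```

Under that assumed convention, `phase_triplet(2, 5, "+", 0)` reproduces Ian's example "0;1;1". The middle field depends entirely on how frame is defined, which is part of what the thread is arguing about.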
> >
> >
>
>


