[Bioperl-l] [Gmod-schema] Circular genomes in Chado/BioPerl

Chris Mungall cjm at berkeleybop.org
Tue Sep 9 22:56:55 UTC 2008


I think I am happy with the modulo approach.

First of all, though, I believe we need a formal specification of
genome interval semantics that is independent of any particular syntax
or implementation. This can be a fairly short specification, along
the lines of what Lincoln has written below (although I would
naturally prefer the normative version to be interbase; this doesn't
preclude derived axioms in GFF coordinates).
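
To make the interbase/GFF relationship concrete, here is a minimal
sketch of the conversion. It assumes GFF coordinates are 1-based and
closed, and interbase coordinates are 0-based and half-open. The
function names are hypothetical; fmin/fmax are borrowed from the Chado
featureloc columns.

```python
def gff_to_interbase(start, end):
    """Convert a 1-based closed interval to interbase (fmin, fmax)."""
    return start - 1, end


def interbase_to_gff(fmin, fmax):
    """Convert interbase (fmin, fmax) back to a 1-based closed interval."""
    return fmin + 1, fmax


def interbase_length(fmin, fmax):
    """In interbase the length is a plain difference, with no +1."""
    return fmax - fmin
```

With Lincoln's example below (start = 50, end = 260), the interbase
pair is (49, 260), and both systems agree on a length of 211.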

This spec should also define and standardize the terminology used:
Lincoln draws a distinction between 'stop' and 'end'. I'm relatively
happy with these terms; however, the choices we make need to be
enshrined, otherwise we'll end up with confusion and mismatches between
software and specification.

One clarification:

> revolutions = int (length/genome)


This axiom is presumably conditional on the genome being circular, which
will have to be indicated using a new flag, as Jim suggests, yep?

So the context independent axiom would be:

> revolutions = IF src_is_circular THEN int (length/genome) ELSE 0
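
As a rough sketch of how that axiom might look in code (Python here,
purely for illustration): the function and parameter names are
hypothetical, coordinates are assumed to be 1-based and closed as in
GFF, and the stop position is taken as (end - 1) % genome + 1. That
last formula is this sketch's own assumption, chosen so that the quoted
example (start 50, end 260 on a 100 bp genome) lands on position 60.

```python
def revolutions(start, end, genome_length, src_is_circular):
    """Complete turns around the source; always 0 for a linear source."""
    if not src_is_circular:
        return 0
    length = end - start + 1          # 1-based closed interval
    return length // genome_length    # same as int(length/genome) for positive values


def stop_position(end, genome_length, src_is_circular):
    """Map 'end' back onto the source; unchanged for a linear source."""
    if not src_is_circular:
        return end
    # Shift to 0-based, wrap, shift back, so a multiple of the genome
    # length maps to the last base rather than to 0.
    return (end - 1) % genome_length + 1
```

For the example quoted below this gives 2 revolutions and a stop
position of 60 when the source is flagged circular, and 0 revolutions
when it is not.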


On Sep 9, 2008, at 10:52 AM, Lincoln Stein wrote:

> It seems to me that the proposed modulus syntax handles multiple
> revolutions. Consider a 100 bp genome (to make it simple) and a feature
> that starts at 50, goes around twice, and ends at position 60:
>
>  start = 50
>  end  = 260
>
> length = end - start + 1
> revolutions = int (length/genome)
> stop position = (end - 1) % genome + 1
>
> Lincoln
>
> On Mon, Sep 8, 2008 at 3:57 PM, Aaron Mackey <ajmackey at gmail.com> wrote:
>
>> How can you handle features that may cross the origin more than once?
>> The modulus, though simple, seems to be only half the solution.  It
>> also makes it difficult to place features in the genome "by eye"
>> (having to do the modulus subtraction in my head), or in
>> sorting/filtering operations.
>>
>> I have an alternative that I wondered if you considered: allow the
>> start/end to have an additional "circular revolution" prefix:
>>
>> a typical range tuple like: 100 200 -
>> is thus shorthand for: 0:100 0:200 -
>> (i.e. both the 100 and 200 are in the same "revolution" around the genome)
>>
>> and is then distinguishable from an "around the genome + 100" feature of:
>> 1:100 0:200 -
>>
>> Just an alternative to consider (if you haven't already).  I'm not
>> wedded to the syntax, but I wouldn't want to see new columns in GFF
>> just for this.  Essentially, what you want is some form of compound
>> polar coordinates, it seems.
>>
>> -Aaron
>>
>> On Mon, Sep 8, 2008 at 2:44 PM, Jim Hu <jimhu at tamu.edu> wrote:
>>> In discussions with GMOD about Gbrowse, we've come up with a proposal for
>>> handling circular genomes and features that cross the origin in such
>>> genomes.  This applies to lots of prokaryotic and viral genomes, and might
>>> be valuable for some ways of representing terminally redundant linear
>>> genomes.
>>> 1) Keep the requirement that start < end
>>> 2) allow end > parent feature length
>>> 3) parent feature gets an is_circular boolean
>>> 4) use modular arithmetic to calculate the real position of end on the
>>> parent feature.
>>> We'd like to do this in a way that will be consistent with Chado and BioPerl
>>> representation of features as much as possible (realizing that there is the
>>> usual interbase or not coordinate issue).  What do people think?  Lincoln is
>>> on board for modifying the GFF3 spec.
>>> Thanks!
>>> Jim Hu
>>>
>>> =====================================
>>>
>>> Jim Hu
>>>
>>> Associate Professor
>>>
>>> Dept. of Biochemistry and Biophysics
>>>
>>> 2128 TAMU
>>>
>>> Texas A&M Univ.
>>>
>>> College Station, TX 77843-2128
>>>
>>> 979-862-4054
>>>
>>>
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> Gmod-schema at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> -- 
> Lincoln D. Stein
>
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Stacey Quinn <Stacey.Quinn at oicr.on.ca>
>
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724 USA
> (516) 367-8380
> Assistant: Sandra Michelsen <michelse at cshl.edu>
>



