[Bioperl-l] How to express 'histogram' data in GFF3

Scott Cain cain at cshl.edu
Sat Mar 19 09:23:54 EST 2005


Matt,

First, let me start by saying there are some "unexplored" areas of GFF3
and GBrowse (at least, they are unexplored for me).

While I haven't tested this, it should work fine.  What you can do is
create one parent feature that encapsulates the entire range, and then
have the data points be child lines of the parent:

ChrII	rev1	region	1	2000000	.	.	.	ID=poly1%3AChrII%3Arev1
ChrII	rev1	poly1	1591004	1591068	464.835	-	.	Parent=poly1%3AChrII%3Arev1

Now whether this will work in a GFF database with GBrowse currently is
an open question (like I said, I haven't tested it); I know it would
work in a chado database and GBrowse.  You might need a custom
aggregator to make it work in a GFF database.
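
For illustration, here is a minimal Python sketch of that conversion,
assuming the GFF2 rows are tab-delimited with nine columns and a group
field of the form "poly1 ChrII:rev1".  The function name, its
parameters, and the choice of "region" as the parent type are
hypothetical, and the parent range has to be supplied by the caller.
It only rewrites the text; it says nothing about whether GBrowse will
draw the result.

```python
from urllib.parse import quote


def gff2_histogram_to_gff3(rows, parent_start, parent_end, parent_type="region"):
    """Turn GFF2 histogram rows into one GFF3 parent line plus child lines.

    Hypothetical helper: each distinct GFF2 group becomes one parent
    feature with a unique ID, and every data point refers to it through
    a Parent attribute instead of repeating the ID.
    """
    out = []
    seen_parents = set()
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError("expected 9 tab-delimited columns: %r" % row)
        seqid, source = cols[0], cols[1]
        # GFF2 group field "poly1 ChrII:rev1" -> "poly1:ChrII:rev1"
        group_class, group_name = cols[8].split(None, 1)
        # Percent-encode the colons, giving "poly1%3AChrII%3Arev1"
        parent_id = quote(group_class + ":" + group_name, safe="")
        if parent_id not in seen_parents:
            seen_parents.add(parent_id)
            # Emit the parent once, just before its first data point
            out.append("\t".join([
                seqid, source, parent_type,
                str(parent_start), str(parent_end),
                ".", ".", ".", "ID=" + parent_id,
            ]))
        # Keep columns 1-8 of the data point; only the attributes change
        out.append("\t".join(cols[:8] + ["Parent=" + parent_id]))
    return out
```

Calling it with the single GFF2 row from the original question and a
parent range of 1 to 2000000 should give back the two lines shown
above.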

On the other hand, I'm not convinced that having all the lines with the
same ID violates the GFF3 spec, as you could probably view this as one
big feature of the whole range, and therefore the ID applies to that one
feature, not to the individual pieces that make up the lines of GFF.  If
you want, you can send me a small sample set of data and I'll see what I
can do.

Scott


On Fri, 2005-03-18 at 06:55 -0500, Matthew Vaughn wrote: 
> OK, I've bashed my head against this and have come up short, so now I'm 
> asking for help. Recently, I decided to upgrade my development system 
> to BioPerl 1.5 and bring all my code up to GFF3 compliance. This of 
> course, includes code that generates GFF files for loading into our 
> local Generic Genome Browser (1.62).
> 
> The problem comes when I try to express histogram data. In the past, 
> rows like this worked fine as GFF2
> 
> "ChrII	rev1	poly1	1591004	1591068	464.835	-	.	poly1 ChrII:rev1"
> 
> but this is invalid for GFF3. As far as I can figure from interpreting 
> the GFF3 spec, the same record should look something like this
> 
> "ChrII	rev1	poly1	1591004	1591068	464.835	-	.	ID=poly1%3AChrII%3Arev1"
> 
> But this violates the GFF3 spec in that ID is now non-unique. Rows 
> formatted thusly also fail to display any histogram data in my browser.
> 
> I've considered loading the array data as GFF2 and my annotation data 
> as GFF3, but that seems, well, inelegant (plus I don't even know if 
> that will work)
> 
> Any input will be very much appreciated!
> 
> Matt
> 
> --
> Matthew W. Vaughn, Ph.D.
> Cold Spring Harbor Laboratory
> Delbruck Laboratory / Martienssen Group
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> 
> phone: (516) 367-8469
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


