[Bioperl-l] SGD GFF3 file available soon

Stan Dong qdong at genome.stanford.edu
Thu Feb 19 01:42:32 EST 2004


Hi Scott,

In my examples, I use Arabic numerals in the seqid column to indicate the
chromosome number, so I should put 'ID=1' in the attribute column of the
first line, which represents the whole chromosome. Since these IDs need to
be unique within the scope of the GFF file, I think it's better to use a
more descriptive name like 'chr01' in this case (in the seqid column, with
'ID=chr01' in the attribute column).
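
The two constraints discussed here (every seqid in column 1 matches an ID
declared somewhere in the file, and IDs are not repeated) can be checked
mechanically. The following is only a rough sketch in Python, not part of
any existing tool: the function name check_gff3_ids and the sample lines
are made up for illustration, and it treats any repeated ID as a duplicate,
which may be stricter than the GFF3 specification requires for features
that span several lines.

```python
def check_gff3_ids(lines):
    """Return (duplicate_ids, undeclared_seqids) for an iterable of GFF3 lines.

    Hypothetical helper for illustration only. It assumes tab-separated,
    nine-column feature lines and a 'key=value;key=value' attribute column.
    """
    seen_ids = set()      # every ID= value declared in column 9
    duplicates = set()    # IDs declared on more than one line
    seqids = set()        # every value used in column 1

    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line; ignored in this sketch
        seqids.add(cols[0])
        for pair in cols[8].split(";"):
            key, sep, value = pair.partition("=")
            if key == "ID" and sep:
                if value in seen_ids:
                    duplicates.add(value)
                seen_ids.add(value)

    # A seqid is "undeclared" if no line in the file carries that ID.
    return sorted(duplicates), sorted(seqids - seen_ids)


# Made-up sample lines following the 'chr01' naming suggested above.
sample = [
    "chr01\tSGD\tchromosome\t1\t230211\t.\t.\t.\tID=chr01",
    "chr01\tSGD\ttelomere\t1\t801\t.\t-\t.\tID=TEL01L",
]
duplicate_ids, undeclared_seqids = check_gff3_ids(sample)
print(duplicate_ids, undeclared_seqids)
```

With a numeric seqid such as '1' in the first column and 'ID=chr01' on the
chromosome line, the same check would report '1' as undeclared, which is the
mismatch Scott pointed out.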

Thanks a lot for your suggestion,
-Stan


On Wed, 18 Feb 2004, Scott Cain wrote:

> Stan,
> 
> In your sample GFF, the seqid in the first column has to correspond to
> some ID, usually also defined in the same GFF file.  For instance, if
> the features in the GFF file are all on chromosome I, the first column
> of all of those lines would have the same ID as the ID declared for
> chromosome I.  For example:
> 
> I	SGD	chromosome	1	230211	.	.	.	ID=I;description=Sequence "I"
> I	SGD	telomere	1	801	.	-	0	ID=TEL01L;description=I left telomeric region;db_xref=SGD:S0028862
> I	SGD	repeat_family	1	62	.	-	0	ID=TEL01L-TR;name=Telomeric Repeat;description=I left telomere TG(1-3);db_xref=SGD:S0028864
> ...etc...
> 
> Sorry I didn't point that out before--when I looked at the Excel sheet
> you sent me before, I didn't see all of it (I am too used to working
> with plain text files).
> 
> Scott
> 
> -------------Original Message---------------
> > Date: Wed, 18 Feb 2004 14:09:27 -0800
> > From: Stan Dong <qdong at genome.stanford.edu>
> > Subject: [Bioperl-l] SGD GFF3 file available soon
> > To: bioperl-l at bioperl.org
> > Message-ID: <1DE37948-625F-11D8-89C8-000A956A0A36 at genome.stanford.edu>
> > Content-Type: text/plain; charset=US-ASCII; format=flowed
> > 
> > Hi,
> > 
> > I am a programmer at the Saccharomyces Genome Database (SGD,
> > http://www.yeastgenome.org/). I am developing a flat file
> > in GFF3 format (http://song.sourceforge.net/gff3-jan04.shtml) to
> > represent the sequence features of the yeast genome, and it will soon be
> > released on our ftp site. This is very useful because quite a few open
> > source software packages, such as Gbrowse and Chado, can take this file
> > format as input.
> > 
> > I would like comments from people who are interested in doing similar
> > things and from those who have good or not-so-good experience with GFF3
> > to share. For me, it took a while to get the specification done, especially
> > making the third column (type) fully compatible with the Sequence Ontology
> > (SO). One thing I like about GFF3 is the last column (attributes),
> > where you can put all kinds of useful information, such as, in our case,
> > GO annotation and a nice description of a feature. An example SGD GFF3
> > file can be viewed here:
> > 
> > ftp://genome-ftp.stanford.edu/pub/people/curator/GFF3Example.txt
> > 
> > Thanks,
> > 
> > Stan Dong
> > Programmer, SGD
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.org
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
> 


