[Bioperl-l] Re: Converting GFF2 records to GFF3

Scott Cain cain at cshl.org
Tue Dec 28 14:20:06 EST 2004


Hi Razi,

I think the spec is pretty clear on the question of quotes: they are to
be escaped.  Unfortunately, Bio::Tools::GFF is not a great module, and we
are moving away from it in favor of the FeatureIO modules (of which
gff.pm is one).  I've commented out the line in Bio::Tools::GFF that adds
quotes to the values.  Quotes are used in GFF2 to group together text
that contains spaces, since space is used as a delimiter in the ninth
column of GFF2.  In GFF3, delimiting is much clearer: everything between
the '=' and either ';' (single value) or ',' (for a list) is the value.
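
To make the delimiting rule concrete, here is a minimal sketch in plain
Perl (no BioPerl) that writes and reads a GFF3 ninth column.  The function
names are hypothetical and are not part of Bio::Tools::GFF or FeatureIO.
It assumes that reserved characters, including double quotes, are escaped
with URL-style %XX codes:

```perl
use strict;
use warnings;

# Percent-encode characters that would be misread in GFF3 column 9.
# Escaping '"' follows the reading of the spec given in this thread.
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r"])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Reverse the percent-encoding.
sub unescape_gff3_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Build column 9 from a hash ref of tag => [values].
# Tags are sorted only to make the output deterministic.
sub format_gff3_attributes {
    my ($attributes) = @_;
    return join(';', map {
        my $tag = $_;
        escape_gff3_value($tag) . '='
            . join(',', map { escape_gff3_value($_) } @{ $attributes->{$tag} });
    } sort keys %$attributes);
}

# Parse column 9 back into a hash ref of tag => [values]:
# split pairs on ';', tag from value on the first '=', list items on ','.
sub parse_gff3_attributes {
    my ($column9) = @_;
    my %attributes;
    for my $pair (split /;/, $column9) {
        next unless length $pair;
        my ($tag, $raw) = split /=/, $pair, 2;
        $raw = '' unless defined $raw;
        $attributes{ unescape_gff3_value($tag) } =
            [ map { unescape_gff3_value($_) } split /,/, $raw ];
    }
    return \%attributes;
}
```

Because every delimiter inside a value is escaped before joining, no
quoting is needed to protect spaces or punctuation, and parsing the
output returns the original values.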

As for your second question, I'm not sure what the answer is without an
example, but I suspect it is related to a boolean property, and if so,
it should have a value of one, for example, "is_current=1;"
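
As a rough illustration of that suggestion, the sketch below writes a tag
that has no value as "tag=1".  The helper name is hypothetical, and
treating every valueless tag as a boolean is an assumption that may not
hold for all GFF2 files:

```perl
use strict;
use warnings;

# Hypothetical helper: format one tag for GFF3 column 9.
# A tag with no (or only empty) values is assumed to be a boolean flag
# and is written with the value 1.  Values are assumed to be escaped already.
sub gff3_pair {
    my ($tag, @values) = @_;
    @values = grep { defined($_) && length($_) } @values;
    @values = (1) unless @values;
    return $tag . '=' . join(',', @values);
}
```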

Finally, note that the scripts you wrote won't work in general, for at
least a few reasons, though they may work in your case.  Use extreme
caution, though, because your scripts don't verify that the feature type
is part of SOFA, don't deal with parent-child relationships, and don't
guarantee that the rules for reserved tag names are followed.  I've found
that converting GFF2 to GFF3 is a partially manual process: I can script
the easy things and then fix the remaining problems by hand.
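
The three checks mentioned above could be scripted along these lines.
This is only a sketch under stated assumptions: the feature structure and
the function name are hypothetical, the list of allowed types has to be
supplied by the caller (for example, term names taken from SOFA), and the
reserved tag list is the one in the current GFF3 specification, which
reserves all tags that begin with an uppercase letter:

```perl
use strict;
use warnings;

# Reserved tags as listed in the current GFF3 specification.
my %RESERVED = map { $_ => 1 } qw(
    ID Name Alias Parent Target Gap Derives_from
    Note Dbxref Ontology_term Is_circular
);

# Each feature is a hash ref: { type => 'gene', attributes => { tag => [values] } }.
# $allowed_types is a hash ref of acceptable type names supplied by the caller.
# Returns a list of human-readable problem descriptions.
sub check_features {
    my ($features, $allowed_types) = @_;
    my @problems;

    # Collect every ID first so that children may precede their parents.
    my %seen_id;
    for my $f (@$features) {
        $seen_id{$_} = 1 for @{ $f->{attributes}{ID} || [] };
    }

    for my $i (0 .. $#$features) {
        my $f = $features->[$i];

        push @problems, "feature $i: type '$f->{type}' is not in the allowed list"
            unless $allowed_types->{ $f->{type} };

        for my $tag (sort keys %{ $f->{attributes} }) {
            push @problems, "feature $i: tag '$tag' starts with an uppercase letter but is not a reserved tag"
                if $tag =~ /^[A-Z]/ && !$RESERVED{$tag};
        }

        for my $parent (@{ $f->{attributes}{Parent} || [] }) {
            push @problems, "feature $i: Parent '$parent' matches no ID"
                unless $seen_id{$parent};
        }
    }
    return @problems;
}
```

A check like this only reports problems; deciding how to fix each one is
still the manual part.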

Good luck!
Scott


On Tue, 2004-12-28 at 13:22 -0500, bioperl-l-request at portal.open-bio.org
wrote:
> Date: Thu, 23 Dec 2004 15:54:40 -0500 (EST)
> From: Razi Khaja <razi at genet.sickkids.on.ca>
> Subject: [Bioperl-l] Converting GFF2 records to GFF3
> To: song-devel at lists.sourceforge.org, bioperl <bioperl-l at bioperl.org>
> Message-ID: <20041223205440.84719.qmail at web51606.mail.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
> 
> Sorry for cross-posting, but this may be relevant to both bioperl and song-devel.
>  
> I've written a small script to convert GFF2 records to GFF3 using bioperl, and vice versa (see gff2_to_gff3.pl and gff3_to_gff2.pl below).
>  
> In doing this I have noticed some problems in conversion.
>  
> The method Bio::Tools::GFF::_gff3_string will quote attribute values if they contain characters not in [a-zA-Z0-9,;=.:%^*$@!+_?-] (i.e. $value = '"'.$value.'"';) and will output empty quotes for tags without values (i.e. $value = "\"\"";).
>  
> Currently the gff3 spec says: "Unescaped quotation marks, ... are explicitly forbidden." 
>  
> This brings up 2 questions:
> (1) Are quotes necessary in gff3?
> (2) When a value is empty, what should be output?
>     a) Tag="";
>     b) Tag=.;
>     c) Tag=;
>     d) nothing?
>  
> (Apart from not meeting the spec, this makes it difficult to do transformations from gff2 to gff3 and back to gff2 again.)
> 
>  
> 
> 
> # =====  gff2_to_gff3.pl =====
> #!/usr/bin/perl
> use strict;
> use Bio::Tools::GFF;
> my( $gff2File ) = @ARGV;
> my $gffio = Bio::Tools::GFF->new(-file=>"$gff2File", -gff_version=>2);
> while( my $feature = $gffio->next_feature() ) {
>     my $gff3string = $gffio->_gff3_string( $feature );
>     print "$gff3string\n";
> }
> $gffio->close();
> 
>  
> 
> # =====  gff3_to_gff2.pl =====
> 
> #!/usr/bin/perl
> use strict;
> use Bio::Tools::GFF;
> my( $gff3File ) = @ARGV;
> my $gffio = Bio::Tools::GFF->new(-file=>"$gff3File", -gff_version=>3);
> while( my $feature = $gffio->next_feature() ) {
>     my $gff2string = $gffio->_gff2_string( $feature );
>     print "$gff2string\n";
> }
> $gffio->close();
> 
>  
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


