[Bioperl-l] GFF file output missing semicolon

Wes Barris wes.barris at csiro.au
Sun Nov 23 19:26:27 EST 2003


Lincoln Stein wrote:

> Hi,
> 
> The GFF2 spec specifies that the semicolon separates tag/value pairs.  It does 
> not say that the last tag/value should be terminated by a semicolon.  It also 
> specifies that any amount of whitespace can occur around the semicolon.
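The rule from the GFF2 spec quoted above can be shown with a minimal pure-Perl sketch. This is not the parser used by Bio::Tools::GFF, fast_load_gff.pl or gbrowse. The sub name `parse_gff2_group` is hypothetical, and the sketch assumes that quoted values never contain a semicolon. It splits the group (ninth) column on semicolons with any surrounding whitespace, so a trailing semicolon is accepted but not required:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl or gbrowse function): parse a GFF2
# group column into a hash of tag => [values].
# Assumption: quoted values never contain a semicolon.
sub parse_gff2_group {
    my ($group) = @_;
    my %tags;
    # Split on ";" with any surrounding whitespace. Perl's split discards
    # trailing empty fields, so a final ";" (or none) gives the same result.
    for my $field (split /\s*;\s*/, $group) {
        (my $pair = $field) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        next unless length $pair;                # skip empty fields, e.g. a leading ";"
        my ($tag, $rest) = split /\s+/, $pair, 2;
        my @values;
        if (defined $rest) {
            # Each value is either a double-quoted string or a bare token.
            while ($rest =~ /\G\s*(?:"([^"]*)"|(\S+))/gc) {
                push @values, defined $1 ? $1 : $2;
            }
        }
        push @{ $tags{$tag} }, @values;
    }
    return %tags;
}
```

Under these assumptions, `Note "x"   ; Accession "y"   ` and `Note "x"; Accession "y";` yield the same tags. A parser that follows the spec as described should also treat the two forms identically.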

Ok, fair enough.  But then gbrowse appears unable to handle this format
properly.  I know I must be missing something, but this is what I am
seeing.

Here is a GFF line as written by Bio::Tools::GFF:

AF354168        blast   s-m-100-10      61437   61530   186     -       .       Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021"   ; Accession "sheep_#25_61538..61445"

Note that this line may wrap when displayed in this message; in the file it is a single line.

If I load this file (using fast_load_gff.pl) into a mysql database and view
with gbrowse, there are two problems:

1) The accession is displayed above the item inside double quotes like this:
    "sheep_#25_61538..61445".

2) When mousing over the item, neither the accession nor the start and end
    are displayed.  Instead all I see is the track key:
    QRNA Sheep-Mouse 100-10:

If I manually add a semicolon after the accession at the end of each line
of the GFF file and load that into the mysql database, gbrowse properly
displays these two items like this:

sheep_#25_61538..61445			(note: no double quote marks anymore)

QRNA Sheep-Mouse 100-10: sheep_#25_61538..61445 AF354168: 61437..61530
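The manual workaround described above (terminating each group column with a semicolon) could be automated with a small post-processing filter run before fast_load_gff.pl. This is a minimal sketch and is not part of BioPerl. The sub names `terminate_group` and `filter_gff` are hypothetical. The sketch assumes the file itself is tab-delimited, as GFF requires. It works around the display problem but does not explain why gbrowse needs the semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure the GFF2 group (9th) column ends in ";".
# Trailing whitespace is removed first, so 'Accession "x"   ' becomes
# 'Accession "x";'.
sub terminate_group {
    my ($line) = @_;
    chomp $line;
    # Leave comment and blank lines untouched.
    return "$line\n" if $line =~ /^#/ || $line !~ /\S/;
    my @cols = split /\t/, $line, 9;
    # Lines without a 9th column are passed through unchanged.
    return "$line\n" if @cols < 9;
    $cols[8] =~ s/\s+$//;    # drop trailing whitespace
    # Do not touch an empty group column or the "." placeholder.
    if (length $cols[8] && $cols[8] ne '.' && $cols[8] !~ /;$/) {
        $cols[8] .= ';';
    }
    return join("\t", @cols) . "\n";
}

# Hypothetical helper: copy GFF from one filehandle to another,
# fixing each line on the way through.
sub filter_gff {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} terminate_group($line);
    }
}
```

For example, adding `filter_gff(\*STDIN, \*STDOUT);` at the end of a script turns it into a stdin-to-stdout filter. Lines that already end in a semicolon pass through unchanged, so running the filter twice is harmless.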

> 
> Lincoln
> 
> On Thursday 20 November 2003 11:19 pm, Wes Barris wrote:
> 
>>Hi,
>>
>>I have written a BioPerl program that parses BLAST files and generates
>>a gff file.  I have everything working except there is one small detail
>>that I have not been able to figure out.  When generating each line
>>of gff output, the semicolon is left off at the end of the Accession
>>name.  Here is a sample line from a gff file that I generated:
>>
>>AF354168        mirseeker       pred_miRNA      188152  188251  198     -       .       Note "mirseeker score 17.58"   ; Accession "s-h_19_r_99330000-99363000"
>>
>>Notice that:
>>
>>1) There are three space characters between the Note value and the
>>    semicolon that precedes "Accession".
>>
>>2) At the end of the line, after the Accession, there are three space
>>    characters and no semicolon.  Without that semicolon, the genome
>>    browser doesn't display the "rollover" information properly.
>>
>>3) The "Note" field is written before the "Accession" field.  I thought
>>    that the Accession should come first.
>>
>>Here is the relevant portion of my code:
>>
>>       while( my $hsp = $hit->next_hsp ) {
>>          # Use the minus strand if either the query or the hit
>>          # aligns on the minus strand.
>>          my $strand = 1;
>>          $strand = -1 if ($hsp->strand('query') == -1 ||
>>                           $hsp->strand('hit')   == -1);
>>          my $feature = Bio::SeqFeature::Generic->new(
>>                         -source_tag=>$source,
>>                         -primary_tag=>$feature_type,
>>                         -start=>$hsp->start('hit'),
>>                         -end=>$hsp->end('hit'),
>>                         -score=>$hit->raw_score,
>>                         -strand=>$strand,
>>                         # Tags are written to the GFF2 group column.
>>                         -tag=>{
>>                                 Accession=>$result->query_name,
>>                                 Note=>$result->query_description,
>>                                 }
>>                         );
>>          $feature->seq_id($hit->accession);
>>          $gffio->write_feature($feature);       #Bio::SeqFeatureI
>>       }
>>
>>Perhaps I am not adding the "Accession" and "Note" fields properly???
> 
> 


-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au


