[Bioperl-l] Bio::Tools::Fgenesh bug? and fix?

Cook, Malcolm MEC at stowers-institute.org
Tue Jul 11 00:25:11 UTC 2006


I am finding that the Bio::Tools::Fgenesh parser incorrectly handles
feature coordinates for '-' strand predictions.

In particular, start & end are deliberately reversed if the strand is
'-'.

I guess this is a holdover from Genscan.pm that was never really
tested !?

Or perhaps fgenesh v2.4, which I am running, differs in its output in
this respect from version 2.0, against which this module was written.

Or, perhaps my understanding is blotto (known to happen).

Does anyone know for sure?

If I comment out selected lines...

#	    if($predobj->strand() == 1) {
		$predobj->start($start);
		$predobj->end($end);
#	    } else {
#		$predobj->end($start);
#		$predobj->start($end);
#	    }

... then GFF produced by my naive fgenesh2gff script below is correct
(at least w.r.t. strand and coordinates - GFF compatibility purists
might wince).
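
To make the convention concrete, here is a minimal sketch in plain Perl
(no BioPerl needed). It assumes what GFF expects: start <= end on both
strands, with orientation carried only by the strand column. The helper
names normalize_coords and gff_line and the sample coordinates are made
up for illustration; they are not part of BioPerl or of fgenesh output.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return (start, end)
# with start <= end, whatever order the coordinates arrived in.
sub normalize_coords {
    my ($start, $end) = @_;
    return $start <= $end ? ($start, $end) : ($end, $start);
}

# Hypothetical helper: build the first eight tab-separated GFF2 columns.
# Strand is given as 1 or -1 and is the only place orientation is recorded.
sub gff_line {
    my ($seq_id, $source, $type, $start, $end, $strand) = @_;
    ($start, $end) = normalize_coords($start, $end);
    my $strand_char = $strand < 0 ? '-' : '+';
    return join("\t", $seq_id, $source, $type, $start, $end,
                '.', $strand_char, '.');
}

# Illustrative values only: a minus-strand exon reported as 1350..1200
# still comes out with the smaller coordinate in the start column.
print gff_line('somefish', 'fgenesh', 'exon', 1350, 1200, -1), "\n";
```

If the parser swaps start and end for '-' strand features that already
have start <= end, the result is the reverse of what this sketch emits,
which matches the symptom described above.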

Should I commit this change to head?


Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 



#!/usr/bin/env perl

# fgenesh2gff 
# PURPOSE: parse fgenesh output into gff
# USAGE: fgenesh fish somefish.dna | fgenesh2gff > somefish.dna.fgenesh.gff

use strict;
use warnings;
use Bio::Tools::Fgenesh;
use Bio::Tools::GFF;    # the GFF writer actually used below

# Remaining options should name files to process, but if none, process
# standard input:
@ARGV = ('-') unless @ARGV;	
my $fgenesh = Bio::Tools::Fgenesh->new(-fh => \*ARGV);

# No -fh or -file is given, so features are written to standard output.
my $featureout = Bio::Tools::GFF->new(
    -gff_version => 2,    # whatever ;)
);
my $IDNUM = 0;
while (my $gene = $fgenesh->next_prediction()) {
  my $ID = "fgenesh" . ++$IDNUM;
  $gene->add_tag_value('ID', $ID);
  $featureout->write_feature($gene);
  # Write each exon as a child of its gene, on the same sequence.
  foreach my $exon ($gene->exons()) {
    $exon->add_tag_value('Parent', $ID);
    $exon->seq_id($gene->seq_id);
    $featureout->write_feature($exon);
  }
}
$fgenesh->close();

exit 0;




