[Bioperl-l] Blast to gff

Jason Stajich jason@cgt.mc.duke.edu
Mon, 18 Nov 2002 14:31:07 -0500 (EST)


On Mon, 18 Nov 2002, Wiepert, Mathieu wrote:

> Hi,
>
> What's the simplest way to convert blast hits to gff?  I don't know
> why I thought there was a method somewhere to do this, is there one?
>

This will go in the scripts dir at some point - it doesn't play perfectly
with Bio::DB::GFF because the current loading scheme expects a the
Group,Target,Sequence fields to be the first ones - but since we don't
currently allow one to specify the tag order (my local hacked version of
SeqFeature::Generic does) you won't get the field names just right in
DB::GFF until we update the feature loader scripts in scripts/Bio-DB-GFF.


#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;
use Bio::Tools::GFF;
use Math::BigFloat;
my $filename = shift;
if( $filename =~ /\.gz/ ) {
    open(FH, "zcat $filename |") || die("cannot run zcat on $filename: $!");
} else {
    open(FH, $filename) || die("cannot open $filename: $!");
}
my $in = new Bio::SearchIO(-fh => \*FH,
                           -format => 'blast');

my $out = new Bio::Tools::GFF(-file => ">nr_blastx.gff");
my $evalue_cutoff = Math::BigFloat->new('1e-15');
while( my $r = $in->next_result ) {
    while( my $hit = $r->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            my $evalue = $hsp->evalue;
	    # deal with time when NCBI blast doesn't report 1e-200 but e-200
            $evalue =~ s/^e/1e/;
            next if Math::BigFloat->new($evalue) > $evalue_cutoff;
            my $q = $hsp->query;
            $q->source_tag('BLASTX_nr');
            $q->score(sprintf("%.2f",$hsp->percent_identity));
            $q->primary_tag('similarity');

            my ($name) = $hit->name;
            my $oname = '';
            $q->add_tag_value('Target', sprintf("Protein:%s",$hit->accession || $hit->name));
            $q->add_tag_value('Target', $hsp->hit->start);
            $q->add_tag_value('Target', $hsp->hit->end);
#           $q->add_tag_value('Frame', $hsp->frame());
            $q->add_tag_value('note', $hit->description);
            $out->write_feature($q);
        }
    }
}


> -Mat
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu