[Bioperl-l] hacking BlastTable.pm to support blast -m 8

Tristan Lefebure tristan.lefebure at gmail.com
Tue Oct 27 19:31:03 UTC 2009


Hello,
As far as I understand, Bio::Index::BlastTable only supports
the -m 9 blast format. Another popular and more compact
format is -m 8, the main difference being that the blast
program, the query and database, and the different field
names are not reported between each search, i.e. you get
a much cleaner table (which looks much easier to parse).

By looking at BlastTable.pm, it looks like the main
hack would be in the sub _index_file. Right now it is:

sub _index_file {
        my( $self,
                 $file, # File name
                 $i,    # Index-number of file being indexed
          ) = @_;

        my( $begin,  # Offset from start of file of the start
                          # of the last found record.
          );

        open(my $BLAST, '<', $file) or $self->throw("cannot open file $file\n");
        my $indexpoint = 0;
        my $lastline = 0;
        while( <$BLAST> ) {
                if(m{^#\s+T?BLAST[PNX]} ) {
            my $len = length $_;
            $indexpoint = tell($BLAST)-$len;
                }
        if(m{^#\s+Query:\s+([^\n]+)}) {
            foreach my $id ($self->id_parser()->($1)) {
                                $self->debug("id is $id, begin is $indexpoint\n");
                                $self->add_record($id, $i, $indexpoint);
                        }
        }
        }
}

Using the -m 8 format, is it me or this could be
done by getting the query name from the first row
of the blast table, find when the hits for this query
starts and stop, and give this to add_record()?

I'm kind of not sure to get all the details
regarding the $i and $indexpoint... so well, if an
expert eye could give me some advice or hack the code
that would be nice ;)

--Tristan





More information about the Bioperl-l mailing list