[Bioperl-l] hacking BlastTable.pm to support blast -m 8

Chris Fields cjfields at illinois.edu
Tue Oct 27 15:50:24 EDT 2009


On Oct 27, 2009, at 2:31 PM, Tristan Lefebure wrote:

> Hello,
> As far as I understand, Bio::Index::BlastTable only supports
> the -m 9 blast format. Another popular and more compact
> format is -m 8, the main difference being that the blast
> program, the query and database, and the different field
> names are not reported between each search, i.e. you get
> a much cleaner table (which looks much easier to parse).
>
> By looking at BlastTable.pm, it looks like the main
> hack would be in the sub _index_file. Right now it is:
>
> sub _index_file {
>        my( $self,
>                 $file, # File name
>                 $i,    # Index-number of file being indexed
>          ) = @_;
>
>        my( $begin,  # Offset from start of file of the start
>                          # of the last found record.
>          );
>
>        open(my $BLAST, '<', $file) or $self->throw("cannot open file  
> $file\n");
>        my $indexpoint = 0;
>        my $lastline = 0;
>        while( <$BLAST> ) {
>                if(m{^#\s+T?BLAST[PNX]} ) {
>            my $len = length $_;
>            $indexpoint = tell($BLAST)-$len;
>                }
>        if(m{^#\s+Query:\s+([^\n]+)}) {
>            foreach my $id ($self->id_parser()->($1)) {
>                                $self->debug("id is $id, begin is  
> $indexpoint\n");
>                                $self->add_record($id, $i,  
> $indexpoint);
>                        }
>        }
>        }
> }
>
> Using the -m 8 format, is it me or this could be
> done by getting the query name from the first row
> of the blast table, find when the hits for this query
> starts and stop, and give this to add_record()?
>
> I'm kind of not sure to get all the details
> regarding the $i and $indexpoint... so well, if an
> expert eye could give me some advice or hack the code
> that would be nice ;)
>
> --Tristan

That should be feasible, yes, and you are correct.  The main thing to  
make sure of is to retain the '#' for -m9, so the parser catches the  
BLAST executable and other info.

I'll go ahead and do this based on your suggestion, unless you have a  
patch ready.  Also, it looks like the module is missing tests, so I  
can work on adding those for both -m8 and -m9 output.

chris



More information about the Bioperl-l mailing list