[Bioperl-l] Converting between allowed SearchIO formats?

Dan Bolser dan.bolser at gmail.com
Tue Sep 22 13:09:50 UTC 2009


Hi all,

I'm reading in a blasttable format blast result, filtering, and hoping
to write out a similarly formatted result. Based on experience with
SeqIO, I expected to do something like the following:

use Bio::SearchIO;

## Open the sequence search report
my $seqI = Bio::SearchIO->
  new( -file   => $file,
       -format => $format,
     );

## Open the output report
my $seqO = Bio::SearchIO->
  new( -file   => ">OUTPUT",
       -format => $format,
     );

while( my $result = $seqI->next_result ) {
  ## Do some filtering...

  $seqO->write_result( $result );
}


However, the above method does not work here. Is this for some deep
reason, or could the above method (based on the way SeqIO works) be
made to work? I'm guessing that the SearchIO object conversion is
simply harder to do than with SeqIO?

So now I'm trying to use the correct method, via
Bio::SearchIO::Writer::HSPTableWriter. The problem is, I can't find a
1 to 1 correspondence between the fields in the blasttable and the
columns provided by the writer. So far I have something like this:

blasttable ->		HSPTableWriter

(result) query_name	query_name
(hit) name		hit_name
(hsp) frac_identical	frac_identical_query?
			frac_identical_hit?
(hsp) hsp_length	length_aln_query?
			length_aln_hit?
(?) mismatches		?
(hsp) gaps		?
			gaps_query?
			gaps_hit?
			gaps_total?
(hsp) start('query')	start_query
(hsp) end('query')	end_query
(hsp) start('hit')	start_hit
(hsp) end('hit')	end_hit
(hsp) significance	expect
(hsp) bits		bits


For (hsp) frac_identical, it seems as if the (undocumented)
frac_identical_total column is giving the right value, however, I'ts
hard to be certain because the format is of the value is different
(the blasttable says 93.51 while HSPTableWriter says 0.94). How can I
change the output format of HSPTableWriter?

Is there any improvement on the above mapping? It seems strange that I
can read in a blasttable, but I can't write one out (using a generic
object interface). For example, where do I get the hsp length from
(which column)?

I'm sure this has come up before, so apologies for not being able to
track down the appropriate docs.


Thanks for any help,
Dan.

P.S. when dumping a blasttable from a blasttable using HSP methods,
how should I calculate the number of mismatches? Currently I'm trying:

      my $len = $hsp->length;
      my $match = $len * $hsp->frac_identical;
      my $mismatch = $len - $match;

but the resulting values differ from those in the original blasttable.
I have the feeling this is a FAQ ...



More information about the Bioperl-l mailing list