[Bioperl-l] Parse BLAST to find mismatches

Jason Stajich jason@cgt.mc.duke.edu
Sun, 26 May 2002 23:37:39 -0400 (EDT)


On Fri, 24 May 2002, Ken Graham wrote:

> I'm stuck on what might be a simple problem. Using a BLAST report I want to
> find the position, on the query sequence, of each mismatch in a HSP. I can
> read in the BLAST report, and get properties on the HSP (such as start,
> percent similarity etc..) but I don't know where to find any info about the
> mismatches. I tried to use the SimpleAlign object but I just get errors.

You can call $hsp->homology_seq to determine where the mismatches are.

The mismatches will be space so you can use the index method to locate
where the mismatches are.  The following might get you what you want.

my $in = new Bio::SearchIO(-file => 'BLASTFILE',
			   -format => 'blast');

while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
	while( my $hsp = $hit->next_hsp ) {
	    my $hseq = $hsp->homology_string;
	    $last = 0;
	    my @a;
	    while( ($last = index($hseq, ' ', $last)) > 0 ) {
		# last is the location of the mismatch
	 	# NOTE: $last is index in string coordinates
		# but Bio::Seq objects start with 1 not 0.
		# What you'll probably want to do is:
		print "base is ", substr($hsp->query_string, $last,1),
		      " index is $last\n";
		$last++;
	    }
	    print $hit->accession, " ", join(",",@a), "\n" if(@a);
	}
    }
    last;
}

> Am I missing something obvious?
> Thanks
>
> Ken Graham
> UC Davis School of Medicine
> Biological Chemistry
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu