[Bioperl-l] parsing a BLAST output

Jason Stajich jason.stajich at duke.edu
Sun Dec 4 23:00:06 EST 2005


frac_identical gives the fraction of bases in the HSP that are  
identical.

overlap would be calculated from (length of the query aligned) /  
(length of query) or (length of hit aligned) / (length of hit).

So for an HSP you can calculate this
  $fracqaligned = $hsp->query->length / $result->query_length;
  $frachaligned = $hsp->hit->length / $hit->length;

But remember there may be multiple HSPs so you may need to merge the  
HSP information to get the total length aligned.  If there is a  
repeated domain or multiple high scoring suboptimal alignments this  
can cause things to get a little tricky.

There are two methods provided in the HitI interface called
frac_aligned_query() and frac_aligned_hit() do try and take all of  
this into account for you, but I admit this is less well tested  
code.  But do give them a try:
  $fracqaligned = $hit->frac_aligned_query();
  $frachaligned = $hit->frac_aligned_hit();


If you are using WU-BLASTP add the -postsw option to get a refined  
alignment which will merge HSPs where appropriate so you should use  
that.

You can also use the -links option to get WU-BLAST to get the logical  
ordering and a consistent path through the HSPs.


On Dec 4, 2005, at 8:32 PM, Angshu Kar wrote:

> Hi,
>
> To begin with, I'm new to Bioperl.
> Now, I've written the following simple piece of code to parse a WU- 
> Blast
> output which filters data *for a given e-value and >50% overlap*.
>
> I'm writing the main algorithm here:
>
> my $blast_report = $ARG[1];
> my $threshold_evalue = $ARG[2];
>
> my $in = new Bio::SearchIO(-format => 'blast', -file =>  
> $blast_report);
>
> while (my $result = $in -> next_result)
>    {
>       while(my $hit = $result->next_hit)
>          {
>             if(($line{$hit->name} == $line{$result->query_accession}))
>                {
>                   next;
>                }
>             if($hit->hsp->evalue <= $threshold_evalue)
>                {
>                   if($hit->hsp->frac_indentical>=0.5)
>                      {
>                         print $line{$result->query_accession} . "\t" .
> $line{$hit->name} . "\t" . $hit->hsp-evalue . "\n";
>                     }
>               }
>       }
> }
>
> My questions are:
>
> 1. does the frac_identical gives the measure of % overlap? Or, are  
> there any
> other methods?
> 2. now, i don't have any blast data sets to test my code upon.could  
> any of
> the experienced users let me know whether the algorithm is fine?any
> tip-offs on any point (from optimization to syntactical errors) are  
> heartily
> welcome.
> 3. could any one please let me know if i can find sample wu-blast  
> outputs to
> test my script upon?

http://fungal.genome.duke.edu/~jes12/BGT203.2005/sample_reports/

Also checkout the biodata repository from bioperl and look in the  
DB_Searching directory, we had started a  project cataloging example  
reports in all the different formats.  This sort of fizzled out, but  
could still use some volunteers to better organize things and  
incorporate more examples.

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biodata/ 
DB_Searching/


> Appreciate your guidance.
>
> Thanks,
> Angshu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




More information about the Bioperl-l mailing list