[Bioperl-l] parsing a BLAST output

Angshu Kar angshu96 at gmail.com
Thu Dec 8 18:55:00 EST 2005


Thanks Jason...So, if I get you right, if I use the  -postsw option I can
use any of the

$fracqaligned = $hsp->query->length / $result->query_length;
$frachaligned = $hsp->hit->length / $hit->length;

formulae and merge the result for all HSPs to get the % overlap.?

I mean then do I have to worry about repeated domain or multiple high
scoring suboptimal alignments ?

Appreciate your guidance.

Thanks,
Angshu



On 12/4/05, Jason Stajich <jason.stajich at duke.edu> wrote:
>
> frac_identical gives the fraction of bases in the HSP that are
> identical.
>
> overlap would be calculated from (length of the query aligned) /
> (length of query) or (length of hit aligned) / (length of hit).
>
> So for an HSP you can calculate this
> $fracqaligned = $hsp->query->length / $result->query_length;
> $frachaligned = $hsp->hit->length / $hit->length;
>
> But remember there may be multiple HSPs so you may need to merge the
> HSP information to get the total length aligned.  If there is a
> repeated domain or multiple high scoring suboptimal alignments this
> can cause things to get a little tricky.
>
> There are two methods provided in the HitI interface called
> frac_aligned_query() and frac_aligned_hit() do try and take all of
> this into account for you, but I admit this is less well tested
> code.  But do give them a try:
> $fracqaligned = $hit->frac_aligned_query();
> $frachaligned = $hit->frac_aligned_hit();
>
>
> If you are using WU-BLASTP add the -postsw option to get a refined
> alignment which will merge HSPs where appropriate so you should use
> that.
>
> You can also use the -links option to get WU-BLAST to get the logical
> ordering and a consistent path through the HSPs.
>
>
> On Dec 4, 2005, at 8:32 PM, Angshu Kar wrote:
>
> > Hi,
> >
> > To begin with, I'm new to Bioperl.
> > Now, I've written the following simple piece of code to parse a WU-
> > Blast
> > output which filters data *for a given e-value and >50% overlap*.
> >
> > I'm writing the main algorithm here:
> >
> > my $blast_report = $ARG[1];
> > my $threshold_evalue = $ARG[2];
> >
> > my $in = new Bio::SearchIO(-format => 'blast', -file =>
> > $blast_report);
> >
> > while (my $result = $in -> next_result)
> >    {
> >       while(my $hit = $result->next_hit)
> >          {
> >             if(($line{$hit->name} == $line{$result->query_accession}))
> >                {
> >                   next;
> >                }
> >             if($hit->hsp->evalue <= $threshold_evalue)
> >                {
> >                   if($hit->hsp->frac_indentical>=0.5)
> >                      {
> >                         print $line{$result->query_accession} . "\t" .
> > $line{$hit->name} . "\t" . $hit->hsp-evalue . "\n";
> >                     }
> >               }
> >       }
> > }
> >
> > My questions are:
> >
> > 1. does the frac_identical gives the measure of % overlap? Or, are
> > there any
> > other methods?
> > 2. now, i don't have any blast data sets to test my code upon.could
> > any of
> > the experienced users let me know whether the algorithm is fine?any
> > tip-offs on any point (from optimization to syntactical errors) are
> > heartily
> > welcome.
> > 3. could any one please let me know if i can find sample wu-blast
> > outputs to
> > test my script upon?
>
> http://fungal.genome.duke.edu/~jes12/BGT203.2005/sample_reports/
>
> Also checkout the biodata repository from bioperl and look in the
> DB_Searching directory, we had started a  project cataloging example
> reports in all the different formats.  This sort of fizzled out, but
> could still use some volunteers to better organize things and
> incorporate more examples.
>
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biodata/
> DB_Searching/
>
>
> > Appreciate your guidance.
> >
> > Thanks,
> > Angshu
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>



More information about the Bioperl-l mailing list