[Bioperl-l] parsing a BLAST output

Angshu Kar angshu96 at gmail.com
Thu Dec 8 18:59:12 EST 2005


Thanks for the URL. I'll go through it and let you know if I run into
any problems.
Also, do you have any code snippets that use those functions to calculate
% overlap? It would be great if you could share them.

Thank you so much,
Angshu


On 12/4/05, Barry Moore <bmoore at genetics.utah.edu> wrote:
>
> Angshu-
>
> 1) No.  From the docs (online at
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/BlastHSP.html
> ):
>
> Different versions of Blast report different values for the total length
> of the alignment. This is the number reported in the denominators in the
> stats section: "Identical = 34/120 Positives = 67/120". NCBI-BLAST uses
> the total length of the alignment (with gaps); WU-BLAST uses the length
> of the query sequence (without gaps). Therefore, when called without an
> argument or an argument of 'total', this method will report different
> values depending on the version of BLAST used.
>
> To get the fraction identical among only the aligned residues, ignoring
> the gaps, call this method with an argument of 'query' or 'sbjct'
> ('sbjct' is synonymous with 'hit').
>
> 2) If I understand your question correctly, I think you are looking for
> frac_aligned_hit and/or frac_aligned_query called on your hit object.
> See
> (http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/Hit/GenericHit.html)
> for discussion.
>
> 3) Try the files in the bioperl test/data directory for lots of program
> output samples.  For wu-blast have a look at:
>
> bioperl-live/t/data/brassica_ATH.WUBLASTN
>
> which can be found on the web at:
>
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/bioperl-live/t/data/brassica_ATH.WUBLASTN?rev=HEAD&cvsroot=bioperl&content-type=text/plain
>
> Barry
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Angshu Kar
> Sent: Sunday, December 04, 2005 6:32 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] parsing a BLAST output
>
> Hi,
>
> To begin with, I'm new to Bioperl.
> Now, I've written the following simple piece of code to parse a WU-Blast
> output which filters data *for a given e-value and >50% overlap*.
>
> I'm writing the main algorithm here:
>
> use Bio::SearchIO;
>
> # %line is assumed to be populated elsewhere in the script.
> my $blast_report     = $ARGV[1];
> my $threshold_evalue = $ARGV[2];
>
> my $in = Bio::SearchIO->new(-format => 'blast', -file => $blast_report);
>
> while (my $result = $in->next_result) {
>     while (my $hit = $result->next_hit) {
>         # Skip hits that map to the same entry as the query.
>         next if $line{$hit->name} eq $line{$result->query_accession};
>
>         # Called without arguments, hsp() should return the best HSP.
>         my $hsp = $hit->hsp;
>         if ($hsp->evalue <= $threshold_evalue
>             && $hsp->frac_identical >= 0.5) {
>             print $line{$result->query_accession} . "\t"
>                 . $line{$hit->name} . "\t" . $hsp->evalue . "\n";
>         }
>     }
> }
>
> My questions are:
>
> 1. Does frac_identical give a measure of % overlap? Or are there any
> other methods for that?
> 2. I don't have any BLAST data sets to test my code on. Could any of
> the experienced users let me know whether the algorithm is fine? Any
> tips on any point (from optimization to syntax errors) are heartily
> welcome.
> 3. Could anyone please let me know where I can find sample WU-BLAST
> outputs to test my script on?
>
> Appreciate your guidance.
>
> Thanks,
> Angshu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
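
On the frac_identical question in the quoted reply: the effect of the different denominators can be worked through in plain Perl, without BioPerl installed. This is only a rough sketch. The 34/120 figure is the one from the quoted docs; the gap counts and the helper name frac_identical_sketch are made up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical HSP numbers. Only 34/120 comes from the quoted docs;
# the gap counts are invented for this example.
my %hsp = (
    identical  => 34,     # identical positions
    aln_length => 120,    # alignment columns, gaps included
    query_gaps => 6,      # gap characters in the query row
    hit_gaps   => 10,     # gap characters in the hit row
);

# Fraction identical with the denominator chosen by $type:
#   'total' -> all alignment columns (gaps included)
#   'query' -> aligned query residues only
#   'hit'   -> aligned hit residues only
sub frac_identical_sketch {
    my ($hsp, $type) = @_;
    my $denominator =
        $type eq 'query' ? $hsp->{aln_length} - $hsp->{query_gaps}
      : $type eq 'hit'   ? $hsp->{aln_length} - $hsp->{hit_gaps}
      :                    $hsp->{aln_length};
    return $hsp->{identical} / $denominator;
}

printf "%-5s %.3f\n", $_, frac_identical_sketch(\%hsp, $_)
    for qw(total query hit);
```

Whichever denominator is used, the result says how similar the aligned residues are, not how much of the sequence was aligned, which is why the answer to question 1 is "No".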

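For the ">50% overlap" filter itself, the quantity wanted is the fraction of the query (or hit) length covered by HSPs, which is what frac_aligned_query and frac_aligned_hit are meant to report. A minimal stand-in in plain Perl might look like the following. It is not BioPerl's implementation; the helper name frac_covered, the query length, and the HSP coordinates are all hypothetical.

```perl
use strict;
use warnings;

# Fraction of a sequence covered by a set of [start, end] ranges
# (1-based, inclusive). Overlapping or adjacent ranges are merged so
# that no position is counted twice.
sub frac_covered {
    my ($seq_length, @ranges) = @_;
    my ($covered, $cur_start, $cur_end) = (0);
    for my $r (sort { $a->[0] <=> $b->[0] } @ranges) {
        my ($start, $end) = @$r;
        if (defined $cur_end && $start <= $cur_end + 1) {
            # Range touches the current block: extend it if needed.
            $cur_end = $end if $end > $cur_end;
        }
        else {
            # Close the previous block (if any) and start a new one.
            $covered += $cur_end - $cur_start + 1 if defined $cur_end;
            ($cur_start, $cur_end) = ($start, $end);
        }
    }
    $covered += $cur_end - $cur_start + 1 if defined $cur_end;
    return $covered / $seq_length;
}

# Hypothetical query of 200 residues with three HSPs on it.
my @hsp_ranges = ([10, 60], [50, 90], [150, 179]);
my $frac = frac_covered(200, @hsp_ranges);
printf "query covered: %.3f -> %s\n", $frac, $frac > 0.5 ? 'keep' : 'skip';
```

With real data the ranges would come from each HSP's query (or hit) coordinates, and the length from the query or hit sequence length in the report.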

