[Bioperl-l] parsing a BLAST output

Barry Moore bmoore at genetics.utah.edu
Thu Dec 8 22:58:07 EST 2005


Angshu-

 

I have not used those functions in any of my code. You might grep
through some of the test scripts to look for examples.

 

Barry

 

-----Original Message-----
From: Angshu Kar [mailto:angshu96 at gmail.com] 
Sent: Thursday, December 08, 2005 4:59 PM
To: Barry Moore
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] parsing a BLAST output

 

Thanks for the URL. I'll go through it and let you know if I run into
any problems.

Also, do you have any code that uses those functions to calculate %
overlap? It would be great if you could share it with me.

 

Thank you so much,

Angshu

 

On 12/4/05, Barry Moore <bmoore at genetics.utah.edu> wrote: 

Angshu-

1) No, frac_identical measures identity rather than overlap.  From the
docs (online at
http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/HSP/BlastHSP.html):

Different versions of Blast report different values for the total length
of the alignment. This is the number reported in the denominators in the
stats section: "Identical = 34/120 Positives = 67/120". NCBI-BLAST uses
the total length of the alignment (with gaps); WU-BLAST uses the length
of the query sequence (without gaps). Therefore, when called without an
argument or an argument of 'total', this method will report different
values depending on the version of BLAST used.

To get the fraction identical among only the aligned residues, ignoring
the gaps, call this method with an argument of 'query' or 'sbjct' 
('sbjct' is synonymous with 'hit').

2) If I understand your question correctly, I think you are looking for
frac_aligned_hit and/or frac_aligned_query called on your hit object.
See
http://doc.bioperl.org/releases/bioperl-1.4/Bio/Search/Hit/GenericHit.html
for discussion.

3) Try the files in the bioperl test/data directory for lots of program
output samples.  For wu-blast have a look at: 

bioperl-live/t/data/brassica_ATH.WUBLASTN

which can be found on the web at:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/bioperl-live/t/data/brassica_ATH.WUBLASTN?rev=HEAD&cvsroot=bioperl&content-type=text/plain
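
A minimal sketch of how points 1 and 2 might be combined, assuming that
"% overlap" means the fraction of the query covered by the hit's HSPs
(frac_aligned_query) and that a self-hit is one whose name equals the
query name. The sub names hit_passes and filter_report are made up for
illustration, and this has not been run against a real WU-BLAST report.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Decide whether one hit passes both filters.
# $hit is expected to behave like a Bio::Search::Hit::GenericHit object.
sub hit_passes {
    my ($hit, $max_evalue, $min_overlap) = @_;

    # Lowest E-value among this hit's HSPs.
    my @evalues = map { $_->evalue } $hit->hsps;
    return 0 unless @evalues;
    return 0 if min(@evalues) > $max_evalue;

    # Fraction of the query aligned across all HSPs (0 to 1).
    return $hit->frac_aligned_query >= $min_overlap ? 1 : 0;
}

# Print query, hit and lowest E-value for every hit that passes.
sub filter_report {
    my ($file, $max_evalue, $min_overlap) = @_;

    # Loaded here so that hit_passes() can be exercised without BioPerl.
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);

    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            # Skip self-hits (an assumption about the intent of the
            # original name comparison).
            next if $hit->name eq $result->query_name;
            next unless hit_passes($hit, $max_evalue, $min_overlap);

            my $best = min(map { $_->evalue } $hit->hsps);
            print join("\t", $result->query_name, $hit->name, $best), "\n";
        }
    }
}
```

Something like filter_report('brassica_ATH.WUBLASTN', 1e-5, 0.5) would
then apply a >50% overlap cutoff; the 1e-5 E-value is only a placeholder.
Swap in frac_aligned_hit if coverage of the hit sequence is what matters.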

Barry

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org 
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Angshu Kar
Sent: Sunday, December 04, 2005 6:32 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] parsing a BLAST output

Hi,

To begin with, I'm new to Bioperl.
Now, I've written the following simple piece of code to parse a WU-Blast
output which filters data *for a given e-value and >50% overlap*. 

I'm writing the main algorithm here:

my $blast_report = $ARG[1];
my $threshold_evalue = $ARG[2];

my $in = new Bio::SearchIO(-format => 'blast', -file => $blast_report);

while (my $result = $in -> next_result) 
  {
     while(my $hit = $result->next_hit)
        {
           if(($line{$hit->name} == $line{$result->query_accession}))
              {
                 next;
              }
           if($hit->hsp->evalue <= $threshold_evalue) 
              {
                 if($hit->hsp->frac_identical >= 0.5)
                    {
                       print $line{$result->query_accession} . "\t" .
                             $line{$hit->name} . "\t" . $hit->hsp->evalue . "\n";
                    }
              }
        }
  }

My questions are:

1. Does frac_identical give a measure of % overlap? Or are there any
other methods?
2. I don't have any BLAST data sets to test my code on. Could any of
the experienced users let me know whether the algorithm is fine? Any
tips on any point (from optimization to syntax errors) are heartily
welcome.
3. Could anyone please let me know where I can find sample WU-BLAST
outputs to test my script on?

Appreciate your guidance.

Thanks,
Angshu

_______________________________________________
Bioperl-l mailing list
Bioperl-l at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

 



