[Bioperl-l] Problem Parsing BLAST output to annotate FASTA file

Andreas Leimbach andreas.leimbach at uni-wuerzburg.de
Wed Feb 20 17:00:51 UTC 2013


Hey Ann,

damn, it 's not my best day ... Anyways, I wouldn't work with 
List::MoreUtils's each_array function, as this assumes that the blast 
hits and the nucleotide queries are in the same order (as Adam pointed 
out). Rather use a hash which associates a key to a certain value. Also, 
the hash can be used to skip sequences that have no hits.
Here's my new version:

my %hits;
my $hit_desc;
my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
"$ARGV[0]");
while (my $result = $search_in->next_result) {
while (my $hit = $result->next_hit) {
while (my $hsp = $hit->next_hsp) {
$hits{$result->query_description} = $hit->description; # hash: associate 
query_desc (key) with hit_desc (value)
last; # jump out of the while loop; this should resolve getting only the 
first hit
}
last; # see above
}
}


my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
while (my $seqobj = $seqio->next_seq) {
if ($hits{$seqobj->display_id}) { # only true if display_id associated 
with hit_desc and should skip seqs without hits
print ">$hits{$seqobj->display_id}\n";
my $nuc = $seqobj->seq();
print $nuc, "\n";
}
}

Cheers,
Andreas

P.S.: I redirected your mail to the BioPerl mailing list, others might 
profit from my mistakes ;-) ...

--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 3843
E-Mail: andreas.leimbach at uni-wuerzburg.de

On 20.2.13 17:35, Ann Gregory wrote:
> Hi Andreas,
>
> Thanks for you help! I don't understand how this gets the first blast hit:
>
> if ($hit->description eq $hit_desc) { # Only want the first blast hit
> next;
> }
>
> I tried this and seems to be working...but I can't get the 1st blast hit
> or skip the sequences that had no hits. Do you know any quick fixes?
>
> *
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
> use List::MoreUtils qw(each_array);
>
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> my @ids;
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $match = $result->num_hits;
> push(@ids, $qd);
> }
> }
> }
> }
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> my @seqs;
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> push(@seqs, $nuc);
> }
>
> my $it = each_array(@ids, at seqs);
> while(my($ids,$seqs)=$it->()){
> print $ids, "\n", $seqs, "\n";
> }
> *
>
> Thanks again!
> ~Ann
>
> On Wed, Feb 20, 2013 at 9:24 AM, Andreas Leimbach
> <andreas.leimbach at uni-wuerzburg.de
> <mailto:andreas.leimbach at uni-wuerzburg.de>> wrote:
>
>     oops, I just realized I had one loop to much in there. Adam is
>     correct. Sorry.
>
>     The last part of the code I send you should look like this:
>
>
>     my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
>     while (my $seqobj = $seqio->next_seq) {
>     print ">$hits{$seqobj->display_id}\__n";
>
>     my $nuc = $seqobj->seq();
>     print $nuc, "\n";
>     }
>
>
>     Cheers,
>     Andreas
>
>
>     --
>     Andreas Leimbach
>     Universität Münster
>     Institut für Hygiene
>     Mendelstr. 7
>     D-48149 Münster
>     Germany
>
>     Tel.: +49 (0)551 39 3843 <tel:%2B49%20%280%29551%2039%203843>
>     E-Mail: andreas.leimbach at uni-__wuerzburg.de
>     <mailto:andreas.leimbach at uni-wuerzburg.de>
>
>     On 20.2.13 06:20, Ann Gregory wrote:
>
>         Hi BioPerl,
>
>         I am having issues with a BioPerl script. I have a blastxml file
>         from a
>         blastx blast and the original multifasta file containing the
>         original
>         nucleotides sequences.
>
>         I want to take the blast result (ie. the blast description) and
>         annotate my
>         multifasta file.
>
>         I have written 2 while loops that extract the blast descriptions
>         as well as
>         the nucleotide sequence from the multifasta file.
>
>         My problem is that I cannot incorporate one of the while loops
>         into the
>         other without loosing the loop property of one of the loops. I
>         would like
>         to take the 1st blast description, then the 1st nucleotide
>         sequence, then
>         the 2nd blast description, then the 2nd nucleotide sequence and so
>         on...just can figure out how to alternate the results.
>
>         See script below:
>
>
>         use warnings;
>         use strict;
>         use Bio::SearchIO;
>         use Bio::SeqIO;
>
>
>         my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
>         "$ARGV[0]");
>         while (my $result = $search_in->next_result) {
>         while (my $hit = $result->next_hit) {
>         while (my $hsp = $hit->next_hsp) {
>         my $qd = $hit->description;
>         print $qd, "\n";
>         }
>         }
>         }
>
>         my $seqio = Bio::SeqIO->new(-format => 'fasta', -file =>
>         "$ARGV[1]");
>         while (my $seqobj = $seqio->next_seq) {
>         my $nuc = $seqobj->seq();
>         print $nuc, "\n";
>         }--
>         Ann (Nina) Gregory
>         Graduate Student
>         Rich Lab / Sullivan Lab
>         Soil, Water, Environmental Science Department
>         University of Arizona
>         _________________________________________________
>         Bioperl-l mailing list
>         Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>         http://lists.open-bio.org/__mailman/listinfo/bioperl-l
>         <http://lists.open-bio.org/mailman/listinfo/bioperl-l>
>
>
>
>
> --
> Ann (Nina) Gregory
> Graduate Student
> Rich Lab / Sullivan Lab
> Soil, Water, Environmental Science Department
> University of Arizona
>
>
>



More information about the Bioperl-l mailing list