[Bioperl-l] Problem Parsing BLAST output to annotate FASTA file

Andreas Leimbach andreas.leimbach at uni-wuerzburg.de
Wed Feb 20 16:14:29 UTC 2013


Hi Ann,

I agree with Adam, but I was already writing my email, while his came 
in. Hope it helps:

I hope I understand correctly what you want to do.
Just to clarify, you queried a protein blast database with blastx and 
nucleotide queries. Now you want to associate the protein description 
for the FIRST blast hit with the corresponding nucleotide fasta file. Is 
that correct?
You have to put the two while loops into one another. Or associate the 
blast hits with the query descriptions. But it's not feasible to take 
the first blast hit and the first nucleotide fasta seq, then the 2nd of 
both etc, as Adam already pointed out.
You would have to iterate through both at the same time. I.e. take the 
first blast hit, then iterate through the nucleotide fasta until you 
find the hit. Then take the 2nd blast hit and iterate through the 
nucleotide fasta etc. It's probably easiest to do this in a hash.

Something along the lines of (not tested I just punched that in the E-Mail):

my %hits;
my $hit_desc;
my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
"$ARGV[0]");
while (my $result = $search_in->next_result) {
while (my $hit = $result->next_hit) {
while (my $hsp = $hit->next_hsp) {
if ($hit->description eq $hit_desc) { # Only want the first blast hit
next;
}
my $hit_desc = $hit->description;
$hits{$result->query_description} = $hit_desc;
}
}
}

my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
foreach my $query (keys %hits) {
while (my $seqobj = $seqio->next_seq) {
if ($seqobj->display_id eq $query) {
print ">$hits{$query}\n";
my $nuc = $seqobj->seq();
print $nuc, "\n";
}

You might want to put some evalue cutoff in there to only score 
significant hits. Also if your nucleotide query multi-fasta file is very 
large, you might consider creating an index first:
http://www.bioperl.org/wiki/HOWTO:Local_Databases#Bio::Index

Hope that helps!

Cheers,
Andreas

P.S.: Please next time include version numbers for BioPerl and Perl and 
a little more detail what you want to do. ;-)


--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 3843
E-Mail: andreas.leimbach at uni-wuerzburg.de

On 20.2.13 06:20, Ann Gregory wrote:
> Hi BioPerl,
>
> I am having issues with a BioPerl script. I have a blastxml file from a
> blastx blast and the original multifasta file containing the original
> nucleotides sequences.
>
> I want to take the blast result (ie. the blast description) and annotate my
> multifasta file.
>
> I have written 2 while loops that extract the blast descriptions as well as
> the nucleotide sequence from the multifasta file.
>
> My problem is that I cannot incorporate one of the while loops into the
> other without loosing the loop property of one of the loops. I would like
> to take the 1st blast description, then the 1st nucleotide sequence, then
> the 2nd blast description, then the 2nd nucleotide sequence and so
> on...just can figure out how to alternate the results.
>
> See script below:
>
>
> use warnings;
> use strict;
> use Bio::SearchIO;
> use Bio::SeqIO;
>
>
> my $search_in = Bio::SearchIO->new(-format => 'blastxml', -file =>
> "$ARGV[0]");
> while (my $result = $search_in->next_result) {
> while (my $hit = $result->next_hit) {
> while (my $hsp = $hit->next_hsp) {
> my $qd = $hit->description;
> print $qd, "\n";
> }
> }
> }
>
> my $seqio = Bio::SeqIO->new(-format => 'fasta', -file => "$ARGV[1]");
> while (my $seqobj = $seqio->next_seq) {
> my $nuc = $seqobj->seq();
> print $nuc, "\n";
> }--
> Ann (Nina) Gregory
> Graduate Student
> Rich Lab / Sullivan Lab
> Soil, Water, Environmental Science Department
> University of Arizona
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list