[Bioperl-l] How to parse BLAST output - all hits of each queryinnew file

Tim Koehler timbourine81 at googlemail.com
Mon Nov 30 17:23:58 UTC 2009


Hello everybody,

thanks a lot for the overwhelming answers! All these codes are different
flavors and worked all.

For me the added code works the best. But I think I found a bug in
...Bio/SearchIO/blast.pm.
There the DEFAULT_BLAST_... variable is set to
Bio::Search::Writer::HitTableWriter instead of
Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to
HTMLResultWriter
and others.

So again: THANKS for the support!

Cheers,
Tim

#!/usr/bin/perl -w

use strict;

use Bio::Tools::Run::StandAloneBlast;

use Bio::SeqIO;

use Bio::SearchIO;

### add here the writer you want
use Bio::SearchIO::Writer::HitTableWriter;

use Bio::Search::Result::BlastResult;



use Data::Dumper;



my $Seq_in = Bio::SeqIO->new( -file   =>
"/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta",

                              -format => "fasta" );



while ( my $query = $Seq_in->next_seq() ) {

       warn "Processing ",$query->id, "\n";

  my $factory =

    Bio::Tools::Run::StandAloneBlast->new(

                 program  => "blastn",

                 database =>
"/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db",

                 _READMETHOD => "Blast"

    );



  my $blast_report = $factory->blastall($query);

  sleep 5;



  # just write the result we got for this query into a

   #new blast-formatted file...named after the id of the query seq...

  my $result = $blast_report->next_result;

  my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format =>
"blast" ) or die $!;

  $blio->write_result($result);



  # below, just looking at the current blast result

###this does not appear in the output files

  while ( my $result = $blast_report->next_result ) {

    ## $result is a Bio::Search::Result::ResultI compliant object

    while ( my $hit = $result->next_hit ) {

      ## $hit is a Bio::Search::Hit::HitI compliant object

      while ( my $hsp = $hit->next_hsp ) {

        ## $hsp is a Bio::Search::HSP::HSPI compliant object

        if ( $hsp->length('total') > 50 ) {

          if ( $hsp->percent_identity >= 75 ) {

            print "Query= ", $result->query_name,

              "Hit= ",        $hit->name,

              "Length= ",     $hsp->length('total'),

              "Percent_id= ", $hsp->percent_identity,

              "Subject=",     $hsp->hit_string, "\n";

          }

        }

      }

    }

  }

}



On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

>  Changed it to a generic result and added a writer and it seems tio work:
>
>
>
>   foreach my $qid ( keys %hits_by_query ) {
>
>     warn "qid = $qid\n";
>
>     my $res = Bio::Search::Result::GenericResult->new(-algorithm =>
> "blastn") or die $!;
>
>    # print Dumper $res;
>
>     foreach my $h ( @{ $hits_by_query{$qid} } ){
>
>                      warn "adding hit ", $h->name, "\n";
>
>                      $res->add_hit($h) if defined($h);
>
>                            }
>
>     my $writerhtml =  Bio::SearchIO::Writer::HTMLResultWriter->new();
>
>     my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file =>
> ">$qid\.bls\.html", -format => "blast" ) or die $!;
>
>     $blio->write_result($res);
>
>   }
>
>
>
>
>
> *From:* Mark A. Jensen [mailto:maj at fortinbras.us]
> *Sent:* Monday, 30 November 2009 10:19 a.m.
> *To:* Smithies, Russell; 'Tim Koehler'
>
> *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> My thought here was that since Tim's already going one at a time thru
>
> his queries, my scrap was not really necessary:
>
>
>
> use strict;
>
> use Bio::Tools::Run::StandAloneBlast;
>
> use Bio::SeqIO;
>
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
>
>
> use Data::Dumper;
>
>
>
> my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",
>
>                               -format => "fasta" );
>
>
>
> while ( my $query = $Seq_in->next_seq() ) {
>
>        warn "Processing ",$query->id, "\n";
>
>   my $factory =
>
>     Bio::Tools::Run::StandAloneBlast->new(
>
>                  program  => "blastn",
>
>                  database =>
> "/data/databases/flatfile/illuminati_blastdata/nt",
>
>                  _READMETHOD => "Blast"
>
>     );
>
>
>
>   my $blast_report = $factory->blastall($query);
>
>   sleep 5;
>
>
>
>   # just write the result we got for this query into a
>
>    #new blast-formatted file...named after the id of the query seq...
>
>  my $result = $blast_report->next_result;
>
> my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format =>
> "blast" ) or die $!;
>
>   $blio->write_result($result);
>
>
>
>   # below, just looking at the current blast result
>
>   while ( my $result = $blast_report->next_result ) {
>
>     ## $result is a Bio::Search::Result::ResultI compliant object
>
>     while ( my $hit = $result->next_hit ) {
>
>       ## $hit is a Bio::Search::Hit::HitI compliant object
>
>       while ( my $hsp = $hit->next_hsp ) {
>
>         ## $hsp is a Bio::Search::HSP::HSPI compliant object
>
>         if ( $hsp->length('total') > 50 ) {
>
>           if ( $hsp->percent_identity >= 75 ) {
>
>             print "Query= ", $result->query_name,
>
>               "Hit= ",        $hit->name,
>
>               "Length= ",     $hsp->length('total'),
>
>               "Percent_id= ", $hsp->percent_identity,
>
>               "Subject=",     $hsp->hit_string, "\n";
>
>           }
>
>         }
>
>       }
>
>     }
>
>   }
>
> }
>
>  ----- Original Message -----
>
> *From:* Smithies, Russell <Russell.Smithies at agresearch.co.nz>
>
> *To:* 'Tim Koehler' <timbourine81 at googlemail.com> ; 'maj at fortinbras.us'<%27maj at fortinbras.us%27>
>
> *Sent:* Sunday, November 29, 2009 3:58 PM
>
> *Subject:* RE: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> Hi Tim
>
> With various people writing the “howtos” and other docs, the examples are
> bound to have differing names for the variables used but as long as you’re
> consistent, it should all fit together.
>
>
>
> I think I’ve almost got your code working, just getting errors from
> Bio::Search::Result::BlastResult  which I’m not entirely sure how to use.
> Perhaps Mark can get this bit going?
>
>
>
> --Russell
>
> ===============================
>
>
>
> use strict;
>
> use Bio::Tools::Run::StandAloneBlast;
>
> use Bio::SeqIO;
>
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
>
>
> use Data::Dumper;
>
>
>
> my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",
>
>                               -format => "fasta" );
>
>
>
> while ( my $query = $Seq_in->next_seq() ) {
>
>        warn "Processing ",$query->id, "\n";
>
>   my $factory =
>
>     Bio::Tools::Run::StandAloneBlast->new(
>
>                  program  => "blastn",
>
>                  database =>
> "/data/databases/flatfile/illuminati_blastdata/nt",
>
>                  _READMETHOD => "Blast"
>
>     );
>
>
>
>   my $blast_report = $factory->blastall($query);
>
>   sleep 5;
>
>
>
>
>
>   my %hits_by_query;
>
>
>
>        while ( my $result = $blast_report->next_result ) {
>
>          foreach my $hit ( $result->hits ) {
>
>                      warn "Pushed a hit for ",$hit->name, "\n";
>
>            push( @{ $hits_by_query{ $hit->name } }, $hit );
>
>          }
>
>        }
>
>
>
>   foreach my $qid ( keys %hits_by_query ) {
>
>               warn "qid = $qid\n";
>
>     my $res = Bio::Search::Result::BlastResult->new() or die $!;
>
>     print Dumper $res;
>
>     foreach my $h ( @{ $hits_by_query{$qid} } ){
>
>                      warn "adding hit ", $h->name, "\n";
>
>                      $res->add_hit($h) if defined($h);
>
>                            }
>
>     my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format =>
> "blast" ) or die $!;
>
>     $blio->write_result($res);
>
>   }
>
>
>
>   while ( my $result = $blast_report->next_result ) {
>
>     ## $result is a Bio::Search::Result::ResultI compliant object
>
>     while ( my $hit = $result->next_hit ) {
>
>       ## $hit is a Bio::Search::Hit::HitI compliant object
>
>       while ( my $hsp = $hit->next_hsp ) {
>
>         ## $hsp is a Bio::Search::HSP::HSPI compliant object
>
>         if ( $hsp->length('total') > 50 ) {
>
>           if ( $hsp->percent_identity >= 75 ) {
>
>             print "Query= ", $result->query_name,
>
>               "Hit= ",        $hit->name,
>
>               "Length= ",     $hsp->length('total'),
>
>               "Percent_id= ", $hsp->percent_identity,
>
>               "Subject=",     $hsp->hit_string, "\n";
>
>           }
>
>         }
>
>       }
>
>     }
>
>   }
>
> }
>
> ===============================
>
>
>
> *From:* Tim Koehler [mailto:timbourine81 at googlemail.com]
> *Sent:* Friday, 27 November 2009 10:24 p.m.
> *To:* Smithies, Russell; maj at fortinbras.us
> *Subject:* Re: [Bioperl-l] How to parse BLAST output - all hits of each
> queryinnew file
>
>
>
> Hey guys,
>
> please, do not get me wrong that I wanted to put the workload on you. So
> far I only found the HowTo's but in there in some way the language changed
> with time (e.g. $in to $Seq_in) or some things I simply could not find.
> Now I got a tip where else to search: the scrapbook and deobfuscator.
>
> I immediately will have a look at that.
>
> This is the first time for me touching linux / perl commands; that's why I
> thought after several days of trial and many errors ;) asking the
> mailinglist.
>
> I was very happy about your fast answers!
>
> Cheers and a nice weekend,
>
> Tim
>
> On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler <timbourine81 at googlemail.com>
> wrote:
>
> ups, sent too early...
>
> Hey Mark,
>
> thanks for the answer. But I am still struggling, especially where to put
> in your code.
>
> Here ist the code I have, so far:
>
> #!/usr/bin/perl -w
>
> ### should I put your code here as push is a perl command?
>
>
> my %hits_by_query;
> for ($result->hits) {
>
> ### I inserted a comma after name}}; if there is no comma, there was the
> error: Scalar found where operator expected at
> 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
> ###        (Missing operator before  $hit?)
> ###Useless use of push with no values at
> 12_BLAST_two_sequence_each_query_one_file.PL line 7.
> ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7,
> near "} $hit"
> ###BEGIN not safe after errors--compilation aborted at
> 12_BLAST_two_sequence_each_query_one_file.PL line 13.
>
>
>  push @{$hits_by_query{$hit->name}}, $hit;
>
> ###here, every time this terror appears: Name "main::result" used only
> once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
> ###error: Can't call method "hits" on an undefined value at
> 12_BLAST_two_sequence_each_query_one_file.PL line 5.
>
>
> }
>
>
> use strict;
> use Bio::Tools::Run::StandAloneBlast;
> use Bio::SeqIO;
> use Bio::SearchIO;
>
> use Bio::Search::Result::BlastResult;
>
> my $Seq_in = Bio::SeqIO->new (
> -file =>
> "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
> -format => 'fasta'
> );
> while (my $query = $Seq_in->next_seq()) {
>
>
> my $factory = Bio::Tools::Run::StandAloneBlast->new(
>
> 'program' => 'blastn',
> 'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
> _READMETHOD => "Blast"
> );
>
> my $blast_report = $factory->blastall($query);
>
> ### Should I need to use a module? are the commands here at the right
> position? errors, e.g., Global symbol "$hit" requires explicit package name
> #my %hits_by_query;
> #for ($result->hits) {
> ### inserted comma after name}}
> # push @{$hits_by_query{$hit->name}}, $hit;
> #}
>
>
>
> foreach my $qid ( keys %hits_by_query ) {
>  my $result = Bio::Search::Result::BlastResult->new();
>  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
>  my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
>  $blio->write_result($result);
> }
>
> ###where are the files stored? what is their name. Sorry, but I cannot get
> behind that :(
>
> while( my $result = $blast_report->next_result ) {
>   ## $result is a Bio::Search::Result::ResultI compliant object
>
>
>   while( my $hit = $result->next_hit ) {
>
>    ## $hit is a Bio::Search::Hit::HitI compliant object
>
>
>    while( my $hsp = $hit->next_hsp ) {
>
>     ## $hsp is a Bio::Search::HSP::HSPI compliant object
>     if( $hsp->length('total') > 50 ) {
>      if ( $hsp->percent_identity >= 75 ) {
>      print  "Query= ",        $result->query_name,
>         "Hit= ",        $hit->name,
>             "Length= ",     $hsp->length('total'),
>             "Percent_id= ", $hsp->percent_identity,
>         "Subject=",        $hsp->hit_string,"\n";
>      }
>     }
>    }
>   }
> }
> }
>
> Again, a big thanks in advance :)
>
> All the best,
>
> Tim
>
> On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:
>
> Hey Mark,
>
> thanks for the answer
>
>
>
>
> On 25.11.2009 20:21, Mark A. Jensen wrote:
> > whoops: change the following line:
> > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
> >
> > to
> >
> > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
> >
> > (I always forget that...)
> > MAJ
> >
> > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
> > Sent: Wednesday, November 25, 2009 1:20 PM
> > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
> > queryinnew file
> >
> >
> >> hey Tim--
> >>
> >> Sound like you need to go about collecting your queries inside out:
> >>
> >> my %hits_by_query;
> >> for ($result->hits) {
> >>  push @{$hits_by_query{$hit->name}} $hit;
> >> }
> >>
> >> I believe now each hash element, keyed by the query name, will contain
> >> an arrayref to the set of hits assoc with that query.
> >>> From here, I believe
> >>
> >> use Bio::Search::Result::BlastResult;
> >> use Bio::SearchIO;
> >>
> >> foreach my $qid ( keys %hits_by_query ) {
> >>  my $result = Bio::Search::Result::BlastResult->new();
> >>  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
> >>  my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast'
> );
> >>  $blio->write_result($result);
> >> }
> >>
> >> will do what you want.
> >>
> >> hope this helps -
> >> Mark
> >>
> >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Wednesday, November 25, 2009 12:40 PM
> >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
> >> query innew file
> >>
> >>
> >>> Dear bioperl users,
> >>>
> >>> I am a real newbie and have - maybe a very trivial - question.
> >>>
> >>> I searched the mailing list archive and many howtos but I have not
> found
> >>> a concrete answer to my problem. So hopefully you can help me :)
> >>>
> >>> Background: I use the latest Bioperl version (installed it two weeks
> >>> before).
> >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
> >>> including different sequences, I get a BLAST output with many queries
> >>> each having several hits / sbjcts.
> >>>
> >>> My problem is how to parse *all* hits of *one* query into a single new
> >>> file. And this for all the queries I have in my BLAST output file.
> >>>
> >>> Or is it better the other way round; first to make fasta files with
> only
> >>> single sequences inside and BLAST each file? But how can I automize
> that
> >>> using Bioperl?
> >>>
> >>> I tried Bio::SearchIO but can only parse all queries and their
> >>> respective hits in only one file...
> >>> I think iteration is also necessary here, but I do not really know how
> >>> to include that into Bio::SearchIO.
> >>> Or do I have to use Module:Bio::Index::Blast?
> >>>
> >>> I can index a file (see below), but I have no idea what comes next...
> >>>
> >>> ###How I index a file...
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
> >>>
> >>> use Bio::Index::Fasta;
> >>>
> >>>
> >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
> >>> $id = "48882";
> >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
> >>> -write_flag => 1);
> >>> $inx->make_index($file_name);
> >>>
> >>>
> >>> Hopefully, you can give me at least hints what to look for.
> >>>
> >>> A big THANKS in advance!
> >>>
> >>> Cheers,
> >>>
> >>> Tim
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >
>
> Tim Köhler
> MPI for Terrestrial Microbiology
> Karl-von-Frisch-Straße
> D-35043 Marburg / Germany
>
> Email: koehlerd at mpi-marburg.mpg.de
> Phone: +49 6421 178-740
> Fax:   +49 6421 178-999
>
>
>
>
>  ------------------------------
>
> *Attention: *The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
>  ------------------------------
>
>
>
>




More information about the Bioperl-l mailing list