[Bioperl-l] How to parse BLAST output - all hits of each queryinnew file

Mark A. Jensen maj at fortinbras.us
Mon Nov 30 17:41:44 UTC 2009


thanks Tim! corrected (I hope) in r16432... 
MAJ
  ----- Original Message ----- 
  From: Tim Koehler 
  To: Smithies, Russell 
  Cc: Mark A. Jensen ; bioperl-l at lists.open-bio.org 
  Sent: Monday, November 30, 2009 12:23 PM
  Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file


  Hello everybody,

  thanks a lot for the overwhelming answers! All these codes are different flavors and worked all.

  For me the added code works the best. But I think I found a bug in ...Bio/SearchIO/blast.pm. 
  There the DEFAULT_BLAST_... variable is set to Bio::Search::Writer::HitTableWriter instead of Bio::SearchIO::Writer::HitTableWriter. This variable I changed also to HTMLResultWriter and others.

  So again: THANKS for the support!

  Cheers, 
  Tim

  #!/usr/bin/perl -w

  use strict;

  use Bio::Tools::Run::StandAloneBlast;

  use Bio::SeqIO;

  use Bio::SearchIO;

  ### add here the writer you want
  use Bio::SearchIO::Writer::HitTableWriter;

  use Bio::Search::Result::BlastResult;

   

  use Data::Dumper;

   

  my $Seq_in = Bio::SeqIO->new( -file   => "/home/koehler/Programs/for_BLAST/1_to_BLAST_two_seq.fasta",

                                -format => "fasta" );

   

  while ( my $query = $Seq_in->next_seq() ) {

         warn "Processing ",$query->id, "\n";

    my $factory =

      Bio::Tools::Run::StandAloneBlast->new(

                   program  => "blastn",

                   database => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db",

                   _READMETHOD => "Blast"

      );

   

    my $blast_report = $factory->blastall($query);

    sleep 5;

   

    # just write the result we got for this query into a 

     #new blast-formatted file...named after the id of the query seq...  

    my $result = $blast_report->next_result;

    my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!;

    $blio->write_result($result);

   

    # below, just looking at the current blast result

  ###this does not appear in the output files

    while ( my $result = $blast_report->next_result ) {

      ## $result is a Bio::Search::Result::ResultI compliant object

      while ( my $hit = $result->next_hit ) {

        ## $hit is a Bio::Search::Hit::HitI compliant object

        while ( my $hsp = $hit->next_hsp ) {

          ## $hsp is a Bio::Search::HSP::HSPI compliant object

          if ( $hsp->length('total') > 50 ) {

            if ( $hsp->percent_identity >= 75 ) {

              print "Query= ", $result->query_name,

                "Hit= ",        $hit->name,

                "Length= ",     $hsp->length('total'),

                "Percent_id= ", $hsp->percent_identity,

                "Subject=",     $hsp->hit_string, "\n";

            }

          }

        }

      }

    }

  }

   


  On Sun, Nov 29, 2009 at 11:29 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

    Changed it to a generic result and added a writer and it seems tio work:



      foreach my $qid ( keys %hits_by_query ) {

        warn "qid = $qid\n";

        my $res = Bio::Search::Result::GenericResult->new(-algorithm => "blastn") or die $!;

       # print Dumper $res;

        foreach my $h ( @{ $hits_by_query{$qid} } ){

                         warn "adding hit ", $h->name, "\n";

                         $res->add_hit($h) if defined($h);

                               }

        my $writerhtml =  Bio::SearchIO::Writer::HTMLResultWriter->new();

        my $blio = Bio::SearchIO->new(-writer => $writerhtml, -file => ">$qid\.bls\.html", -format => "blast" ) or die $!;

        $blio->write_result($res);

      }





    From: Mark A. Jensen [mailto:maj at fortinbras.us] 
    Sent: Monday, 30 November 2009 10:19 a.m.
    To: Smithies, Russell; 'Tim Koehler'


    Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file



    My thought here was that since Tim's already going one at a time thru

    his queries, my scrap was not really necessary: 



    use strict;

    use Bio::Tools::Run::StandAloneBlast;

    use Bio::SeqIO;

    use Bio::SearchIO;

    use Bio::Search::Result::BlastResult;



    use Data::Dumper;



    my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",

                                  -format => "fasta" );



    while ( my $query = $Seq_in->next_seq() ) {

           warn "Processing ",$query->id, "\n";

      my $factory =

        Bio::Tools::Run::StandAloneBlast->new(

                     program  => "blastn",

                     database => "/data/databases/flatfile/illuminati_blastdata/nt",

                     _READMETHOD => "Blast"

        );



      my $blast_report = $factory->blastall($query);

      sleep 5;



      # just write the result we got for this query into a 

       #new blast-formatted file...named after the id of the query seq...  

     my $result = $blast_report->next_result;

    my $blio = Bio::SearchIO->new( -file => ">".$query->id.".bls", -format => "blast" ) or die $!;

      $blio->write_result($result);



      # below, just looking at the current blast result

      while ( my $result = $blast_report->next_result ) {

        ## $result is a Bio::Search::Result::ResultI compliant object

        while ( my $hit = $result->next_hit ) {

          ## $hit is a Bio::Search::Hit::HitI compliant object

          while ( my $hsp = $hit->next_hsp ) {

            ## $hsp is a Bio::Search::HSP::HSPI compliant object

            if ( $hsp->length('total') > 50 ) {

              if ( $hsp->percent_identity >= 75 ) {

                print "Query= ", $result->query_name,

                  "Hit= ",        $hit->name,

                  "Length= ",     $hsp->length('total'),

                  "Percent_id= ", $hsp->percent_identity,

                  "Subject=",     $hsp->hit_string, "\n";

              }

            }

          }

        }

      }

    }

      ----- Original Message ----- 

      From: Smithies, Russell 

      To: 'Tim Koehler' ; 'maj at fortinbras.us' 

      Sent: Sunday, November 29, 2009 3:58 PM

      Subject: RE: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file



      Hi Tim

      With various people writing the “howtos” and other docs, the examples are bound to have differing names for the variables used but as long as you’re consistent, it should all fit together.



      I think I’ve almost got your code working, just getting errors from Bio::Search::Result::BlastResult  which I’m not entirely sure how to use. Perhaps Mark can get this bit going?



      --Russell

      ===============================



      use strict;

      use Bio::Tools::Run::StandAloneBlast;

      use Bio::SeqIO;

      use Bio::SearchIO;

      use Bio::Search::Result::BlastResult;



      use Data::Dumper;



      my $Seq_in = Bio::SeqIO->new( -file   => "sequences.fasta",

                                    -format => "fasta" );



      while ( my $query = $Seq_in->next_seq() ) {

             warn "Processing ",$query->id, "\n";

        my $factory =

          Bio::Tools::Run::StandAloneBlast->new(

                       program  => "blastn",

                       database => "/data/databases/flatfile/illuminati_blastdata/nt",

                       _READMETHOD => "Blast"

          );



        my $blast_report = $factory->blastall($query);

        sleep 5;





        my %hits_by_query;



             while ( my $result = $blast_report->next_result ) {

               foreach my $hit ( $result->hits ) {

                           warn "Pushed a hit for ",$hit->name, "\n";

                 push( @{ $hits_by_query{ $hit->name } }, $hit );

               }

             }



        foreach my $qid ( keys %hits_by_query ) {

                    warn "qid = $qid\n";

          my $res = Bio::Search::Result::BlastResult->new() or die $!;

          print Dumper $res;

          foreach my $h ( @{ $hits_by_query{$qid} } ){

                           warn "adding hit ", $h->name, "\n";

                           $res->add_hit($h) if defined($h);

                                 }

          my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format => "blast" ) or die $!;

          $blio->write_result($res);

        }



        while ( my $result = $blast_report->next_result ) {

          ## $result is a Bio::Search::Result::ResultI compliant object

          while ( my $hit = $result->next_hit ) {

            ## $hit is a Bio::Search::Hit::HitI compliant object

            while ( my $hsp = $hit->next_hsp ) {

              ## $hsp is a Bio::Search::HSP::HSPI compliant object

              if ( $hsp->length('total') > 50 ) {

                if ( $hsp->percent_identity >= 75 ) {

                  print "Query= ", $result->query_name,

                    "Hit= ",        $hit->name,

                    "Length= ",     $hsp->length('total'),

                    "Percent_id= ", $hsp->percent_identity,

                    "Subject=",     $hsp->hit_string, "\n";

                }

              }

            }

          }

        }

      }

      ===============================



      From: Tim Koehler [mailto:timbourine81 at googlemail.com] 
      Sent: Friday, 27 November 2009 10:24 p.m.
      To: Smithies, Russell; maj at fortinbras.us
      Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each queryinnew file



      Hey guys,

      please, do not get me wrong that I wanted to put the workload on you. So far I only found the HowTo's but in there in some way the language changed with time (e.g. $in to $Seq_in) or some things I simply could not find.
      Now I got a tip where else to search: the scrapbook and deobfuscator.

      I immediately will have a look at that.

      This is the first time for me touching linux / perl commands; that's why I thought after several days of trial and many errors ;) asking the mailinglist.

      I was very happy about your fast answers!

      Cheers and a nice weekend,

      Tim

      On Thu, Nov 26, 2009 at 5:02 PM, Tim Koehler <timbourine81 at googlemail.com> wrote:

      ups, sent too early...

      Hey Mark,

      thanks for the answer. But I am still struggling, especially where to put in your code.

      Here ist the code I have, so far:

      #!/usr/bin/perl -w

      ### should I put your code here as push is a perl command?


      my %hits_by_query;
      for ($result->hits) {

      ### I inserted a comma after name}}; if there is no comma, there was the error: Scalar found where operator expected at 12_BLAST_two_sequence_each_query_one_file.PL line7, near "} $hit"
      ###        (Missing operator before  $hit?)
      ###Useless use of push with no values at 12_BLAST_two_sequence_each_query_one_file.PL line 7.
      ###syntax error at 12_BLAST_two_sequence_each_query_one_file.PL line 7, near "} $hit"
      ###BEGIN not safe after errors--compilation aborted at 12_BLAST_two_sequence_each_query_one_file.PL line 13.


       push @{$hits_by_query{$hit->name}}, $hit;

      ###here, every time this terror appears: Name "main::result" used only once: possible typo at 12_BLAST_two_sequence_each_query_one_file.PL line 5.
      ###error: Can't call method "hits" on an undefined value at 12_BLAST_two_sequence_each_query_one_file.PL line 5.


      }


      use strict;
      use Bio::Tools::Run::StandAloneBlast;
      use Bio::SeqIO;
      use Bio::SearchIO;

      use Bio::Search::Result::BlastResult;

      my $Seq_in = Bio::SeqIO->new (
      -file => "/home/koehler/Programs/for_BLAST/BLAST_Pipeline/1_to_BLAST_two_seq.fasta",
      -format => 'fasta'
      );
      while (my $query = $Seq_in->next_seq()) {


      my $factory = Bio::Tools::Run::StandAloneBlast->new(

      'program' => 'blastn',
      'database' => '/home/koehler/Programs/for_BLAST/BLAST_Pipeline/3_BLAST_db',
      _READMETHOD => "Blast"
      );

      my $blast_report = $factory->blastall($query);

      ### Should I need to use a module? are the commands here at the right position? errors, e.g., Global symbol "$hit" requires explicit package name
      #my %hits_by_query;
      #for ($result->hits) {
      ### inserted comma after name}}
      # push @{$hits_by_query{$hit->name}}, $hit;
      #}



      foreach my $qid ( keys %hits_by_query ) {
       my $result = Bio::Search::Result::BlastResult->new();
       $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
       my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
       $blio->write_result($result);
      } 

      ###where are the files stored? what is their name. Sorry, but I cannot get behind that :(

      while( my $result = $blast_report->next_result ) {
        ## $result is a Bio::Search::Result::ResultI compliant object


        while( my $hit = $result->next_hit ) {

         ## $hit is a Bio::Search::Hit::HitI compliant object


         while( my $hsp = $hit->next_hsp ) {

          ## $hsp is a Bio::Search::HSP::HSPI compliant object
          if( $hsp->length('total') > 50 ) {
           if ( $hsp->percent_identity >= 75 ) {
           print  "Query= ",        $result->query_name,
              "Hit= ",        $hit->name,
                  "Length= ",     $hsp->length('total'),
                  "Percent_id= ", $hsp->percent_identity,
              "Subject=",        $hsp->hit_string,"\n";
           }
          }
         }
        }
      }
      }

      Again, a big thanks in advance :)

      All the best,

      Tim

      On Thu, Nov 26, 2009 at 4:52 PM, Tim <timbourine81 at gmail.com> wrote:

      Hey Mark,

      thanks for the answer




      On 25.11.2009 20:21, Mark A. Jensen wrote:
      > whoops: change the following line:
      > my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
      >
      > to
      >
      > my $blio = Bio::SearchIO->new( -file => ">$qid\.bls", -format=>'blast' );
      >
      > (I always forget that...)
      > MAJ
      >
      > ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
      > To: "Tim" <timbourine81 at gmail.com>; <bioperl-l at lists.open-bio.org>
      > Sent: Wednesday, November 25, 2009 1:20 PM
      > Subject: Re: [Bioperl-l] How to parse BLAST output - all hits of each
      > queryinnew file
      >
      >
      >> hey Tim--
      >>
      >> Sound like you need to go about collecting your queries inside out:
      >>
      >> my %hits_by_query;
      >> for ($result->hits) {
      >>  push @{$hits_by_query{$hit->name}} $hit;
      >> }
      >>
      >> I believe now each hash element, keyed by the query name, will contain
      >> an arrayref to the set of hits assoc with that query.
      >>> From here, I believe
      >>
      >> use Bio::Search::Result::BlastResult;
      >> use Bio::SearchIO;
      >>
      >> foreach my $qid ( keys %hits_by_query ) {
      >>  my $result = Bio::Search::Result::BlastResult->new();
      >>  $result->add_hit($_) for ( @{$hits_by_query{$qid}} );
      >>  my $blio = Bio::SearchIO->new( -file => $qid.".bls", -format=>'blast' );
      >>  $blio->write_result($result);
      >> }
      >>
      >> will do what you want.
      >>
      >> hope this helps -
      >> Mark
      >>
      >> ----- Original Message ----- From: "Tim" <timbourine81 at gmail.com>
      >> To: <bioperl-l at lists.open-bio.org>
      >> Sent: Wednesday, November 25, 2009 12:40 PM
      >> Subject: [Bioperl-l] How to parse BLAST output - all hits of each
      >> query innew file
      >>
      >>
      >>> Dear bioperl users,
      >>>
      >>> I am a real newbie and have - maybe a very trivial - question.
      >>>
      >>> I searched the mailing list archive and many howtos but I have not found
      >>> a concrete answer to my problem. So hopefully you can help me :)
      >>>
      >>> Background: I use the latest Bioperl version (installed it two weeks
      >>> before).
      >>> When I use Bio::Tools::Run::StandAloneBlast to BLAST one fasta file
      >>> including different sequences, I get a BLAST output with many queries
      >>> each having several hits / sbjcts.
      >>>
      >>> My problem is how to parse *all* hits of *one* query into a single new
      >>> file. And this for all the queries I have in my BLAST output file.
      >>>
      >>> Or is it better the other way round; first to make fasta files with only
      >>> single sequences inside and BLAST each file? But how can I automize that
      >>> using Bioperl?
      >>>
      >>> I tried Bio::SearchIO but can only parse all queries and their
      >>> respective hits in only one file...
      >>> I think iteration is also necessary here, but I do not really know how
      >>> to include that into Bio::SearchIO.
      >>> Or do I have to use Module:Bio::Index::Blast?
      >>>
      >>> I can index a file (see below), but I have no idea what comes next...
      >>>
      >>> ###How I index a file...
      >>>
      >>> #!/usr/bin/perl -w
      >>>
      >>> $ENV{BIOPERL_INDEX_TYPE} = "SDBM_File";
      >>>
      >>> use Bio::Index::Fasta;
      >>>
      >>>
      >>> $file_name = "8_to_BLAST_two_seq_index.fasta";
      >>> $id = "48882";
      >>> $inx = Bio::Index::Fasta->new (-filename => $file_name . ".idx",
      >>> -write_flag => 1);
      >>> $inx->make_index($file_name);
      >>>
      >>>
      >>> Hopefully, you can give me at least hints what to look for.
      >>>
      >>> A big THANKS in advance!
      >>>
      >>> Cheers,
      >>>
      >>> Tim
      >>> _______________________________________________
      >>> Bioperl-l mailing list
      >>> Bioperl-l at lists.open-bio.org
      >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
      >>>
      >>>
      >>
      >> _______________________________________________
      >> Bioperl-l mailing list
      >> Bioperl-l at lists.open-bio.org
      >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
      >>
      >>
      >

      Tim Köhler
      MPI for Terrestrial Microbiology
      Karl-von-Frisch-Straße
      D-35043 Marburg / Germany

      Email: koehlerd at mpi-marburg.mpg.de
      Phone: +49 6421 178-740
      Fax:   +49 6421 178-999






--------------------------------------------------------------------------

      Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.


--------------------------------------------------------------------------








More information about the Bioperl-l mailing list