[Bioperl-l] Re: [Bioperl-guts-l] Notification: incoming/1216 (fwd)

Jason Stajich jason@cgt.mc.duke.edu
Fri, 21 Jun 2002 09:41:55 -0400 (EDT)


On Fri, 21 Jun 2002 BHurwitz@twt.com wrote:

>
> Hi Jason,
>
> Thank you so much for your reply.  I took a closer look at the output from
> blastn and megablast and the only difference is that the order of three
> lines are switched around at the top of the hit result.  The information is
> the same but the order is reversed.  It seems to me that the output from
> blastall, whether it be blast or megablast should be the same format.  I
> just wrote to NCBI and asked them if they were planning on changing the
> format of output from megablast to match blastn.  Depending on their reply,
> I can write a new parser for megablast switching the order, but I hope they

Don't hold your breath - it is probably easier for us to fix it on our end
if it is really as simple as detecting the two - note that we can't
parse blastcl3 output becuase it (again) deviates from their
established BLAST format.

I'm tracking differences in the different XML tag output from different
versions of BLAST so will take a look at this as well.

> can fix it on their end since it seems kind of silly to have slightly
> different output format from the same program (blastall).  Below is an
> example of the lines that are switched in the output.
>
> MEGABLAST OUTPUT (top):
>
> Database: /usr/local/bin/blast/db/chr22.fa
>            1 sequences; 47,748,585 total letters
>
> Searching.done
> Query= 340113
>          (45 letters)
>
>
> BLASTN OUTPUT (top):
>
> Query= 340113
>          (45 letters)
>
> Database: /usr/local/bin/blast/db/chr22.fa
>            1 sequences; 47,748,585 total letters
>
> Searching.done
>
>
> PS.  I just started using bioperl and so far I love it!  One question, I
> noticed that the "--n" option in blastall (for megablast) is not supported
- it is an easy add - you just put it in the StandAloneBlast

just just add a 'n' to the @BLASTALL_PARAMS, so in your code:
use Bio::Tools::Run::StandAloneBlast;

push @Bio::Tools::Run::StandAloneBlast::BLASTALL_PARAMS, 'n';

I guess there is no reason we can't add it to the source file itsself if
indeed this works.


If you are interested in wrapping around megablast would need to create a
megablast() method and the corresponding @MEGABLAST_PARAMS array to make


> in StandAloneBlast.pm, are you planning on supporting this?  Do you know
> where I can request features to be added?
>

> Thanks!
> Bonnie Hurwitz
>
>
>
>
>                     Jason Stajich
>                     <jason@cgt.mc        To:     Bioperl <bioperl-l@bioperl.org>
>                     .duke.edu>           cc:     <BHurwitz@twt.com>
>                                          Subject:     [Bioperl-guts-l] Notification: incoming/1216 (fwd)
>                     06/20/2002
>                     09:19 PM
>
>
>
>
>
>
> We have no megablast parser - active development effort for search result
> parsers should be focused on new Bio::SearchIO plugins - I'm not sure if
> the format is different enough to warrant a new module or if we can adapt
> the Bio::SearchIO::blast one to read megablast too.
>
> At any rate - there is currently no parser but we'll help you make it
> bioperl friendly if you or someone else want to write it.
>
> -jason
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
> ---------- Forwarded message ----------
> Date: Wed, 19 Jun 2002 10:34:56 -0400
> From: bioperl-bugs@bioperl.org
> To: bioperl-guts-l@bioperl.org
> Subject: [Bioperl-guts-l] Notification: incoming/1216
>
> JitterBug notification
>
> new message incoming/1216
>
> Message summary for PR#1216
>            From: BHurwitz@twt.com
>            Subject: BPlite - parsing megablast data
>            Date: Wed, 19 Jun 2002 10:40:25 -0500
>            0 replies           0 followups
>
> ====> ORIGINAL MESSAGE FOLLOWS <====
>
> >From BHurwitz@twt.com Wed Jun 19 10:34:55 2002
> Received: from venus.twt.com (one-six-five.invadercreator.com
> [208.137.93.165])
>            by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id
> g5JEYsal018823
>            for <bioperl-bugs@bio.perl.org>; Wed, 19 Jun 2002 10:34:55 -0400
> Subject: BPlite - parsing megablast data
> To: "bioperl-bugs@bio.perl.org" <bioperl-bugs@bio.perl.org>
> X-Mailer: Lotus Notes Release 5.0.5  September 22, 2000
> Message-ID: <OF1D94E9A5.79CE0D2C-ON86256BDD.00550E9D@twt.com>
> From: BHurwitz@twt.com
> Date: Wed, 19 Jun 2002 10:40:25 -0500
> X-MIMETrack: Serialize by Router on TWTMSNM1/TWT(Release 5.0.8 |June 18,
> 2001) at 06/19/2002
>  10:40:40 AM
> MIME-Version: 1.0
> Content-type: text/plain; charset=us-ascii
>
> Hello,
>
> I am not able to parse megablast data with the BPlite BLAST parser.  I
> suspect that megablast output was not factored into the parser when it was
> written since my parsing script works fine for regular blast data.  Will
> megablast parsing capabilities be available in BPlite?
>
> * error message when running BPlite on megablast output.
>
> $ ./blast_parser.pl megablast.out megablast.parsed
> -------------------- WARNING ---------------------
> MSG: Possible error (1) while parsing BLAST report!
> ---------------------------------------------------
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.6.1/Bio/Tools/BPlite.pm line 336, <GEN0> line
> 131.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.6.1/Bio/Tools/BPlite.pm line 337, <GEN0> line
> 131.
> Use of uninitialized value in substitution (s///) at
> /usr/lib/perl5/site_perl/5.6.1/Bio/Tools/BPlite.pm line 338, <GEN0> line
> 131.
> Use of uninitialized value in pattern match (m//) at
> /usr/lib/perl5/site_perl/5.6.1/Bio/Tools/BPlite.pm line 340, <GEN0> line
> 131.
>
>
> * example code
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::Tools::BPlite;
>
> ###############################################
> # blast_parser.pl
> # written by B. Hurwitz
> # 6/19/02
> # This program parses blast output.
> ################################################
>
> if (@ARGV != 2) { die "Usage: blast_parser.pl blast.in blast.parsed\n"; }
> my $blast_in = $ARGV[0];
> my $outfile = $ARGV[1];
>
> open (OUT, ">$outfile") || die "Cannot open $outfile \n";
>
> # main program
> my $report = new Bio::Tools::BPlite(-file => $blast_in);
> {
> print OUT "> ", $report->query, "\t", $report->database, "\n";
>    while (my $subject = $report->nextSbjct()) {
>       while (my $hsp = $subject->nextHSP()) {
>          print OUT join ("\t",
>                      $hsp->score, #blast score
>                      $hsp->bits, #number of bits in blast score
>                      $hsp->percent, #percent identity
>                      $hsp->P, #evalue
>                      $hsp->length, #length of the hsp
>                      $hsp->query->start, #starting bp of the query
>                      $hsp->query->end, #ending bp of the query
>                      $hsp->hit->start, #starting bp of the hsp
>                      $hsp->hit->end, #ending bp of the hsp
>                      $hsp->hit->seqname), "\n"; # name of the hsp
>       } #end of while hsp
>    } #end of while subject
> # the following line takes you to the next report in the stream/file
> # it will return 0 if that report is empty,
> # but that is valid for an empty blast report.
> # Returns -1 for EOF.
>    last if ($report->_parseHeader == -1);
>    redo;
> }
>
>
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-guts-l
>
>
>
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu