Bioperl: BPlite.pm

Ewan Birney birney@sanger.ac.uk
Wed, 22 Dec 1999 09:16:28 +0000 (GMT)


On Tue, 21 Dec 1999, Ian Korf wrote:

> I've been getting requests recently for old BLAST parsers.
> Seems as though some people are looking for a lightweight
> parser. At http://sapiens.wustl.edu/~ikorf/BPlite.pm you
> can find my version of such a module. It parses both NCBI-
> and WU-BLAST, and works well in pipes since it reads one
> subject and one alignment at a time.

I'd really like to see a lighter BLAST parser with less embedded
functionality in bioperl, ideally with the main features of Steve's
BLAST parser. Ian, if I can persuade someone to look at this, is it
OK to bring it inside bioperl? (Any chance of you wanting to do that? I
guess not...)

Steve - we *do* need to think about upgrading the BLAST parser - only
you know the code, and the largest set of bugs is found in it.


> 
> The pod2text version of the documentation follows.
> 
> -Ian Korf
> 
> 
> NAME
>     BPlite - Lightweight BLAST parser
> 
> SYNOPSIS
>      use BPlite;
>      my $report = new BPlite(\*STDIN);
>      $report->query;
>      $report->database;
>      while(my $sbjct = $report->nextSbjct) {
>          $sbjct->name;
>          while (my $hsp = $sbjct->nextHSP) {
>              $hsp->score;
>              $hsp->bits;
>              $hsp->percent;
>              $hsp->P;
>              $hsp->queryBegin;
>              $hsp->queryEnd;
>              $hsp->sbjctBegin;
>              $hsp->sbjctEnd;
>              $hsp->queryAlignment;
>              $hsp->sbjctAlignment;
>          }
>      }
> 
> DESCRIPTION
>     BPlite is a package for parsing BLAST reports. The BLAST
>     programs are a family of widely used algorithms for sequence
>     database searches. The reports are non-trivial to parse, and
>     there are differences in the formats of the various flavors of
>     BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and
>     TBLASTX reports from both the high performance WU-BLAST, and the
>     more generic NCBI-BLAST.
> 
>     Many people have developed BLAST parsers (I myself have made at
>     least three). BPlite is for those people who would rather not
>     have a giant object specification, but rather a simple handle to
>     a BLAST report that works well in pipes.
> 
>   Object
> 
>     BPlite has three kinds of objects, the report, the subject, and
>     the HSP. To create a new report, you pass a filehandle reference
>     to the BPlite constructor.
> 
>      my $report = new BPlite(\*STDIN); # or any other filehandle
> 
>     The report has two attributes (query and database), and one
>     method (nextSbjct).
> 
>      $report->query;     # access to the query name
>      $report->database;  # access to the database name
>      $report->nextSbjct; # gets the next subject
>      while(my $sbjct = $report->nextSbjct) {
>          # canonical form of use is in a while loop
>      }
> 
>     A subject is a BLAST hit, which should not be confused with an
>     HSP (below). A BLAST hit may have several alignments associated
>     with it. A useful way of thinking about it is that a subject is
>     a gene and HSPs are the exons. Subjects have one attribute
>     (name) and one method (nextHSP).
> 
>      $sbjct->name;    # access to the subject name
>      "$sbjct";        # overloaded to return name
>      $sbjct->nextHSP; # gets the next HSP from the sbjct
>      while(my $hsp = $sbjct->nextHSP) {
>          # canonical form is again a while loop
>      }
> 
>     An HSP is a high scoring pair, or simply an alignment. HSP
>     objects do not have any methods, just attributes (score, bits,
>     percent, P, queryBegin, queryEnd, sbjctBegin, sbjctEnd,
>     queryAlignment, sbjctAlignment) that should be familiar to
>     anyone who has seen a blast report. For lazy/efficient coders,
>     two-letter abbreviations are available for the attributes with
>     long names (qb, qe, sb, se, qa, sa).
> 
>      $hsp->score;
>      $hsp->bits;
>      $hsp->percent;
>      $hsp->P;
>      $hsp->queryBegin;     $hsp->qb;
>      $hsp->queryEnd;       $hsp->qe;
>      $hsp->sbjctBegin;     $hsp->sb;
>      $hsp->sbjctEnd;       $hsp->se;
>      $hsp->queryAlignment; $hsp->qa;
>      $hsp->sbjctAlignment; $hsp->sa;
>      "$hsp"; # overloaded for begin..end bits
> 
>     I've included a little bit of overloading for double quote
>     variable interpolation convenience. A subject will return its
>     name and an HSP will return its queryBegin, queryEnd, and bits
>     in the alignment. Feel free to modify this to whatever is most
>     frequently used by you.
> 
>     So a very simple look into a BLAST report might look like this.
> 
>      my $report = new BPlite(\*STDIN);
>      while(my $sbjct = $report->nextSbjct) {
>          print "$sbjct\n";
>          while(my $hsp = $sbjct->nextHSP) {
>              print "\t$hsp\n";
>          }
>      }
> 
>     The output of such code might look like this:
> 
>      >foo
>          100..155 29.5
>          268..300 20.1
>      >bar
>          100..153 28.5
>          265..290 22.1
> 
> AUTHOR
>     Ian Korf (ikorf@sapiens.wustl.edu,
>     http://sapiens.wustl.edu/~ikorf)
> 
> ACKNOWLEDGEMENTS
>     This software was developed at the Genome Sequencing Center at
>     Washington University, St. Louis, MO.
> 
> COPYRIGHT
>     Copyright (C) 1999 Ian Korf. All Rights Reserved.
> 
> DISCLAIMER
>     This software is provided "as is" without warranty of any kind.
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 
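
The pull-style design described in the documentation above (a report
hands out one subject at a time, a subject hands out one HSP at a time,
and both stringify via overloading) can be sketched roughly as follows.
This is a toy illustration only, not BPlite's code and not a BLAST
parser: the class names ToyReport, ToySbjct and ToyHSP, the helper
methods _next_line and _unread, and the simplified input format
(">name" starts a subject, "begin end bits" is one HSP) are all
assumptions made up for the example.

```perl
#!/usr/bin/perl
# Toy sketch of a streaming report/subject/HSP iterator.
# NOT BPlite's implementation; input format and names are hypothetical.
use strict;
use warnings;

package ToyHSP;
# Stringify as "begin..end bits"; bool is overloaded so objects are always true.
use overload
    '""'   => sub { my $h = shift; "$h->{qb}..$h->{qe} $h->{bits}" },
    'bool' => sub { 1 };
sub new  { my ($class, %f) = @_; return bless { %f }, $class }
sub qb   { $_[0]{qb} }
sub qe   { $_[0]{qe} }
sub bits { $_[0]{bits} }

package ToySbjct;
# Stringify as the subject name.
use overload
    '""'   => sub { $_[0]{name} },
    'bool' => sub { 1 };
sub new {
    my ($class, $name, $report) = @_;
    return bless { name => $name, report => $report }, $class;
}
sub name { $_[0]{name} }
sub nextHSP {
    my $self = shift;
    my $line = $self->{report}->_next_line;
    return undef unless defined $line;
    if ($line =~ /^>/) {
        # The next subject has started: push the line back for nextSbjct.
        $self->{report}->_unread($line);
        return undef;
    }
    my ($qb, $qe, $bits) = split ' ', $line;
    return ToyHSP->new(qb => $qb, qe => $qe, bits => $bits);
}

package ToyReport;
sub new { my ($class, $fh) = @_; return bless { fh => $fh, pending => undef }, $class }
sub _unread { $_[0]{pending} = $_[1] }
sub _next_line {
    my $self = shift;
    if (defined $self->{pending}) {
        my $line = $self->{pending};
        $self->{pending} = undef;
        return $line;
    }
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        return $line if $line =~ /\S/;   # skip blank lines
    }
    return undef;
}
sub nextSbjct {
    my $self = shift;
    while (defined(my $line = $self->_next_line)) {
        # HSP lines left over from a subject the caller skipped are discarded.
        return ToySbjct->new($1, $self) if $line =~ /^>(\S+)/;
    }
    return undef;
}

package main;
my $input = ">foo\n100 155 29.5\n268 300 20.1\n>bar\n100 153 28.5\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $report = ToyReport->new($fh);
while (my $sbjct = $report->nextSbjct) {
    print "$sbjct\n";
    while (my $hsp = $sbjct->nextHSP) {
        print "\t$hsp\n";
    }
}
```

Only one line of look-ahead is ever held in memory, which is the
property that makes this style of interface comfortable to use at the
end of a pipe. Whether BPlite buffers its input in the same way is not
stated in the documentation quoted above.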

-----------------------------------------------------------------
Ewan Birney. Work: +44 (0)1223 494992. Mobile: +44 (0)7970 151230
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/
-----------------------------------------------------------------
