[Bioperl-l] Parsing Blast Output

Jason Stajich jason@cgt.mc.duke.edu
Tue, 11 Jun 2002 11:30:58 -0400 (EDT)


The parameters of this module (Bio::Tools::Run::RemoteBlast) aren't very
well documented.  Hopefully that will improve as people use it and want
to improve it.

You can get back 2 different types of objects: a Bio::SearchIO object or a
Bio::Tools::BPlite object.  This is controlled by the 'readmethod'
parameter.  By default, in 1.0 and beyond, it returns a Bio::SearchIO
object.  Previous versions may have returned a BPlite object by
default.

This may all be confusing because there are 2 different APIs,
depending on which parser you are using.  Most development effort has
shifted to Bio::SearchIO as the vehicle of the future, but we maintain
BPlite to avoid breaking old scripts and for people who prefer its
interface.
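
To make the difference between the two APIs concrete, here is a rough
sketch of a helper that accepts whichever parser object comes back and
walks it with the matching method names.  This is not taken from the module
docs.  The name summarize_report is made up for illustration.  The methods
it calls are the ones mentioned in this thread: next_result, database_name,
next_hit and next_hsp for SearchIO; database, nextSbjct and nextHSP for
BPlite.  It assumes $rc is already a parsed-report object, not a status code.

```perl
use strict;
use warnings;

# Hypothetical helper: collect a few summary lines from a parsed BLAST
# report, using whichever API the parser object supports.
sub summarize_report {
    my ($rc) = @_;
    my @lines;
    if ( $rc->can('next_result') ) {
        # Bio::SearchIO nomenclature: result -> hit -> hsp.
        # The outer loop also copes with several reports in one stream.
        while ( my $result = $rc->next_result ) {
            push @lines, "db is " . $result->database_name;
            while ( my $hit = $result->next_hit ) {
                push @lines, "hit name is " . $hit->name;
                while ( my $hsp = $hit->next_hsp ) {
                    push @lines, "score is " . $hsp->score;
                }
            }
        }
    }
    elsif ( $rc->can('nextSbjct') ) {
        # Bio::Tools::BPlite nomenclature: report -> sbjct -> HSP.
        push @lines, "db is " . $rc->database;
        while ( my $sbjct = $rc->nextSbjct ) {
            push @lines, "sbjct name is " . $sbjct->name;
            while ( my $hsp = $sbjct->nextHSP ) {
                push @lines, "score is " . $hsp->score;
            }
        }
    }
    else {
        die "unrecognised parser object: " . ref($rc) . "\n";
    }
    return @lines;
}
```

Something like  print "$_\n" for summarize_report($rc);  would then work
with either setting of 'readmethod'.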

my @params = ( '-prog' => 'blastp',
               '-data' => 'nr',
               '-expect' => '0.001',
               '-readmethod' => 'Blast' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
# ... same stuff from the SYNOPSIS to submit and check for a result
} else {
 $factory->remove_rid($rid);
 # Bio::SearchIO nomenclature
 my $result = $rc->next_result;
 print "db is ", $result->database_name(), "\n";
}

If you were using BPlite by specifying '-readmethod' => 'BPlite', then
you'll need to use that API, which has the 'database' method you are
calling.

The two different parsers are fundamentally different in that SearchIO is
designed to be pluggable to different formats and inherently supports
multiple reports in its API.  It is more of a kludge (but possible) to get
multiple concatenated reports to parse out of the BPlite interface.
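
On formatting query_string / homology_string / hit_string in blocks of 50:
one way is to slice the three strings yourself with substr.  Below is a
minimal sketch in plain Perl, not a BioPerl feature.  The name
format_alignment and the 'Query:'/'Sbjct:' labels are made up for
illustration, and it assumes the three strings have the same length, as
they should for one HSP.

```perl
use strict;
use warnings;

# Hypothetical helper: break three aligned strings into blocks of
# $width columns (default 50) and return the formatted text.
sub format_alignment {
    my ( $query, $homology, $hit, $width ) = @_;
    $width ||= 50;
    my $out = '';
    for ( my $pos = 0; $pos < length($query); $pos += $width ) {
        # substr returns the remainder if fewer than $width chars are left
        $out .= sprintf "%-8s %s\n",   'Query:', substr( $query,    $pos, $width );
        $out .= sprintf "%-8s %s\n",   '',       substr( $homology, $pos, $width );
        $out .= sprintf "%-8s %s\n\n", 'Sbjct:', substr( $hit,      $pos, $width );
    }
    return $out;
}
```

Inside the HSP loop you would call it as
print format_alignment($hsp->query_string, $hsp->homology_string,
$hsp->hit_string);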

HTH,
jason
On Tue, 11 Jun 2002 AUnderwood@PHLS.org.uk wrote:

> Hi Mat and Brian,
>
> Thank you for both your help. Together we're getting there. thanks.
>
> The code that Mat wrote was
>
>       while ( my @rids = $remote_blast_object->each_rid ) {
>         foreach my $rid ( @rids ) {
>           $rc = $remote_blast_object->retrieve_blast($rid);
>           print "retrieving results...\n";
>           if( !ref($rc) ) {   # $rc not a reference => either error
>                               # or job not yet finished
>             if( $rc < 0 ) {
>               $remote_blast_object->remove_rid($rid);
>               print "Error return code for BlastID code $rid ... \n";
>             }
>             sleep 5;
>           } else {
>             $remote_blast_object->remove_rid($rid);
>             while ( my $sbjct = $rc->nextSbjct ) {
>               print "sbjct name is ", $sbjct->name, "\n";
>               while ( my $hsp = $sbjct->nextHSP ) {
>                 print "score is ", $hsp->score, "\n";
>               }
>
>
>
>
> This method doesn't seem to work with the BioPerl 1.0 release since it comes
> up with this message:
>
>
> Can't locate object method "database" via package "Bio::SearchIO::blast"
> (perhaps you forgot to load "Bio::SearchIO::blast"?) at blast_test4.pl line
> 34, <GEN14> line 228.
>
> So I never got the problems with the $result and $rc variables and an error
> that said
> Can't locate object method "next_result" via package "Bio::Tools::BPlite" at
> testremoteblast.pl line 32
>
> However from your code I think that I can use methods in the
> Bio::Search::HSP::HSPI module (which is the kind of object $hsp is) to
> generate the same kind of output as in your script
>
> $hsp->query_string
> $hsp->hit_string
> $hsp->homology_string.
>
> So the code I'm using is
>                  print "hit name is ", $hit->name, "\n";
>                  while( my $hsp = $hit->next_hsp ) {
>                    print "score is ", $hsp->score, "\n";
>                    print "evalue is ", $hsp->evalue, "\n";
>                    print $input->display_id, "\t\t", $hsp->query_string, "\n";
>                    print "\t\t", $hsp->homology_string, "\n";
>                    print $hit->description, "\t\t", $hsp->hit_string, "\n";
>
>
> Has this all changed again with Bioperl 1.01? I seem to understand that from
> Brian's message about next_result as opposed to nextSbjct. Will this
> all change again with 1.1?
> My one remaining problem is that
> $hsp->query_string/hit_string/homology_string generates very long strings if
> there is a long region of homology. Is there a way of formatting these in
> blocks of 50?
>
> Thanks again for all your help,
>
> Anthony

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu