[Bioperl-l] results problem with StandAloneBlast

Chris Fields cjfields at uiuc.edu
Thu Jun 1 21:36:05 UTC 2006


Genevieve, 

seek() won't work here; all the file IO is handled through Bio::Root::IO
methods.  The SearchIO system is set up like an XML SAX parser so if you
want to save objects as they come you'll have to store the object refs in an
array, like so:

my @hsps;

while ($result = $parser->next_result) {
   while ($hit = $result->next_hit) {
      while ($hsp = $hit->next_hsp) {
         push @hsps, $hsp;
      }
   }
}

Or similarly with hits: 

my @hits;

while ($result = $parser->next_result) {
   while ($hit = $result->next_hit) {
      push @hits, $hit;
   }
}

Or you could use more complex data structures (array of arrays) as Sendu
suggested.  You should be able to sort like anything else by calling methods
within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @sorted = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else,
like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

sub sort_results{
   my $report = shift;
   while($result = $report->next_result()){
      while(my $hit = $result->next_hit()){
         push @hits, $hit;
      }
   }
   my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
   print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results{
   my $report = shift;
   my @sort_score = sort {$a->score <=> $b->score} @hits;
   for my $h (@sort_score) {
      while (my $hsp = $h->next_hsp) {
         # might use something else here like hit->name or accession,
         # not sure what you want
         my $q_name = $hsp->seq_id; 
         print join(", ",$q_name,$h->name,$hsp->bits)."\n";
         }
   }
}


Just so you know, I couldn't get display_id or display_name to work when
using the Bio::Search::HSP::GenericHSP object.  Your results may vary.

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
> 
> Problem solved, albeit, in a slightly hacky way.
> 
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.) I
> used SearchIO's fh() to get the handle and could while(<$fh>) through
> the data but when I used seek($fh,0,0) to reset the cursor position in
> the handle in prep for another loop, i got an error complaining about my
> use of seek() by indicating that "SEEK" could not be found in Seekable.pm.
> 
> I concluded that it was not going to be possible and instead made an
> array if SeqFeature objects which contain all the relevant blast output
> data (i.e. the m8/hit table stuff).
> 
> It still seems unfortunate that one can't reuse the SearchIO object for
> cases when the SearchIO blast report needs to be accessed mltiple times.
> 
> Thanks for your help,
> Genevieve
> 
> 
> 
> Sendu Bala wrote:
> 
> > Genevieve DeClerck wrote:
> >
> >>Thanks for your comment Sendu, it was very helpful. I think this must be
> >>what's going on.. I am using $blast_report->next_result in both
> >>subroutines. It appears that analyzing the blast results first w/ my
> >>sort subroutine empties (?) the $blast_result object so that when I try
> >>to print, there is nothing left to print. (and visa-versa when I print
> >>first then try to sort).
> >>So, from the looks of things, using next_result has the effect of
> >>popping the Bio::Search::Result::ResultI objects off of the SearchIO
> >>blast report object??
> >
> >
> > Not quite. It's more or less exactly like opening a file and then trying
> > to read it all twice like this:
> > open(FILE, "file");
> > while (<FILE>) {
> >      print # prints each line in the file
> > }
> > while (<FILE>) {
> >      print # never happens, we never enter this while loop
> > }
> >
> > To get the second while loop to print anything we need to say seek(FILE,
> > 0, 0) before it. Or in the first while loop store each line in an array,
> > and then make the second loop a foreach through that array.
> >
> >
> >
> >>It seems I could get around this by making a copy of the blast report by
> >>setting it to another new variable...(not the most elegant solution) but
> >>I'm having trouble with this...
> >>
> >>If I do:
> >>
> >>    my $blast_report_copy = $blast_report;
> >>
> >>I'm just copying the reference to the SearchIO blast result, so it
> >>doesn't help me. How can I make another physical copy of this blast
> >>result object? Seems like a simple thing but how to do it is escaping
> me.
> >
> >
> > Not really a good idea, and it may not work anyway if the object
> > contains a filehandle. But for a simple object you might recursively
> > loop through the data structure and copy each element out into a similar
> > data structure.
> >
> >
> >
> >>But better yet, the way to go is to 'reset the counter,' or to find a
> >>way to look at/print/sort the results without removing data from the
> >>blast result object. How is this done though??
> >
> >
> > It would be rather nice if this worked:
> > my $blast_report = $factory->blastall($ref_seq_objs);
> > my $blast_fh = $blast_report->fh();
> > while (<$blast_fh>) {
> >      # $_ is a ResultI object, use as normal
> > }
> > seek($blast_fh, 0, 0); # this would be great, but does it work?
> > while <$blast_fh>) {
> >      # go through the results again in your second subroutine
> > }
> >
> > An alternative hacky way of doing it, which may also not work, would be
> > to go through your $blast_report as normal, but then before going
> > through it a second time, say
> > my $fh = $blast_report->_fh;
> > seek($fh, 0, 0);
> >
> > Finally, the most sensible way (assuming bioperl provides no methods of
> > its own for this) of solving the problem is, the first time you go
> > through each next_result, next_hit and next_hsp, just store the returned
> > objects in an array of arrays of arrays. Then the second time get the
> > objects from your array structure instead of with the method calls.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list