[Bioperl-l] Sorting BLAST Output

Terry Jones tc.jones at jones.tc
Tue Apr 5 17:00:34 EDT 2005


Just a quick comment on this:

| One 'easy' way to do this is to build an array of hashes with the hits
| and whatever feature you are interested in.  It's a pure Perl
| implementation. I don't think the API for the Bioperl search result
| object supports the sorting you want to do, but I could be wrong.
| 
| my @hashes;
| for my $hit (@your_hits) {
|      my $len     = get_aln_len($hit);
|      my $num_mis = get_num_mis($hit);
|      push @hashes, { hit => $hit, len => $len, num_mis => $num_mis };
| }
| 
| my @sorted = sort by_len_and_num_mis @hashes;
| 
| sub by_len_and_num_mis {
|      $a->{len} <=> $b->{len} ||
|      $a->{num_mis} <=> $b->{num_mis}
| }

In the end, @sorted is a sorted array of references to hashes, which
may not be what you were expecting. Each hash holds the original hit
under the key 'hit', so you can get at the hits like this:

    for my $rec (@sorted) {
        # $rec is a hash ref; the original hit is stored under 'hit'
        print "$rec->{hit}\n";
    }

Andrew was likely writing nice and understandable code for you, which
is good (of course). It would be a bit faster to use an anonymous
array rather than an anonymous hash for each record. The @hashes array
is also left lying around after the sort, so you might want to undef
it if you're not doing this in a subroutine (where it would go out of
scope on its own).
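
Here is a rough sketch of the anonymous-array variant, wrapped in a
bare block so the temporary array goes away by itself. The hit data
and the bodies of get_aln_len() and get_num_mis() are made-up
stand-ins (the quoted code never shows them); real Bioperl hit objects
would need their own accessors.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the helpers named in the quoted code.
# Here a "hit" is just a hash ref with precomputed fields.
sub get_aln_len { return $_[0]{len} }
sub get_num_mis { return $_[0]{mis} }

# Made-up example hits, for illustration only.
my @your_hits = (
    { name => 'hitA', len => 120, mis => 3 },
    { name => 'hitB', len =>  80, mis => 5 },
    { name => 'hitC', len => 120, mis => 1 },
);

my @sorted;
{
    # Each record is [ hit, length, mismatches ].
    # @records is lexical to this block and is freed when the block ends.
    my @records = map { [ $_, get_aln_len($_), get_num_mis($_) ] } @your_hits;
    @sorted = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @records;
}

# @sorted still holds the [ hit, len, mis ] records; the hit is element 0.
print "$_->[0]{name}\n" for @sorted;
```

With this toy data the names come out as hitB, hitC, hitA: shortest
alignment first, ties broken by fewer mismatches.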

Also, the above can be done using a Schwartzian Transform, which is
more concise and far more cryptic (search for the term for background):

my @sorted = map { $_->[0] }
             sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
             map { [ $_, get_aln_len($_), get_num_mis($_) ] }
             @your_hits;

This also leaves you with what you were likely expecting, an array of
your original hits.
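
If the one-statement form is hard to read, it may help to see the same
three steps pulled apart, reading the transform from the bottom up:
decorate, sort, undecorate. This is only a sketch; the hit data and
the two helper subs are invented for the example, and the intermediate
array names are mine.

```perl
use strict;
use warnings;

# Hypothetical helpers and data; the real ones depend on your hit objects.
sub get_aln_len { return $_[0]{len} }
sub get_num_mis { return $_[0]{mis} }

my @your_hits = (
    { name => 's1', len => 200, mis => 4 },
    { name => 's2', len => 150, mis => 0 },
    { name => 's3', len => 200, mis => 2 },
    { name => 's4', len => 150, mis => 7 },
);

# Step 1 (the bottom map): decorate each hit with its sort keys,
# so each key is computed once per hit, not once per comparison.
my @decorated = map { [ $_, get_aln_len($_), get_num_mis($_) ] } @your_hits;

# Step 2 (the sort): compare on the precomputed keys only.
my @ordered = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @decorated;

# Step 3 (the top map): throw the keys away and keep the original hits.
my @sorted = map { $_->[0] } @ordered;

print join(' ', map { $_->{name} } @sorted), "\n";   # prints "s2 s4 s3 s1"
```

The one-statement version just chains these three steps without naming
the intermediate arrays.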

If you're into Perl programming, it's really worth getting your head
around the Schwartzian Transform. Once you understand it, it's easy to
write compact solutions to lots of problems like this. This style of
list processing is a lot like Lisp. Unfortunately, it looks like line
noise.

Terry


