[Bioperl-l] Re: issues with _rearrange

Tim Bunce Tim.Bunce@pobox.com
Fri, 20 Sep 2002 17:39:19 +0100


On Fri, Sep 20, 2002 at 04:19:42PM +0100, Ewan Birney wrote:
> On Fri, 20 Sep 2002, Aaron J Mackey wrote:
> 
> <<snip>>
> 
> Can I propose an
> 
>   (F) - Write a specific constructor, called like:
> 
>    $feature = Bio::SeqFeature::Generic->seqio_direct_new($start,$end,$tag_hash_ref);
> 
>    which is a "seqio optimised" constructor. The idea here is that SeqIO 
> is allowed to "take a different route" from standard constructors and 
> seqio_direct_new is allowed to do all manner of speed-magic - including 
> filling in the hash directly outside of the get/set methods and - for 
> example - directly using the tag_hash_ref built in the FTHelper object as 
> the tag hash rather than adding things
> 
>    Ditto perhaps for PrimarySeq.
> 
>    Basically - can we write "direct_new" which are speed-optimised at the 
> price of strict calling conventions and "insider knowledge"
> 
>    (caveat - how much stuff does Root::Object need intialised before it is 
> happy, and if so... why...)

Seems like a good idea.

But meanwhile here's a version of _rearrange that's over 60% faster:

Benchmark: running new, old, each for at least 5 CPU seconds...
       new:  6 wallclock secs ( 5.25 usr +  0.02 sys =  5.27 CPU) @ 17193.72/s (n=90670)
       old:  6 wallclock secs ( 5.22 usr +  0.02 sys =  5.23 CPU) @ 11295.71/s (n=59126)

use strict;
use Benchmark;
use Data::Dumper;
 
my $order = [qw(SEQUENCE ID FOOBAR DESC)];
my (@new, @old);
 
timethese(-5, {
        new => sub { @new=new_rearrange(undef,$order, -sequence=>41, -desc=>42, -id=>43) },
        old => sub { @old=old_rearrange(undef,$order, -sequence=>41, -desc=>42, -id=>43) },
});
warn Dumper([\@old,\@new]);
 
sub new_rearrange {
    my (undef, $order) = (shift,shift);
    return @_ unless ($#_ % 2 && substr($_[0]||'',0,1) eq '-');
    my %param;
    while (@_) {
        (my $key = shift) =~ tr/a-z\055/A-Z/d; # deletes leading (and all) dashes
        $param{ $key } = shift;
    }
    return @param{@$order};
}
 
sub old_rearrange {
    my($self,$order,@param) = @_;
    return @param unless (defined($param[0]) && $param[0]=~/^-/o && ($#param % 2));
    for (my $i=0;$i<@param;$i+=2) {
        $param[$i]=~s/^\-//;
        $param[$i]=~tr/a-z/A-Z/;
    }
    my(%param) = @param;
    return @param{@{$order}};
}

I've not had time to work on tuning up other parts of the code (and
may not for at least several days) but from a quick read through
there's lots of scope for optimizations.

Tim.