Reverse Complement utility, Bio::Alg, return value problem

Steve A. Chervitz sac@genome.stanford.edu
Thu, 7 Aug 1997 15:43:20 -0700 (PDT)


SteveB wrote:
>
> > SteveC wrote:
> > Regarding the issue of methods that modify an existing object, 
> > I would argue that such methods should be flagged with a "set" prefix so   
> > it is absolutely clear what is being done: 
> > 
> > $myseq->set_revcom($beg,$end);  
> > 
> > would change the sequence object into its reverse complement. It could 
> > also return the altered object.
> > 
> > The advantages I see would be:
> > 
> > 1) One method call replaces three; set_revcom() would call inplace() for 
> >    you. 
> > 2) Objects are less likely to be inadvertently altered (or not altered) 
> >    due to a misplaced or incorrect inplace() call. Requiring calls to 
> >    inplace(1) and inplace(0) forces the client to do the accounting and 
> >    thus can lead to a new class of bugs and maintenance headaches.
> >
> > A disadvantage would be needing two methods (set_revcom() and revcom()) 
> > instead of one, and the same pairing for every other accessor. 
> > But this is more in line with OO design. The inplace() calls would still 
> > be useful when performing complex, multi-step manipulations.  
> 
> Another disadvantage is extra typing for the potentially more common
> operations.
> 
> However, again, I agree with you.  I think that this is most likely to do
> the right thing in most cases.
> 
> Question: do you propose that set_revcom() also return the object ($self),
> or should the set_* functions return the modified (or previous,
> unmodified)  data?  I can see arguments for all three options.

Ah, we now open a new can of worms! 

A key motivation for deciding what to return concerns error handling. If  
the set fails, it's a good idea to halt further processing of the object. 
With this in mind, it might be safest for the set method to return true 
or false, depending on the success of the operation. This way, the 
following code will always work: 

if($myseq->set_revcom($beg,$end)) {
   analyze_seq($myseq->get_seq());
} else {
   warn "Can't set reverse complement.\n";
}
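
As a rough illustration of the status-returning idea, here is a minimal 
sketch. The My::Seq class, its hash layout, and the reading of $beg/$end 
as a 1-based inclusive range whose reverse complement replaces the stored 
sequence are all assumptions made for the example, not the actual Bio:: 
interface.

```perl
package My::Seq;   # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, type => $args{type} || 'dna' }, $class;
}

sub get_seq { return $_[0]{seq} }

# Replace the sequence with the reverse complement of positions $beg..$end
# (1-based, inclusive; assumed semantics). Returns 1 on success and 0 on
# failure. On failure the object is left untouched.
sub set_revcom {
    my ($self, $beg, $end) = @_;
    my $len = length $self->{seq};
    $beg = 1    unless defined $beg;
    $end = $len unless defined $end;

    return 0 if $self->{type} ne 'dna';
    return 0 if $beg < 1 || $end > $len || $beg > $end;

    my $rc = scalar reverse substr($self->{seq}, $beg - 1, $end - $beg + 1);
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    $self->{seq} = $rc;
    return 1;
}

package main;

my $myseq = My::Seq->new(seq => 'ATGC');
if ($myseq->set_revcom(1, 4)) {
    print $myseq->get_seq(), "\n";   # prints GCAT
} else {
    warn "Can't set reverse complement.\n";
}
```

Because the method validates everything before touching the stored 
sequence, a false return always means the object still holds its original 
data.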

It's dangerous to return the object, since a chained call will silently 
operate on the unmodified object if the set fails:

$myseq->set_revcom($beg,$end)->get_seq();  ## Returns original sequence
                                           ## if set_revcom() fails.

Consider this strategy as well:

$myseq->set_revcom($beg,$end);

if($myseq->valid()) {
   analyze_seq($myseq->get_seq());
} else {
   warn "Can't set reverse complement.\n";
}

In this scheme, the failed set_revcom(), in addition to returning 
false, would invalidate the object so it couldn't be used in ANY 
subsequent operation. This would be a way, for example, to prevent *and* 
signal calls to set_revcom() on protein sequences. (If you're calling 
revcom on protein sequences, something is way wrong with your code or 
your data!)
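
The invalidation scheme could be sketched roughly as follows. The class 
name My::CheckedSeq, the internal "valid" flag, and the choice to have 
get_seq() return nothing on an invalid object are assumptions for 
illustration only; for brevity this version reverse-complements the whole 
sequence and takes no range.

```perl
package My::CheckedSeq;   # hypothetical class, not an existing module
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, type => $args{type}, valid => 1 }, $class;
}

sub valid { return $_[0]{valid} }

# Every accessor refuses to work on an invalidated object.
sub get_seq {
    my ($self) = @_;
    return unless $self->{valid};
    return $self->{seq};
}

# Whole-sequence reverse complement. A failure marks the object invalid.
sub set_revcom {
    my ($self) = @_;
    return 0 unless $self->{valid};
    if ($self->{type} ne 'dna') {
        $self->{valid} = 0;    # signal the failure AND block later use
        return 0;
    }
    my $rc = scalar reverse $self->{seq};
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    $self->{seq} = $rc;
    return 1;
}

package main;

my $prot = My::CheckedSeq->new(seq => 'MKVLA', type => 'protein');
$prot->set_revcom();
if ($prot->valid()) {
    print $prot->get_seq(), "\n";
} else {
    warn "Can't set reverse complement.\n";
}
```

The cost of this design is that every method has to check the flag, which 
is why it probably belongs in a shared base class rather than in each 
sequence class.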

This degree of error checking becomes more important as the operations 
get more complex, such as when an object is responsible for creating a 
new type of object ($gene->set_protein() or $protein->set_blast()). In my 
objects, when a complex "set" fails, I actually generate an internal 
exception containing data about the error and attach it to the object. 
The process of generating this exception causes the set operation to 
return false. 
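
One way such an attached exception might look is sketched below. This is 
a guess at the general shape, not the actual implementation: the My::Gene 
class, the _fail() helper, the error() accessor, and the set_codons() 
method (a simple stand-in for a complex set such as $gene->set_protein()) 
are all hypothetical names.

```perl
package My::Gene;   # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { dna => $args{dna}, err => undef }, $class;
}

# The object is valid as long as no exception has been attached to it.
sub valid { return !defined $_[0]{err} }
sub error { return $_[0]{err} }

# Attach error data to the object and return false, so that
# "return $self->_fail(...)" makes the calling set method fail.
sub _fail {
    my ($self, $msg) = @_;
    my $method = (caller(1))[3];   # fully qualified name of the caller
    $self->{err} = { message => $msg, method => $method };
    return 0;
}

# Stand-in for a complex set: split the DNA into codons.
sub set_codons {
    my ($self) = @_;
    return 0 unless $self->valid();
    my $len = length $self->{dna};
    return $self->_fail("length $len is not a multiple of 3") if $len % 3;
    $self->{codons} = [ $self->{dna} =~ /(...)/g ];
    return 1;
}

package main;

my $gene = My::Gene->new(dna => 'ATGAA');
unless ($gene->set_codons()) {
    my $err = $gene->error();
    warn "$err->{method} failed: $err->{message}\n";
}
```

Keeping the error data on the object lets the client ask what went wrong 
after the fact instead of parsing a warning message.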

So I would favor having "set" functions (any function which can 
modify the object's data) return a status indicator and (possibly) 
generate an exception that can invalidate the object. How to deal with 
exceptions is a separate issue. I don't think I have the best solution 
yet.

SteveC