[Biojava-dev] Contributing chromatogram support to BioJava
Rhett Sutphin
rhett-sutphin at uiowa.edu
Mon Mar 10 16:22:20 EST 2003
I sent this message this morning, but it got held for having
"suspicious headers" and hasn't been approved by the moderator. In the
hopes that the suspicious part was the attachment, I'm resending the
message without it.
----
Hi Matthew,
Thanks for the quick reply. I still have some questions.
On Monday, March 10, 2003, at 05:22 AM, Matthew Pocock wrote:
> back in the good old days, we made pretty much everything public. Then
> we realised that was bad. unfortunately, the really old packages have
> not been totally spring-cleaned for cruftily exposed API. Implementing
> symbols properly is hard, which is why we attempt to provide all the
> tools for creating your own without writing new classes. Hey ho.
I'm guessing from this that the reason you want to keep some things
package-level is to avoid them being "published API" and thereby avoid
being required to keep their interfaces stable. That could very well
be a good reason. On the other hand, making the Simple*Symbol classes
public and defining their APIs could make implementing symbols a lot
easier. For instance, subclassing from SimpleBasisSymbol I was able to
create a functioning BasisSymbol by creating a pair of alphabets and
then using them to fill in the SimpleBasisSymbol#symbols and
SimpleBasisSymbol#matches fields.
BTW, I think that the tools for creating and using Alphabets and
Symbols are well thought out and nicely documented. I just think that they
aren't sufficient for my needs in this case, as I'll explain in a
moment.
> Ok - so you want an alphabet that contains symbols that are a DNA
> nucleotide and an integer. You can do that with some variant of the
> following:
> <useful Alphabet creation/use examples snipped>
I did do this, but I did it in the context of defining an Alphabet for
this new type of BasisSymbol called BaseCall. The reason why I did
this instead of just defining the Alphabet and using getSymbol (as you
suggest) is twofold:
1) BaseCalls need to be annotatable (upon creation). SCFs, for
instance, contain seven quality values associated with each call. The
most natural way (to me) to associate those values with each base call
is through an Annotation. Is there another way that would be better?
2) I wanted to provide a way to get at the two halves of each base call
by name. That is, instead of doing:
    BasisSymbol basecall = (BasisSymbol) chromat.getBaseCalls().get(3);
    // List indices are zero-based: 0 is the nucleotide, 1 is the offset
    Symbol callDNA = (Symbol) basecall.getSymbols().get(0);
    int callOffset = ((IntegerAlphabet.IntegerSymbol)
            basecall.getSymbols().get(1)).intValue();
You could just do:
    BaseCall basecall = (BaseCall) chromat.getBaseCalls().get(3);
    Symbol callDNA = basecall.getNucleotide();
    int callOffset = basecall.getOffset();
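To make the two points above concrete, here is a minimal, self-contained sketch of the shape I have in mind. It is not the real code: it uses stand-in types (a String for the nucleotide, a plain Map for the annotation) instead of BioJava's Symbol and Annotation, and the names BaseCallSketch, SimpleBaseCall and the annotation keys are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in sketch only: a real version would implement BioJava's BasisSymbol
// and use its Annotation type rather than String and Map.
class BaseCallSketch {

    /** A base call whose two halves are reachable by name. */
    interface BaseCall {
        String getNucleotide();                 // stand-in for a DNA Symbol
        int getOffset();                        // peak offset into the trace
        Map<String, Integer> getAnnotation();   // stand-in for an Annotation
    }

    /** Immutable implementation; the annotation is supplied at creation. */
    static final class SimpleBaseCall implements BaseCall {
        private final String nucleotide;
        private final int offset;
        private final Map<String, Integer> annotation;

        SimpleBaseCall(String nucleotide, int offset,
                       Map<String, Integer> annotation) {
            this.nucleotide = nucleotide;
            this.offset = offset;
            // Defensive copy, so later changes to the caller's map are not seen.
            this.annotation =
                Collections.unmodifiableMap(new LinkedHashMap<>(annotation));
        }

        public String getNucleotide() { return nucleotide; }
        public int getOffset() { return offset; }
        public Map<String, Integer> getAnnotation() { return annotation; }
    }
}
```

With something like this, the per-call quality values travel with the call itself, and callers never need to know which subsymbol position holds which half.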
The problem I am most trying to avoid is requiring users of the class
to know that the first subsymbol of a base call is the nucleotide and
the second is the peak offset. It seems like that information should
be abstracted away. Since you suggested that subclassing is not the
way to go, I thought of an alternative. I could define a class called
ChromatogramTools and give it methods like these:
    public static int getBaseCallOffset(Symbol basecall)
            throws IllegalSymbolException;
    public static Symbol getBaseCallNucleotide(Symbol basecall)
            throws IllegalSymbolException;
Which would turn the example above into:
    Symbol basecall = (Symbol) chromat.getBaseCalls().get(3);
    try {
        Symbol callDNA = ChromatogramTools.getBaseCallNucleotide(basecall);
        int callOffset = ChromatogramTools.getBaseCallOffset(basecall);
    } catch (IllegalSymbolException ise) {
        throw new BioError(ise, "Can't happen unless there is a problem "
                + "with the chromatogram implementation");
    }
The thing I don't like about the alternative method is that those
"tools" methods will have to throw IllegalSymbolExceptions since the
basecall parameter's type is just Symbol (and so might not be a member
of the base call alphabet). Therefore you have to wrap every
invocation of them in a try block, even though (with a well-behaved
Chromatogram implementation) you are guaranteed the exception won't be
thrown.
The basic OO-way to get around this is to have a strictly defined type
for the parameter -- that way the execution-time IllegalSymbolException
can be a compile-time error, instead.
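As a rough illustration of that difference, here is a small sketch with stand-in types. Sym, Call and BadSymbolException are hypothetical placeholders for Symbol, the proposed BaseCall and IllegalSymbolException; none of this is BioJava code.

```java
// Stand-in types, not BioJava's: just enough to contrast the two API shapes.
class TypedParameterSketch {

    /** Stand-in for a generic Symbol. */
    interface Sym { String getName(); }

    /** Stand-in for the strictly typed base call. */
    interface Call extends Sym { int getOffset(); }

    /** Stand-in for IllegalSymbolException (a checked exception). */
    static class BadSymbolException extends Exception {
        BadSymbolException(String message) { super(message); }
    }

    /** Tools-style accessor: accepts any Sym, so it must check at run time. */
    static int getOffset(Sym symbol) throws BadSymbolException {
        if (!(symbol instanceof Call)) {
            throw new BadSymbolException(
                symbol.getName() + " is not a base call");
        }
        return ((Call) symbol).getOffset();
    }

    /** Typed accessor: passing a non-Call is rejected by the compiler. */
    static int getOffsetTyped(Call call) {
        return call.getOffset();
    }
}
```

Every caller of getOffset needs a try block or a throws clause, while getOffsetTyped needs neither, because the wrong kind of argument simply does not compile.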
So it seems to me that the best way to handle this is a
BasisSymbol-implementing class for BaseCalls. It is the only way I see
to handle these two issues. Do you have another suggestion?
Rhett
BTW: I've attached the code for BaseCall in case my prose argument
above wasn't clear.
--
Rhett Sutphin
Research Assistant (Software)
Coordinated Laboratory for Computational Genomics
and the Center for Macular Degeneration
University of Iowa - Iowa City, IA 52242 - USA
4111 MEBRF - email: rhett-sutphin at uiowa.edu