[Biojava-dev] Count.java, Distribution.java and Alignment.java

Thu Feb 13 10:42:29 EST 2003

I have a few comments about these interfaces; the changes below would
make them easier and more efficient for me to use.

It would be great if Count and Distribution both required a
nonZeroSymbols() method which returned the set of symbols that have a
non-zero count / probability respectively.  Particularly if you are
working with sparse counts over high dimensional cross-product alphabets,
it seems pretty inefficient to iterate through all the members of a
cross-product alphabet when only a small fraction of these have counts.
This also relates to storage - it would be good to have a
DistributionFactory that could create sparse distributions.
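
To make the suggestion concrete, here is a rough sketch of the kind of
sparse implementation I have in mind.  It is only an illustration: the
class SparseCount and its method signatures are hypothetical, it does not
implement the real Count interface, and symbols are modelled as plain
Objects so that the example stands alone.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of BioJava.
class SparseCount {
    // Only symbols with a non-zero count are stored, so memory use grows
    // with the number of observed symbols, not the size of the alphabet.
    private final Map<Object, Double> counts = new HashMap<>();

    double getCount(Object symbol) {
        Double c = counts.get(symbol);
        return c == null ? 0.0 : c;
    }

    void setCount(Object symbol, double value) {
        if (value == 0.0) {
            counts.remove(symbol);      // keep the map sparse
        } else {
            counts.put(symbol, value);
        }
    }

    void increaseCount(Object symbol, double delta) {
        setCount(symbol, getCount(symbol) + delta);
    }

    // The proposed accessor: a read-only view of the symbols whose count
    // is currently non-zero.
    Set<Object> nonZeroSymbols() {
        return Collections.unmodifiableSet(counts.keySet());
    }
}
```

A caller could then loop over nonZeroSymbols() instead of over every
member of the cross-product alphabet.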

Also, this is more minor, but Count uses doubles rather than integers,
which is certainly more flexible, but would seem to take more memory.  Is
this flexibility needed - isn't that what Distribution is for?

Finally, in Alignment.java, there are two methods which use
inconsistent container classes for the labels of the alignment.

java.util.List getLabels()

Alignment subAlignment(java.util.Set labels, Location loc)

so that to get a subAlignment over all labels, you have to convert a List
to a Set.
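
For illustration, the conversion currently needed looks roughly like the
sketch below.  LabelSets and toLabelSet are hypothetical helper names, not
part of BioJava, and the labels are treated as plain Objects.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of BioJava.
class LabelSets {
    // Copies the labels into a Set, keeping the List's iteration order
    // and dropping any duplicate labels.
    static Set<Object> toLabelSet(List<?> labels) {
        return new LinkedHashSet<Object>(labels);
    }
}
```

The result would then be passed as the first argument of subAlignment,
with the List coming from getLabels().  If both methods used the same
container type, this extra copy would not be needed.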

Thanks,

Lachlan

-------------------------------------------------------------
Lachlan Coin
Wellcome Trust Sanger Institute		Magdalene College
Cambridge  CB10 1SA			Cambridge CB30AG
Ph: +44 1223 494 820
Fax: +44 1223 494 919
------------------------------------------------------------