[Biojava-dev] Count.java, Distribution.java and Alignment.java

Schreiber, Mark mark.schreiber at agresearch.co.nz
Tue Feb 25 11:40:33 EST 2003



> -----Original Message-----
> From: Lachlan Coin [mailto:lc1 at sanger.ac.uk] 
> Sent: Thursday, 13 February 2003 11:42 p.m.
> To: biojava-dev at biojava.org
> Subject: [Biojava-dev] Count.java, Distribution.java and 
> Alignment.java
> 
> 
> I just had a few comments about these interfaces, which would 
> make them easier/more efficient for me to use.
> 
> It would be great if both these interfaces enforced a 
> nonZeroSymbols() method which returned the set of symbols 
> that have a non-zero count / probability respectively.  
> Particularly if you are working with sparse counts over high 
> dimensional cross-product alphabets, it seems pretty 
> inefficient to iterate through all the members of a 
> cross-product alphabet when only a  small fraction of these 
> have counts.  This also relates to storage - it would be good 
> to have a DistributionFactory that could create sparse distributions.
> 

My vote would be to add two static nonZeroSymbols methods to
DistributionTools (one over Distributions, the other over Counts). This
avoids breaking the API but still makes the functionality available.
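
A rough sketch of the shape such a static helper could take. It is written
against plain java.util types so it runs on its own: the class name
NonZeroSketch, the Iterable standing in for a finite alphabet and the
weight function standing in for a Distribution weight or Count lookup are
all assumptions for illustration, not existing BioJava API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a static helper; not part of BioJava.
final class NonZeroSketch {
    private NonZeroSketch() {}

    // Returns the symbols whose weight (or count) is non-zero,
    // in the order the alphabet yields them.
    static <S> Set<S> nonZeroSymbols(Iterable<S> alphabet,
                                     ToDoubleFunction<? super S> weightOf) {
        Set<S> result = new LinkedHashSet<>();
        for (S symbol : alphabet) {
            if (weightOf.applyAsDouble(symbol) != 0.0) {
                result.add(symbol);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy "distribution": symbols missing from the map have weight 0.
        Map<String, Double> weights = new HashMap<>();
        weights.put("a", 0.5);
        weights.put("g", 0.5);
        List<String> alphabet = Arrays.asList("a", "c", "g", "t");
        System.out.println(
            nonZeroSymbols(alphabet, s -> weights.getOrDefault(s, 0.0)));
    }
}
```

Note that a helper like this still walks the whole alphabet once, so it
gives callers the convenience but not the speed-up for sparse cross-product
alphabets. That part would need an implementation that can hand back its
own stored keys.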

> Also, this is more minor, but Count uses doubles rather than 
> integers, which is certainly more flexible, but would seem to 
> take more memory.  Is this flexibility needed - isn't 
> Distribution supposed to be for this?
> 

I use the doubles for resolving the counts of ambiguity symbols (which
Count formally doesn't count). E.g., if I am doing a count over DNA and I
see an n, I add 0.25 to each of the four atomic symbols.
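
To make the arithmetic concrete, here is a minimal sketch using a plain
map rather than the real Count interface. The class name
AmbiguityCountSketch and the count method are made up for illustration,
and only n is handled; other ambiguity codes are simply skipped here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; does not use the BioJava Count interface.
final class AmbiguityCountSketch {
    private static final char[] DNA = {'a', 'c', 'g', 't'};

    private AmbiguityCountSketch() {}

    // Counts a, c, g, t in seq; each n contributes 0.25 to every base.
    static Map<Character, Double> count(String seq) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        for (char base : DNA) {
            counts.put(base, 0.0);
        }
        for (char ch : seq.toLowerCase().toCharArray()) {
            if (ch == 'n') {
                // Spread one observation evenly over the four bases.
                for (char base : DNA) {
                    counts.merge(base, 0.25, Double::sum);
                }
            } else if (counts.containsKey(ch)) {
                counts.merge(ch, 1.0, Double::sum);
            }
            // Any other character is ignored in this sketch.
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("acgn"));
    }
}
```

With integer counts the n would have to be either dropped or rounded, so
the total would no longer match the sequence length.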

> 
> Finally, in Alignment.java, there are two methods which use 
> inconsistent container classes for the labels of the alignment.
> 
> java.util.List getLabels()
> 
> Alignment subAlignment(java.util.Set labels, Location loc)
> 
> so that to get a subAlignment over all labels, you have to 
> convert a List to a Set.
> 

I think they should both be Sets.
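
Until then the workaround is a one-line copy, along the lines of
subAlignment(new HashSet(getLabels()), loc). The sketch below shows the
conversion on its own with plain java.util types; LabelSetSketch and
asLabelSet are hypothetical names, not BioJava API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper showing the List-to-Set conversion only.
final class LabelSetSketch {
    private LabelSetSketch() {}

    // Copies the labels into a Set; LinkedHashSet keeps first-seen order
    // and silently drops duplicate labels.
    static <T> Set<T> asLabelSet(List<T> labels) {
        return new LinkedHashSet<>(labels);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("seq1", "seq2", "seq3");
        System.out.println(asLabelSet(labels));
    }
}
```

One thing to weigh if both methods move to Set: the Set contract itself
promises no ordering, so callers that rely on label order would need an
ordered implementation behind it.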

> 
> Thanks,
> 
> Lachlan
> 
> -------------------------------------------------------------
> Lachlan Coin
> Wellcome Trust Sanger Institute		Magdalene College
> Cambridge  CB10 1SA			Cambridge CB30AG
> Ph: +44 1223 494 820
> Fax: +44 1223 494 919
> ------------------------------------------------------------
> 