[Biopython] quantile normalization method

Sat Mar 20 17:16:37 UTC 2010

@Bartek Wilczynski
Could you test the following code against R for speed and accuracy? I am using
numpy, so you will need to import numpy as np.

I did not find any clear documentation on whether the Bolstad method, or
quantile normalization methods in general, drop outliers. Any input here
would be great.

I also have to thank Anne Archibald on the scipy mailing list for the fancy
array indexing help.

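import numpy as np

def quantile_normalization(anarray):
    """Quantile-normalize anarray, with samples in the columns and probes
    across the rows."""
    # Work in float so the row means are not truncated for integer input.
    A = np.asarray(anarray, dtype=float)
    AA = np.zeros_like(A)
    # I[k, j] is the row index of the k-th smallest value in column j.
    I = np.argsort(A, axis=0)
    cols = np.arange(A.shape[1])
    # A[I, cols] is A with each column sorted; every value of rank k is
    # replaced by the mean of the rank-k values across all columns.
    AA[I, cols] = np.mean(A[I, cols], axis=1)[:, np.newaxis]
    return AA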
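
For a quick check that does not need R, here is a minimal sketch that
follows the four quoted steps literally (one loop per column) and compares
the result with the fancy-indexing version. The name
quantile_normalize_steps and the small example matrix are made up for
illustration; the vectorized function is restated, with a float cast added,
only so the block runs on its own. It does not say anything about how
preprocessCore handles ties or missing values.

```python
import numpy as np


def quantile_normalize_steps(x):
    """Loop-based reference: samples in columns, probes in rows."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)        # per-column sort order
    x_sort = np.sort(x, axis=0)          # step 2: sort each column
    row_means = x_sort.mean(axis=1)      # step 3: mean of each rank
    out = np.empty_like(x)
    for j in range(x.shape[1]):          # step 4: restore original order
        out[order[:, j], j] = row_means
    return out


def quantile_normalization(anarray):
    """Vectorized version using fancy indexing (float cast is an addition)."""
    A = np.asarray(anarray, dtype=float)
    AA = np.zeros_like(A)
    I = np.argsort(A, axis=0)
    cols = np.arange(A.shape[1])
    AA[I, cols] = np.mean(A[I, cols], axis=1)[:, np.newaxis]
    return AA


# Hypothetical 4 probes x 3 samples matrix with no ties within a column.
example = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 6.0, 6.0],
                    [4.0, 2.0, 8.0]])

by_steps = quantile_normalize_steps(example)
vectorized = quantile_normalization(example)
print(vectorized)
print("versions agree:", np.allclose(by_steps, vectorized))
```

In this sketch no value is dropped: after normalization every column is a
permutation of the same vector of rank means, so the largest value in a
column is replaced by the top-rank mean rather than discarded.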

Vincent Davis
720-301-3003
vincent at vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>

On Sat, Mar 20, 2010 at 1:55 AM, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:

> On Sat, Mar 20, 2010 at 4:56 AM, Vincent Davis <vincent at vincentdavis.net> wrote:
>
>> Is there a quantile normalization method in Biopython? I searched but did
>> not find one. If not, it looks straightforward. Would it be of any interest
>> to the community for me to contribute a method?
>>
>> 1. given n arrays of length p, form X of dimension
>> p × n where each array is a column;
>> 2. sort each column of X to give X_sort;
>> 3. take the means across rows of X_sort and assign this
>> mean to each element in the row to get X'_sort;
>> 4. get X_normalized by rearranging each column of
>> X'_sort to have the same ordering as original X
>>
>> From:
>> "A comparison of normalization methods for high
>> density oligonucleotide array data based on
>> variance and bias",
>> B. M. Bolstad, R. A. Irizarry, M. Åstrand and T. P. Speed
>>
>
>  Hi,
>
> I don't think there is such a method available.
>
> I myself use the original R implementation by Bolstad et al. It
> requires rpy2 and R to be installed. It can be done in a few lines of code:
>
> <pre>
> import numpy
> import rpy2.robjects as robjects
> # ll = list of concatenated values to normalize
> v = robjects.FloatVector(ll)
> # numrows = number of vectors that made up ll
> m = robjects.r['matrix'](v, nrow=numrows, byrow=True)
> robjects.r('require("preprocessCore")')
> normq = robjects.r('normalize.quantiles')
> # norm_a = normalized array
> norm_a = numpy.array(normq(m))
>  </pre>
>
> If your method is a pure Python implementation that is comparably fast, I
> think it would be worth having in Biopython, since the method is (in my
> opinion) quite useful and it would remove the dependency on R from some of
> my scripts.
>
> cheers
>  Bartek
>