[Biopython] quantile normalization method
Vincent Davis
vincent at vincentdavis.net
Sat Mar 20 17:16:37 UTC 2010
@Bartek Wilczynski
Could you test the following code against R for speed and accuracy? I am using
numpy, so you will need to import numpy as np.
I did not find any clear documentation on whether the Bolstad method, or
quantile normalization methods in general, drop outliers. Any input here
would be great.
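On the outlier question: going only by the four steps from the paper as quoted further down, I do not see a step that discards anything. An extreme value is simply averaged into the mean for its rank. Here is a small sketch with made-up numbers (the array `x` is hypothetical toy data, and this says nothing about what the R implementation does internally):

```python
import numpy as np

# Hypothetical toy data: probes in rows, samples in columns.
# 1000.0 plays the role of an outlier in the third sample.
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [4.0, 5.0, 1000.0]])

# Sort each column, then average across each row of the sorted matrix.
x_sort = np.sort(x, axis=0)
rank_means = x_sort.mean(axis=1)

# The top-rank mean is (4 + 5 + 1000) / 3, so the outlier is averaged in,
# not removed.
print(rank_means)
```

If outliers should be excluded, that would have to be a separate step before normalization.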
I also have to thank Anne Archibald on the scipy mailing list for the fancy
array indexing help.
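import numpy as np

def quantile_normalization(anarray):
    """
    anarray with samples in the columns and probes across the rows
    """
    # work in float so that integer input does not truncate the means
    A = np.asarray(anarray, dtype=float)
    AA = np.zeros_like(A)
    # I[r, c] is the row index of the r-th smallest value in column c
    I = np.argsort(A, axis=0)
    cols = np.arange(A.shape[1])
    # A[I, cols] is A with each column sorted; its row means are the
    # per-rank means, written back to each value's original position
    AA[I, cols] = np.mean(A[I, cols], axis=1)[:, np.newaxis]
    return AA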
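As a cross-check, the four steps quoted further down can also be written out one at a time. This is only a sketch under my own assumptions: the name `quantile_normalize_stepwise` is hypothetical, ranks come from a double `argsort`, and ties are broken by position rather than averaged, so results on data with ties may not match R.

```python
import numpy as np

def quantile_normalize_stepwise(x):
    """Quantile-normalize the columns of x (probes in rows, samples in columns)."""
    # Step 1: x is already the p x n matrix, one sample per column.
    x = np.asarray(x, dtype=float)
    # Step 2: sort each column independently.
    x_sort = np.sort(x, axis=0)
    # Step 3: mean across each row of the sorted matrix (one value per rank).
    rank_means = x_sort.mean(axis=1)
    # Step 4: rank of every element within its column (0 = smallest);
    # argsort of argsort gives the ranks, ties broken by position.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Replace each element by the mean that belongs to its rank.
    return rank_means[ranks]

# Hypothetical example without ties: 4 probes, 3 samples.
example = np.array([[5.0, 4.0, 4.0],
                    [2.0, 1.0, 9.0],
                    [3.0, 3.0, 3.0],
                    [4.0, 2.0, 8.0]])
normalized = quantile_normalize_stepwise(example)
print(normalized)
```

After normalization every column holds the same set of values (here 2, 3, 5, 6), only in a different order, which is a quick sanity check for any implementation.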
Vincent Davis
720-301-3003
vincent at vincentdavis.net
my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>
On Sat, Mar 20, 2010 at 1:55 AM, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> On Sat, Mar 20, 2010 at 4:56 AM, Vincent Davis <vincent at vincentdavis.net> wrote:
>
>> Is there a quantile normalization method in Biopython? I searched but did
>> not find one. If not, it looks straightforward; would it be of any
>> interest to the community for me to contribute a method?
>>
>> 1. given n arrays of length p, form X of dimension
>> p × n where each array is a column;
>> 2. sort each column of X to give X_sort;
>> 3. take the means across rows of X_sort and assign this
>> mean to each element in the row to get X'_sort;
>> 4. get X_normalized by rearranging each column of
>> X'_sort to have the same ordering as original X
>>
>> From
>> "A comparison of normalization methods for high
>> density oligonucleotide array data based on
>> variance and bias"
>> B. M. Bolstad, R. A. Irizarry, M. Åstrand and T. P. Speed
>>
>
> Hi,
>
> I don't think there is such a method available.
>
> I myself use the original R implementation by Bolstad et al. It
> requires rpy2 and R to be installed. It can be done in a few lines of code:
>
> <pre>
> import numpy
> import rpy2.robjects as robjects
> # ll = flat list of the concatenated values to normalize
> v = robjects.FloatVector(ll)
> # numrows = number of vectors that were concatenated into ll
> m = robjects.r['matrix'](v, nrow=numrows, byrow=True)
> # load the R package that provides normalize.quantiles
> robjects.r('require("preprocessCore")')
> normq = robjects.r('normalize.quantiles')
> # norm_a = normalized array
> norm_a = numpy.array(normq(m))
> </pre>
>
> If your method is a pure Python implementation that is comparably fast, I
> think it would be worth having in Biopython, since the method is (in my
> opinion) quite useful and it would remove the dependency on R from some of
> my scripts.
>
> cheers
> Bartek
>