[Biopython] quantile normalization method
Laurent Gautier
lgautier at gmail.com
Sat Mar 20 18:05:42 UTC 2010
Hi Bartek and Vincent,
A few comments:
A/
The algorithm is fairly straightforward, as you noted, but beware of
details such as missing values, the ability to normalize against a
target distribution, or ties when ranking (although I'd have to check
whether those receive special treatment).
The quantile normalization code in the R package "preprocessCore" is in
C and might outperform a pure Python implementation.
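For comparison, here is a minimal NumPy sketch of the four steps quoted further down. It is only an illustration under stated assumptions, not the preprocessCore algorithm: the function name quantile_normalize is hypothetical, missing values are assumed absent, tied values simply receive different reference values according to their sorted position, and there is no support for a target distribution.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns of a (p, n) array.

    Hypothetical helper: assumes no missing values and gives tied
    values different reference values based on their sorted position.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array of shape (p, n)")
    # Step 2: sort each column independently, remembering the permutation.
    order = np.argsort(x, axis=0, kind="stable")
    x_sorted = np.take_along_axis(x, order, axis=0)
    # Step 3: the mean across each row of the sorted matrix is the
    # shared reference distribution.
    row_means = x_sorted.mean(axis=1)
    # Step 4: write the reference values back in each column's
    # original ordering.
    normalized = np.empty_like(x)
    np.put_along_axis(normalized, order, row_means[:, np.newaxis], axis=0)
    return normalized


# Small example: 4 probes (rows) measured on 3 arrays (columns).
example = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 4.0, 6.0],
                    [4.0, 2.0, 8.0]])
result = quantile_normalize(example)
```

After normalization every column of result holds the same set of values, only in a different order. Handling NaN and averaging over ties would need extra code on top of this.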
B/
There is a variety of normalization methods in Bioconductor, and it
might make sense to embrace it as a dependency (rather than reimplement
them). I have bindings for Bioconductor up my sleeve, about to be
distributed to a few people for testing. The public release might be
around ISMB/BOSC time.
C/
norm_a = numpy.array(normq(m))
can be replaced by
norm_a = numpy.asarray(normq(m))
to improve performance whenever m is of substantial size (as no copy is
made - see
http://rpy.sourceforge.net/rpy2/doc-2.1/html/numpy.html#from-rpy2-to-numpy )
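The difference between the two calls can be seen with plain NumPy alone. In the sketch below an ordinary ndarray stands in for the rpy2 matrix. That is an assumption made so the example runs without R. Whether an rpy2 object is wrapped without copying depends on the rpy2 version, as described in the page linked above.

```python
import numpy as np

# Stand-in for the matrix returned by normq(m); a real rpy2 object
# is NOT used here, so this only illustrates NumPy's own semantics.
source = np.arange(6, dtype=float).reshape(2, 3)

copied = np.array(source)    # numpy.array copies the data by default
viewed = np.asarray(source)  # numpy.asarray reuses the data when it can

# Writing through the non-copied array is visible in the source.
viewed[0, 0] = 42.0
```

Because viewed shares memory with source, the assignment also changes source, while copied keeps the old values.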
Best,
Laurent
On 3/20/10 5:00 PM, biopython-request at lists.open-bio.org wrote:
>> > Is there a quantile normalization method in Biopython? I searched but did
>> > not find one. If not, it looks straightforward; would it be of any
>> > interest to the community for me to contribute a method?
>> >
>> > 1. given n arrays of length p, form X of dimension
>> > p x n where each array is a column;
>> > 2. sort each column of X to give X_sort;
>> > 3. take the means across rows of X_sort and assign this
>> > mean to each element in the row to get X'_sort;
>> > 4. get X_normalized by rearranging each column of
>> > X'_sort to have the same ordering as original X
>> >
>> > From
>> > A comparison of normalization methods for high
>> > density oligonucleotide array data based on
>> > variance and bias
>> > B. M. Bolstad, R. A. Irizarry, M. Astrand and T. P. Speed
>> >
> Hi,
>
> I don't think there is such a method available.
>
> I myself use the original R implementation by Bolstad et al. It requires
> rpy2 and R to be installed. It can be done in a few lines of code:
>
> <pre>
> import numpy
> import rpy2.robjects as robjects
> # ll = list of concatenated values to normalize
> v = robjects.FloatVector(ll)
> # numrows = number of vectors that made up ll
> m = robjects.r['matrix'](v, nrow=numrows, byrow=True)
> # load the R package that provides normalize.quantiles
> robjects.r('require("preprocessCore")')
> normq = robjects.r('normalize.quantiles')
> # norm_a = normalized array
> norm_a = numpy.array(normq(m))
> </pre>
>
> If your method is a pure Python implementation that is comparably fast, I
> think it would be worth having in Biopython, since the method is (in my
> opinion) quite useful and it would remove the dependency on R from some of
> my scripts.
>
> cheers
> Bartek
>