Implementing PCA/Whitening
From Ufldl
Line 2: | Line 2: | ||
and also describe how you can implement them using efficient linear algebra libraries. | and also describe how you can implement them using efficient linear algebra libraries. | ||
- | First, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can instead compute this in one fell swoop as | + | First, we need to ensure that the data has zero mean, that is, <math>\textstyle \frac{1}{m} \sum_{i=1}^m x^{(i)} = 0</math>. We achieve this by centering the dataset: subtracting the mean over all examples from each example, so that the sample mean becomes zero. In Matlab, we can do this by using |
+ | |||
+ | avg = mean(x, 2); | ||
+ | x = x - repmat(avg, 1, size(x, 2)); | ||
+ | |||
+ | Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute this in one fell swoop as | ||
sigma = x * x' / size(x, 2); | sigma = x * x' / size(x, 2); |
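As a quick sanity check, the following minimal sketch compares the vectorized expression with the explicit sum on a small toy matrix. The data values and the names `sigma_loop` and `max_diff` are assumptions made for illustration; they do not come from the tutorial. As in the snippets above, each column of `x` is taken to be one example <math>\textstyle x^{(i)}</math>.

```matlab
% Toy data (assumed for illustration): n = 3 features, m = 4 examples,
% with one example per column.
x = [2 4 6 8;
     1 3 5 11;
     0 1 0 3];
[n, m] = size(x);

% Center the data so that every feature (row) has zero mean.
avg = mean(x, 2);
x = x - repmat(avg, 1, m);

% Explicit sum of outer products, one term per example.
sigma_loop = zeros(n, n);
for i = 1:m
    sigma_loop = sigma_loop + x(:, i) * x(:, i)';
end
sigma_loop = sigma_loop / m;

% Vectorized form: a single matrix product.
sigma = x * x' / size(x, 2);

% Largest elementwise difference between the two results.
max_diff = max(abs(sigma(:) - sigma_loop(:)));
```

The two results should agree up to floating-point round-off, so `max_diff` is expected to be negligibly small. The vectorized form is usually preferable because it hands the whole computation to the optimized matrix-multiplication routine rather than looping in interpreted code.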