Implementing PCA/Whitening
From Ufldl
m (sigma bug) |
|||
Line 4: | Line 4: | ||
First, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can instead compute this in one fell swoop as | First, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can instead compute this in one fell swoop as | ||
- | + | sigma = x * x' / size(x, 2); | |
(Check the math yourself for correctness.) | (Check the math yourself for correctness.) | ||
Here, we assume that <math>x</math> is a data structure that contains one training example per column (so, <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix). | Here, we assume that <math>x</math> is a data structure that contains one training example per column (so, <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix). | ||
- | Next, PCA computes the eigenvectors of | + | Next, PCA computes the eigenvectors of <math>\Sigma</math>. One could do this using the Matlab <tt>eig</tt> function. However, because <math>\Sigma</math> is a symmetric positive semi-definite matrix, it is more numerically reliable to do this using the <tt>svd</tt> function. Concretely, if you implement |
- | [U,S,V] = svd( | + | [U,S,V] = svd(sigma); |
then the matrix <math>U</math> will contain the eigenvectors of <math>Sigma</math> (one eigenvector per column, sorted in order from top to bottom eigenvector), and the diagonal entries of the matrix <math>S</math> will contain the corresponding eigenvalues (also sorted in decreasing order). The matrix <math>V</math> will be equal to transpose of <math>U</math>, and can be safely ignored. | then the matrix <math>U</math> will contain the eigenvectors of <math>Sigma</math> (one eigenvector per column, sorted in order from top to bottom eigenvector), and the diagonal entries of the matrix <math>S</math> will contain the corresponding eigenvalues (also sorted in decreasing order). The matrix <math>V</math> will be equal to transpose of <math>U</math>, and can be safely ignored. | ||
- | (Note: The < | + | (Note: The <tt>svd</tt> function actually computes the singular vectors and singular values of a matrix, which for the special case of a symmetric positive semi-definite matrix---which is all that we're concerned with here---is equal to its eigenvectors and eigenvalues. A full discussion of singular vectors vs. eigenvectors is beyond the scope of these notes.) |
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows: | Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows: | ||
Line 25: | Line 25: | ||
Incidentally, if <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized | Incidentally, if <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized | ||
implementation, and the expressions | implementation, and the expressions | ||
- | above work too for computing <math> | + | above work too for computing <math>x_{rot}</math> and <math>\tilde{x}</math> for your entire training set |
all in one go. The resulting | all in one go. The resulting | ||
- | <math>xrot</math> and <math> | + | <math>xrot</math> and <math>\tilde{x}</math> will have one column corresponding to each training example. |
To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use | To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use | ||
- | + | ||
- | xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x; | + | xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x; |
- | + | ||
Since <math>S</math>'s diagonal contains the eigenvalues <math>\textstyle \lambda_i</math>, | Since <math>S</math>'s diagonal contains the eigenvalues <math>\textstyle \lambda_i</math>, | ||
this turns out to be a compact way | this turns out to be a compact way |