Implementing PCA/Whitening

In this section, we summarize the PCA and whitening algorithms, and also describe how you can implement them using efficient linear algebra libraries.
First, we need to ensure that the data has (approximately) zero mean. For natural images, we achieve this (approximately) by subtracting the mean value of each image patch: we compute the mean of each patch and subtract it from that patch. In Matlab, we can do this using
  avg = mean(x, 1);                    % mean pixel value of each patch (one patch per column of x)
  x = x - repmat(avg, size(x, 1), 1);  % subtract each patch's mean from every pixel of that patch
Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or in C++, Java, etc., with access to an efficient linear algebra library), computing it as an explicit sum over the examples is inefficient. Instead, we can compute it in one fell swoop, as in the sketch below.
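In Matlab, assuming <math>\textstyle x</math> holds one training example per column (so that <math>\textstyle m</math> is size(x, 2)), a minimal sketch of this computation, together with the SVD that produces the eigenvector matrix U used below, is (the names sigma, S and V here are illustrative):
  sigma = x * x' / size(x, 2);  % Sigma = (1/m) * sum_i x^(i) * (x^(i))'
  [U, S, V] = svd(sigma);       % columns of U are the eigenvectors of sigma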
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows:
  xRot = U' * x;            % rotated version of the data
  xTilde = U(:,1:k)' * x;   % PCA representation of the data; k is the number of
                            % eigenvectors (i.e., dimensions) to keep after reduction;
                            % set k = size(x, 1) to keep all the eigenvectors
This gives your PCA representation of the data in terms of <math>\textstyle \tilde{x} \in \Re^k</math>.
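If you also want an approximation of the original data recovered from this reduced representation, you can map <math>\textstyle \tilde{x}</math> back through the retained eigenvectors; a minimal sketch (the name xHat is introduced here only for illustration):
  xHat = U(:,1:k) * xTilde;  % approximate reconstruction of x from its top k components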
