PCA

to the original, and using PCA this way can significantly speed up your algorithm while
introducing very little approximation error.

== Number of components to retain ==

How do we set <math>\textstyle k</math>; i.e., how many PCA components should we retain? In our
simple 2-dimensional example, it seemed natural to retain 1 of the 2
components, but for higher-dimensional data this decision is less trivial. If <math>\textstyle k</math> is
too large, then we won't be compressing the data much; in the limit of <math>\textstyle k=n</math>,
we are just using the original data (but rotated into a different basis).
Conversely, if <math>\textstyle k</math> is too small, then we might be using a very bad
approximation to the data.

To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance
retained''' for different values of <math>\textstyle k</math>. Concretely, if <math>\textstyle k=n</math>, then we have
an exact approximation to the data, and we say that 100% of the variance is
retained; i.e., all of the variation of the original data is retained.
Conversely, if <math>\textstyle k=0</math>, then we are approximating all the data with the zero vector,
and thus 0% of the variance is retained.

More generally, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the eigenvalues
of <math>\textstyle \Sigma</math> (sorted in decreasing order), so that <math>\textstyle \lambda_j</math> is the eigenvalue
corresponding to the eigenvector <math>\textstyle u_j</math>. Then if we retain <math>\textstyle k</math> principal components,
the percentage of variance retained is given by:
:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j}.
\end{align}</math>
In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math> and <math>\textstyle \lambda_2 = 0.69</math>. Thus,
by keeping only <math>\textstyle k=1</math> principal component, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>,
or 91.3% of the variance.
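
To make this concrete, here is a minimal numpy sketch (ours, not part of the original notes) that computes the percentage of variance retained from the eigenvalues of the covariance matrix; the toy data matrix <code>X</code> is made up for illustration and is assumed to hold one zero-mean example per row:

<pre>
import numpy as np

# Illustrative zero-mean data: m examples in rows, n features in columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.3]])
X = X - X.mean(axis=0)

Sigma = (X.T @ X) / X.shape[0]           # covariance matrix of the data
lam = np.linalg.eigvalsh(Sigma)[::-1]    # eigenvalues, sorted in decreasing order

k = 1
retained = lam[:k].sum() / lam.sum()     # percentage of variance retained
print(f"variance retained with k={k}: {retained:.1%}")
</pre>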

A more formal definition of percentage of variance retained is beyond the scope
of these notes. However, it is possible to show that <math>\textstyle \lambda_j =
\frac{1}{m} \sum_{i=1}^m \left( x_{{\rm rot},j}^{(i)} \right)^2</math>. Thus, if <math>\textstyle \lambda_j \approx 0</math>,
then <math>\textstyle x_{{\rm rot},j}</math> is usually near 0 anyway, and we lose relatively little by
approximating it with a constant 0. This also explains why we retain the top principal
components (corresponding to the larger values of <math>\textstyle \lambda_j</math>) instead of the bottom
ones: the top principal components
<math>\textstyle x_{{\rm rot},j}</math> are the ones that are more variable and take on larger values,
and for which we would incur a greater approximation error if we were to set them to zero.
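
As a quick numerical sanity check of this identity (a sketch of ours, assuming <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m x^{(i)} (x^{(i)})^T</math> for zero-mean data), each eigenvalue should equal the mean squared value of the corresponding rotated coordinate:

<pre>
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 3))
X = X - X.mean(axis=0)                   # zero-mean data, one example per row
m = X.shape[0]

Sigma = (X.T @ X) / m                    # covariance matrix
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]           # sort into decreasing eigenvalue order

X_rot = X @ U                            # each row is x_rot = U^T x for one example
print(np.allclose(lam, (X_rot ** 2).sum(axis=0) / m))   # prints True
</pre>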

In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of
the variance. In other words, we pick the smallest value of <math>\textstyle k</math> that satisfies
:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j} \geq 0.99.
\end{align}</math>
Depending on the application, if you are willing to incur some
additional error, values in the 90-98% range are also sometimes used. When you
describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95% of
the variance is also a much more easily interpretable description than saying
that you retained 120 (or whatever other number of) components.
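
Putting this heuristic into code, here is a minimal sketch (the helper name <code>choose_k</code> is ours, not from these notes) that returns the smallest <math>\textstyle k</math> meeting a given variance threshold:

<pre>
import numpy as np

def choose_k(eigenvalues, threshold=0.99):
    """Smallest k whose top-k eigenvalues retain >= threshold of the variance."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # decreasing order
    cum = np.cumsum(lam) / lam.sum()     # variance retained for k = 1, ..., n
    return int(np.searchsorted(cum, threshold) + 1)

# With the 2D example above:
print(choose_k([7.29, 0.69], threshold=0.90))   # 1  (91.3% >= 90%)
print(choose_k([7.29, 0.69], threshold=0.99))   # 2
</pre>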

== References ==

http://cs229.stanford.edu