Principal Components Analysis
== Introduction ==
Principal Components Analysis (PCA) is a dimensionality reduction algorithm
that can be used to significantly speed up your unsupervised feature learning
algorithm. PCA will allow us to approximate a high-dimensional input with
a much lower dimensional one, while incurring very little error.
== Example and Mathematical Background ==
For our running example, we will use a dataset <math>\textstyle \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}</math> with <math>\textstyle n=2</math> dimensional inputs, so that each <math>\textstyle x^{(i)} \in \Re^2</math>. PCA finds the principal directions of variation <math>\textstyle u_1</math> and <math>\textstyle u_2</math> as the eigenvectors of the covariance matrix <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T</math> of the (zero-mean) data.

Here, <math>\textstyle u_1^Tx</math> is the magnitude of <math>\textstyle x</math> projected onto the vector <math>\textstyle u_1</math>. Similarly, <math>\textstyle u_2^Tx</math> is the magnitude of <math>\textstyle x</math> projected onto the vector <math>\textstyle u_2</math>.
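To make this concrete, here is a small NumPy sketch (not part of the original tutorial; the random stand-in data and all variable names are assumptions) that builds the covariance matrix <math>\textstyle \Sigma</math>, extracts the eigenvectors <math>\textstyle u_1, u_2</math>, and evaluates the projections <math>\textstyle u_1^Tx</math> and <math>\textstyle u_2^Tx</math> for one example:

<pre>
import numpy as np

# Stand-in dataset (an assumption for this sketch): two highly correlated
# features, one example per column, zero-mean after preprocessing.
rng = np.random.default_rng(0)
z = rng.standard_normal((1, 1000))
X = np.vstack([z, z]) + 0.1 * rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)
m = X.shape[1]

# Covariance matrix: Sigma = (1/m) * sum_i x^(i) x^(i)^T
Sigma = X @ X.T / m

# eigh returns eigenvalues in ascending order; flip so the top
# principal direction u_1 comes first.
eigvals, U = np.linalg.eigh(Sigma)
eigvals, U = eigvals[::-1], U[:, ::-1]
u1, u2 = U[:, 0], U[:, 1]

x = X[:, 0]               # one example
print(u1 @ x, u2 @ x)     # magnitudes of x projected onto u_1 and u_2
</pre>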
== Rotating the Data ==
Thus, we can represent <math>\textstyle x</math> in the <math>\textstyle (u_1, u_2)</math>-basis by computing

<math>x_{\rm rot} = U^Tx = \begin{bmatrix} u_1^Tx \\ u_2^Tx \end{bmatrix},</math>

where <math>\textstyle U = [u_1 \ u_2]</math> is the matrix with columns <math>\textstyle u_1</math> and <math>\textstyle u_2</math>. We can recover the original representation from the rotated one via <math>\textstyle x = U x_{\rm rot}</math>,
because <math>\textstyle U x_{\rm rot} = UU^T x = x</math>.
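A minimal sketch of the rotation and its inverse, under the same assumed setup as the previous sketch (zero-mean data X with one example per column, eigenvector matrix U):

<pre>
import numpy as np

# Same assumed setup as the previous sketch: zero-mean data X (one example
# per column) and the orthogonal eigenvector matrix U of its covariance.
rng = np.random.default_rng(0)
z = rng.standard_normal((1, 1000))
X = np.vstack([z, z]) + 0.1 * rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)
_, U = np.linalg.eigh(X @ X.T / X.shape[1])
U = U[:, ::-1]

# Rotate every example into the (u_1, u_2)-basis: x_rot = U^T x.
X_rot = U.T @ X

# U is orthogonal (U U^T = I), so the rotation is exactly invertible.
assert np.allclose(U @ X_rot, X)
</pre>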
== Reducing the Data Dimension ==
We see that the principal direction of variation of the data is the first
dimension <math>\textstyle x_{\rm rot,1}</math> of the rotated data. To reduce the data to <math>\textstyle k < n</math> dimensions, we can keep just the first <math>\textstyle k</math> components of <math>\textstyle x_{\rm rot}</math> and set the remaining components to zero. When we
do this, we also say that we are "retaining the top <math>\textstyle k</math> PCA (or principal) components."
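The reduction step then amounts to keeping the first <math>\textstyle k</math> rows of the rotated data; a sketch under the same assumptions as above:

<pre>
import numpy as np

# Same assumed setup as above, ending with the rotated data X_rot.
rng = np.random.default_rng(0)
z = rng.standard_normal((1, 1000))
X = np.vstack([z, z]) + 0.1 * rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)
_, U = np.linalg.eigh(X @ X.T / X.shape[1])
U = U[:, ::-1]
X_rot = U.T @ X

# Retain the top k principal components: keep the first k rows of x_rot
# (equivalently, zero out the remaining components).
k = 1
X_tilde = X_rot[:k, :]    # each column is a tilde{x} in R^k
print(X_tilde.shape)      # (1, 1000)
</pre>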
== Recovering an Approximation of the Data ==
Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation
of the original <math>\textstyle x \in \Re^n</math>. We can recover an approximation by padding <math>\textstyle \tilde{x}</math> with <math>\textstyle n-k</math> zeros and rotating back,

<math>\hat{x} = U \begin{bmatrix} \tilde{x} \\ 0 \end{bmatrix} \approx x,</math>

introducing very little approximation error.
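A sketch of this recovery step under the same assumptions, padding <math>\textstyle \tilde{x}</math> with zeros and rotating back:

<pre>
import numpy as np

# Same assumed setup as above: U and the reduced representation X_tilde.
rng = np.random.default_rng(0)
z = rng.standard_normal((1, 1000))
X = np.vstack([z, z]) + 0.1 * rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)
_, U = np.linalg.eigh(X @ X.T / X.shape[1])
U = U[:, ::-1]
n, m = X.shape
k = 1
X_tilde = (U.T @ X)[:k, :]

# x_hat = U [x_tilde; 0]: pad with zeros, rotate back to the original basis.
X_hat = U @ np.vstack([X_tilde, np.zeros((n - k, m))])
print(np.mean((X - X_hat) ** 2))   # small, since this data is nearly 1-D
</pre>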
== Number of components to retain ==
How do we set <math>\textstyle k</math>; i.e., how many PCA components should we retain? In our
simple 2-dimensional example, it seemed natural to retain 1 of the 2 components, but for higher dimensional data this decision is less trivial. A useful way to decide, and to describe the result, is the percentage of variance retained by the top <math>\textstyle k</math> components: reporting the fraction of variance you kept is far easier for others to interpret than reporting
that you retained 120 (or whatever other number of) components.
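The percentage of variance retained is computed from the eigenvalues <math>\textstyle \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n</math> of <math>\textstyle \Sigma</math>: choose the smallest <math>\textstyle k</math> with <math>\textstyle \frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j}</math> above a threshold. A sketch, where the 99% threshold and the stand-in data are assumptions:

<pre>
import numpy as np

# Same assumed stand-in data; we only need the covariance eigenvalues here.
rng = np.random.default_rng(0)
z = rng.standard_normal((1, 1000))
X = np.vstack([z, z]) + 0.1 * rng.standard_normal((2, 1000))
X -= X.mean(axis=1, keepdims=True)
eigvals = np.linalg.eigvalsh(X @ X.T / X.shape[1])[::-1]  # descending

# Fraction of variance retained by the top k components, for each k.
frac = np.cumsum(eigvals) / eigvals.sum()

# Smallest k whose retained variance meets the (assumed) 99% threshold.
k = int(np.searchsorted(frac, 0.99)) + 1
print(k, frac)
</pre>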
== PCA on Images ==
For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math>
to have a similar range of values to the others (and to have a mean close to zero).
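For natural image patches, a common variant of this preprocessing (sketched below; the patch dimensions and data are assumptions) is to subtract each patch's own mean intensity rather than normalizing each pixel separately:

<pre>
import numpy as np

# Assumed stand-in: 500 grayscale 16x16 patches, flattened to 256 features,
# one patch per column.
rng = np.random.default_rng(0)
X = rng.random((256, 500))

# Subtract each patch's mean intensity so every example has mean zero;
# for natural images this commonly replaces per-feature mean normalization.
X -= X.mean(axis=0, keepdims=True)
print(X.mean(axis=0)[:3])   # per-patch means are now ~0
</pre>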