Principal Components Analysis
Initial translation: @交大基层代表 @Emma_lzhang

First review: @Dr金峰

Second review: @破破的桥

Entry: @Emma_lzhang
== Introduction ==
Original text: Principal Components Analysis (PCA) is a dimensionality reduction algorithm
that can be used to significantly speed up your unsupervised feature learning
algorithm. More importantly, understanding PCA will enable us to later
implement whitening, which is an important pre-processing step for many algorithms.
== Example and Mathematical Background ==
Original text: For our running example, we will use a dataset
<math>\textstyle \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}</math> with
<math>\textstyle n=2</math> dimensional inputs, so that each example <math>\textstyle x^{(i)} \in \Re^2</math>.
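In the rest of the tutorial, the basis <math>\textstyle (u_1, u_2)</math> used below comes from the eigenvectors of the data's covariance matrix, as is standard for PCA. The following is a minimal NumPy sketch, not part of the original tutorial: the toy 2-D dataset, the variable names, and the column-per-example layout of the <math>\textstyle (n, m)</math> data matrix are illustrative assumptions.

<pre>
# Minimal sketch: build a toy zero-mean 2-D dataset, compute the covariance
# matrix Sigma = (1/m) X X^T, and take its eigenvectors as u_1, u_2.
import numpy as np

rng = np.random.default_rng(0)
m = 50                                                            # number of examples
X = rng.multivariate_normal([0, 0], [[3, 2], [2, 2]], size=m).T   # shape (n=2, m)

X = X - X.mean(axis=1, keepdims=True)    # make each feature zero mean
Sigma = (X @ X.T) / m                    # covariance matrix, shape (2, 2)

eigvals, U = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort so column 0 of U is u_1 (top direction)
eigvals, U = eigvals[order], U[:, order]
print("principal directions (columns):\n", U)
</pre>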
== Rotating the Data ==
Original text: Thus, we can represent <math>\textstyle x</math> in the <math>\textstyle (u_1, u_2)</math>-basis by computing

:<math>\begin{align}
x_{\rm rot} = U^Tx = \begin{bmatrix} u_1^Tx \\ u_2^Tx \end{bmatrix}
\end{align}</math>
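A minimal sketch of this rotation step follows (not from the original tutorial): it applies <math>\textstyle x_{\rm rot} = U^Tx</math> to every example at once. The hand-picked orthonormal basis and the function name are illustrative only.

<pre>
# Rotate the data into the (u_1, u_2)-basis: x_rot = U^T x for every column of X.
import numpy as np

def rotate_data(X, U):
    """X: (n, m) data matrix, one example per column; U: (n, n) eigenvector basis.
    Returns X_rot where X_rot[j, i] = u_{j+1}^T x^{(i)}."""
    return U.T @ X

# Tiny example with a hand-picked orthonormal basis (for illustration only):
U = np.array([[np.sqrt(0.5),  np.sqrt(0.5)],
              [np.sqrt(0.5), -np.sqrt(0.5)]])
X = np.array([[1.0, 2.0],
              [1.0, 0.0]])               # two 2-D examples as columns
print(rotate_data(X, U))
</pre>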
== Reducing the Data Dimension ==
Original text: We see that the principal direction of variation of the data is the first
dimension <math>\textstyle x_{{\rm rot},1}</math> of this rotated data. Thus, if we want to
reduce this data to one dimension, we can set <math>\textstyle \tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)} \in \Re</math>.
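In code, keeping the top <math>\textstyle k</math> components amounts to projecting onto the first <math>\textstyle k</math> columns of <math>\textstyle U</math>. Below is a minimal sketch under the same assumptions as the earlier snippets; the identity basis in the usage example is purely for illustration.

<pre>
# Keep only the first k components of x_rot: tilde{x} = U[:, :k]^T x.
import numpy as np

def reduce_dimension(X, U, k):
    """Project each column of X (shape (n, m)) onto the top-k eigenvectors; returns (k, m)."""
    return U[:, :k].T @ X            # row 0 holds u_1^T x^{(i)}

# Tiny illustration with the identity basis (so the reduction just drops row 2):
U = np.eye(2)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(reduce_dimension(X, U, k=1))   # -> [[1. 2.]]
</pre>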
== Recovering an Approximation of the Data ==
Original text: Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation
of the original <math>\textstyle x \in \Re^n</math>. Given <math>\textstyle \tilde{x}</math>, how can we recover an approximation <math>\textstyle \hat{x}</math> to
the original value of <math>\textstyle x</math>? From an [[#Rotating the Data|earlier section]], we know that <math>\textstyle x = U x_{\rm rot}</math>. Further,
we can think of <math>\textstyle \tilde{x}</math> as an approximation to <math>\textstyle x_{\rm rot}</math> in which the last <math>\textstyle n-k</math> components have been set to zero.
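A minimal sketch of the recovery step follows (not from the original tutorial): padding <math>\textstyle \tilde{x}</math> with <math>\textstyle n-k</math> zeros and multiplying by <math>\textstyle U</math> is the same as multiplying by the first <math>\textstyle k</math> columns of <math>\textstyle U</math>, which is what the code does.

<pre>
# Recover hat{x} = U [tilde{x}; 0], implemented as U[:, :k] @ tilde{x}.
import numpy as np

def recover_approximation(X_tilde, U):
    """X_tilde: (k, m) reduced data; U: (n, n) eigenvector basis. Returns (n, m)."""
    k = X_tilde.shape[0]
    return U[:, :k] @ X_tilde

# Tiny illustration with the identity basis: the dropped dimension comes back as zeros.
U = np.eye(2)
X_tilde = np.array([[1.0, 2.0]])            # k = 1
print(recover_approximation(X_tilde, U))    # -> [[1. 2.], [0. 0.]]
</pre>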
== Number of components to retain ==
Original text: How do we set <math>\textstyle k</math>; i.e., how many PCA components should we retain? In our
simple 2 dimensional example, it seemed natural to retain 1 out of the 2
components, but for higher dimensional data, this decision is less trivial. If <math>\textstyle k</math> is
too large, then we won't be compressing the data much; in the limit of <math>\textstyle k=n</math>, we are just using the original data (but rotated into a different basis).
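One common rule of thumb, sketched below, is to choose the smallest <math>\textstyle k</math> that retains a given fraction of the variance, measured by the eigenvalues of the covariance matrix. This snippet is not from the original tutorial, and the 99% default threshold is only an example value.

<pre>
# Choose k as the smallest number of components whose eigenvalues account for
# at least `threshold` of the total variance.
import numpy as np

def choose_k(eigvals, threshold=0.99):
    """eigvals: eigenvalues of the covariance matrix, sorted in decreasing order."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)      # variance retained for k = 1..n
    return int(np.searchsorted(ratio, threshold) + 1)

# Example: with eigenvalues [7.29, 0.69] a 90% threshold keeps 1 component,
# while the 99% default would keep both.
print(choose_k(np.array([7.29, 0.69]), threshold=0.90))
</pre>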
== PCA on Images ==
Original text: For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math>
to have a similar range of values to the others (and to have a mean close to
zero). If you've used PCA on other applications before, you may therefore have
separately pre-processed each feature to have zero mean and unit variance, by separately estimating the mean and variance of each feature.
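The snippet below is a minimal sketch of the generic per-feature preprocessing described in this paragraph, estimating each feature's mean and variance separately and standardizing it. It is not from the original tutorial, which goes on to discuss how preprocessing for image data differs from this generic case.

<pre>
# Standardize each feature (row) of the data matrix to zero mean and unit variance.
import numpy as np

def standardize_features(X, eps=1e-8):
    """X: (n, m) data matrix, one example per column."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps)      # eps guards against constant features

# After standardizing, every feature has (approximately) zero mean and unit variance.
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 60.0]])
X_std = standardize_features(X)
print(X_std.mean(axis=1), X_std.std(axis=1))
</pre>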