Principal Components Analysis

First translation: @交大基层代表 @Emma_lzhang

First review: @Dr金峰

Second review: @破破的桥

Transcription: @Emma_lzhang
== Introduction ==

Principal Components Analysis (PCA) is a dimensionality reduction algorithm that can be used to significantly speed up your unsupervised feature learning algorithm. More importantly, understanding PCA will enable us to later implement whitening, which is an important pre-processing step for many algorithms.
== Example and Mathematical Background ==

For our running example, we will use a dataset <math>\textstyle \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}</math> with <math>\textstyle n=2</math> dimensional inputs, so that <math>\textstyle x^{(i)} \in \Re^2</math>. Suppose we want to reduce the data from 2 dimensions to 1.
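The rest of this section (truncated in this revision) builds up to the covariance matrix <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)}) (x^{(i)})^T</math>, whose eigenvectors <math>\textstyle u_1, u_2</math> give the principal directions of the data. As a minimal sketch of that computation, here is a NumPy version; the synthetic zero-mean dataset, the seed, and all variable names are illustrative assumptions, not part of the tutorial:

<source lang="python">
import numpy as np

# Hypothetical zero-mean 2-D dataset: m examples stored as the columns
# of X, matching the tutorial's convention x^{(i)} in R^2.
rng = np.random.default_rng(0)
m = 1000
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=m).T  # shape (2, m)

# Covariance matrix Sigma = (1/m) * sum_i x^{(i)} (x^{(i)})^T
# (assumes the data already has zero mean).
Sigma = (X @ X.T) / m

# Eigendecomposition of the symmetric matrix Sigma; reorder so that the
# columns of U are u_1, u_2 with eigenvalues in decreasing order.
eigvals, U = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]
</source>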
== Rotating the Data ==

Thus, we can represent <math>\textstyle x</math> in the <math>\textstyle (u_1, u_2)</math>-basis by computing
:<math>\begin{align}
x_{\rm rot} = U^Tx = \begin{bmatrix} u_1^Tx \\ u_2^Tx \end{bmatrix}
\end{align}</math>
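Continuing the hypothetical NumPy sketch above, rotating every training example into the <math>\textstyle (u_1, u_2)</math>-basis is a single matrix product:

<source lang="python">
# Rotate the dataset into the (u_1, u_2)-basis: column i of X_rot is
# x_rot = U^T x^{(i)}.
X_rot = U.T @ X   # shape (2, m)
</source>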
== Reducing the Data Dimension ==

We see that the principal direction of variation of the data is the first dimension <math>\textstyle x_{{\rm rot},1}</math> of this rotated data. Thus, if we want to reduce this data to one dimension, we can set
:<math>\begin{align}
\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)} \in \Re.
\end{align}</math>
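In the same sketch, keeping only the top <math>\textstyle k</math> rotated coordinates (here <math>\textstyle k=1</math>) reduces the dimension:

<source lang="python">
# Reduce to k = 1 dimension: keep only the first rotated coordinate,
# x_tilde^{(i)} = u_1^T x^{(i)}, for every example.
k = 1
X_tilde = U[:, :k].T @ X   # shape (k, m)
</source>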
== Recovering an Approximation of the Data ==

Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation of the original <math>\textstyle x \in \Re^n</math>. Given <math>\textstyle \tilde{x}</math>, how can we recover an approximation <math>\textstyle \hat{x}</math> to the original value of <math>\textstyle x</math>? From an [[#Rotating the Data|earlier section]], we know that <math>\textstyle x = U x_{\rm rot}</math>. Further, we can think of <math>\textstyle \tilde{x}</math> as an approximation to <math>\textstyle x_{\rm rot}</math> in which the last <math>\textstyle n-k</math> components have been set to zero.
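In the running NumPy sketch, padding <math>\textstyle \tilde{x}</math> with zeros and rotating back collapses into one product with the first <math>\textstyle k</math> columns of <math>\textstyle U</math>:

<source lang="python">
# Recover the approximation x_hat = U [x_tilde; 0, ..., 0]: padding with
# n - k zeros and multiplying by U is the same as multiplying the reduced
# representation by the first k columns of U.
X_hat = U[:, :k] @ X_tilde   # shape (n, m) = (2, m)
</source>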
== Number of components to retain ==

How do we set <math>\textstyle k</math>; i.e., how many PCA components should we retain? In our simple 2-dimensional example, it seemed natural to retain 1 out of the 2 components, but for higher-dimensional data, this decision is less trivial. If <math>\textstyle k</math> is too large, then we will not be compressing the data much; in the limit of <math>\textstyle k=n</math>, we are just using the original data (but rotated into a different basis). Conversely, if <math>\textstyle k</math> is too small, then we might be using a very bad approximation to the data.
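The truncated remainder of this section develops a criterion based on the percentage of variance retained. In terms of the eigenvalues <math>\textstyle \lambda_j</math> of <math>\textstyle \Sigma</math> (computed in the sketch above), one common form of that rule looks like the following; the 99% threshold is an illustrative choice:

<source lang="python">
# Pick the smallest k whose top-k eigenvalues account for at least 99%
# of the total variance. eigvals is sorted in decreasing order above.
retained = np.cumsum(eigvals) / np.sum(eigvals)
k = int(np.searchsorted(retained, 0.99)) + 1
</source>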
== PCA on Images ==

For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math> to have a similar range of values to the others (and to have a mean close to zero). If you've used PCA on other applications before, you may therefore have separately pre-processed each feature to have zero mean and unit variance, by separately estimating the mean and variance of each feature <math>\textstyle x_j</math>. However, this is not the pre-processing that we will apply to most types of images.
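For natural image patches, the pre-processing this section goes on to recommend (in the truncated portion) is to subtract each patch's own mean intensity, rather than normalizing each feature separately. A standalone NumPy sketch; the random patch matrix is an illustrative stand-in for real image data:

<source lang="python">
import numpy as np

# Hypothetical data: 10,000 flattened 8x8 grayscale patches, one per column.
rng = np.random.default_rng(1)
patches = rng.random((64, 10000))

# Per-patch mean removal: x^{(i)} := x^{(i)} - mu^{(i)}, where mu^{(i)} is
# the mean intensity of patch i. No per-feature variance normalization.
patches -= patches.mean(axis=0, keepdims=True)
</source>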
