Principal Components Analysis

From Ufldl

== Introduction ==
 
Original: Principal Components Analysis (PCA) is a dimensionality reduction algorithm
== Example and Mathematical Background ==
 
Original: For our running example, we will use a dataset
[[File:PCA-rawdata.png|600px]]
Original: This data has already been pre-processed so that each of the features <math>\textstyle x_1</math> and <math>\textstyle x_2</math>
[[File:PCA-u1.png | 600px]]
Original: I.e., the data varies much more in the direction <math>\textstyle u_1</math> than <math>\textstyle u_2</math>.
<math>\begin{align}
\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T.
\end{align}</math>
Original: If <math>\textstyle x</math> has zero mean, then <math>\textstyle \Sigma</math> is exactly the covariance matrix of <math>\textstyle x</math>.  (The symbol "<math>\textstyle \Sigma</math>", pronounced "Sigma", is the standard notation for denoting the covariance matrix.  Unfortunately it looks just like the summation symbol, as in <math>\sum_{i=1}^n i</math>; but these are two different things.)
It can then be shown that <math>\textstyle u_1</math>, the principal direction of variation of the data, is
the top (principal) eigenvector of <math>\textstyle \Sigma</math>, and <math>\textstyle u_2</math> is
the second eigenvector.
 
Draft translation: Suppose <math>\textstyle x</math> has zero mean; then <math>\textstyle \Sigma</math> is exactly the covariance matrix of <math>\textstyle x</math>. (The symbol <math>\textstyle \Sigma</math>, pronounced "Sigma", is the standard notation for the covariance matrix. Although it looks like the summation symbol <math>\sum_{i=1}^n i</math>, they are two different things.) It follows that <math>\textstyle u_1</math>, the principal direction of variation of the data, is the principal eigenvector of the covariance matrix <math>\textstyle \Sigma</math>, and <math>\textstyle u_2</math> is the second eigenvector.
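As an illustration of the step above, here is a minimal NumPy sketch that builds the covariance matrix <math>\textstyle \Sigma</math> from zero-mean data and extracts its eigenvectors. The dataset and all variable names are hypothetical stand-ins, not the tutorial's actual data:

```python
import numpy as np

# Hypothetical zero-mean 2-D dataset: m examples in rows, n = 2 features.
rng = np.random.default_rng(0)
m = 500
x = rng.normal(size=(m, 2)) @ np.array([[2.0, 0.0], [1.2, 0.4]])
x -= x.mean(axis=0)  # enforce zero mean, as the text assumes

# Sigma = (1/m) * sum_i x^(i) (x^(i))^T
Sigma = (x.T @ x) / m

# Eigen-decomposition of the symmetric matrix Sigma; columns of U are
# eigenvectors. eigh returns ascending eigenvalues, so reorder so that
# u_1 (the first column) corresponds to the largest eigenvalue.
eigvals, U = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

print(eigvals)   # variances along u_1 and u_2
print(U[:, 0])   # u_1, the principal direction of variation
```

Note the reordering step: `numpy.linalg.eigh` sorts eigenvalues ascending, while the text's convention puts the top eigenvector first.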
<math>\begin{align}
U = 
\begin{bmatrix} 
| & | & & |  \\
u_1 & u_2 & \cdots & u_n  \\
| & | & & | 
\end{bmatrix}
\end{align}</math>
Original: Here, <math>\textstyle u_1</math> is the principal eigenvector (corresponding to the largest eigenvalue),
== Rotating the Data ==
 
Original: Thus, we can represent <math>\textstyle x</math> in the <math>\textstyle (u_1, u_2)</math>-basis by computing
[[File:PCA-rotated.png|600px]]
Original: This is the training set rotated into the <math>\textstyle u_1</math>,<math>\textstyle u_2</math> basis. In the general
== Reducing the Data Dimension ==
 
Original: We see that the principal direction of variation of the data is the first
<math>\begin{align}
\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)} \in \Re.
\end{align}</math>
Original: More generally, if <math>\textstyle x \in \Re^n</math> and we want to reduce it to
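Reducing to <math>\textstyle k</math> dimensions then amounts to keeping only the first <math>\textstyle k</math> components of the rotated data. A sketch with hypothetical 5-dimensional data (names illustrative):

```python
import numpy as np

# Hypothetical zero-mean data in R^5; reduce to k = 2 dimensions.
rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
x -= x.mean(axis=0)

Sigma = (x.T @ x) / len(x)
eigvals, U = np.linalg.eigh(Sigma)
U = U[:, np.argsort(eigvals)[::-1]]   # columns ordered u_1, ..., u_n

k = 2
# x_tilde = (u_1^T x, ..., u_k^T x): the first k rotated coordinates.
x_tilde = x @ U[:, :k]
```

The dropped coordinates are the ones with the least variance, which is why this truncation loses as little information as a linear projection can.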
== Recovering an Approximation of the Data ==
 
Original: Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation
<math>\begin{align}
\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}  
= \sum_{i=1}^k u_i \tilde{x}_i.
\end{align}</math>
Original: The final equality above comes from the definition of <math>\textstyle U</math> [[#Example and Mathematical Background|given earlier]].
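The recovery formula above (pad <math>\textstyle \tilde{x}</math> with zeros and rotate back, equivalently a sum over the first <math>\textstyle k</math> columns of <math>\textstyle U</math>) can be sketched like this, continuing the hypothetical 5-dimensional example:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
x -= x.mean(axis=0)

Sigma = (x.T @ x) / len(x)
eigvals, U = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

k = 2
x_tilde = x @ U[:, :k]          # compressed representation in R^k

# x_hat = U [x_tilde; 0; ...; 0] = sum_{i=1}^k u_i * x_tilde_i,
# which collapses to multiplying by the first k columns of U:
x_hat = x_tilde @ U[:, :k].T
```

`x_hat` is the best rank-`k` linear approximation of the data in the squared-error sense.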
== Number of components to retain ==
 
Original: How do we set <math>\textstyle k</math>; i.e., how many PCA components should we retain?  In our
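The tutorial's usual criterion is to pick the smallest <math>\textstyle k</math> whose top-<math>\textstyle k</math> eigenvalues capture a desired fraction (e.g. 99%) of the total variance. A small sketch of that rule; the eigenvalue spectrum below is illustrative, not from the tutorial's dataset:

```python
import numpy as np

def choose_k(eigvals, variance_to_retain=0.99):
    """Smallest k such that sum_{j<=k} lambda_j / sum_j lambda_j
    >= variance_to_retain, with eigenvalues sorted descending."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    frac = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(frac, variance_to_retain) + 1)

# Illustrative spectrum: two large directions, two tiny ones.
lam = np.array([10.0, 5.0, 0.1, 0.05])
print(choose_k(lam, 0.99))   # -> 2, since (10+5)/15.15 ~ 0.9901
```

With a fast-decaying spectrum, most of the variance lives in the first few components, so `k` can be much smaller than `n`.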
== PCA on Images ==
+
 
 +
== PCA on Images 对图像数据应用PCA算法 ==
 +
 
 +
 
Original: For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math>
to have a similar range of values to the others (and to have a mean close to zero).
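For natural image patches the pixel statistics are roughly the same across features, so per-feature variance normalization is usually skipped; instead one subtracts each patch's own mean intensity (mean normalization). A sketch under that assumption, using random stand-in data rather than real image patches:

```python
import numpy as np

# Illustrative stand-in for image patches: each row is one flattened
# 8x8 patch with pixel intensities in [0, 1].
rng = np.random.default_rng(2)
patches = rng.uniform(0.0, 1.0, size=(100, 64))

# Subtract each patch's mean intensity so every patch has zero mean;
# this removes the (uninteresting) average brightness of the patch.
patches_zeroed = patches - patches.mean(axis=1, keepdims=True)
```

Note the axis: the mean is taken per patch (per row), not per feature, which is the point of this preprocessing for images.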
== References ==
+
== References 参考文献 ==
http://cs229.stanford.edu
{{PCA}}

Revision as of 21:25, 11 March 2013
