Principal Components Analysis (主成分分析)

[[File:PCA-rawdata.png|600px]]

This data has already been pre-processed so that each of the features <math>\textstyle x_1</math> and <math>\textstyle x_2</math> has about the same mean (zero) and variance.
For the purposes of visualization, we have also colored each of the points one of three colors depending on the value of <math>\textstyle x_1</math>; these colors are not used by the algorithm, and are for illustration only.

From visually examining the data, it appears that <math>\textstyle u_1</math> is the principal direction of variation of the data, and <math>\textstyle u_2</math> the secondary direction of variation:

[[File:PCA-u1.png | 600px]]

To more formally find the directions <math>\textstyle u_1</math> and <math>\textstyle u_2</math>, we first compute the matrix <math>\textstyle \Sigma</math> as follows:

:<math>\begin{align}
\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T.
\end{align}</math>
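
As a concrete illustration (not part of the original notes), here is a minimal NumPy sketch of this computation. It assumes the <math>\textstyle m</math> training examples are stored as the columns of an <math>\textstyle n \times m</math> array <code>X</code>, a hypothetical variable name that the remaining sketches reuse:

<pre>
import numpy as np

# Hypothetical zero-mean data: n = 2 features, m = 45 examples, one per column.
n, m = 2, 45
X = np.random.randn(n, m)
X = X - X.mean(axis=1, keepdims=True)   # make each feature zero-mean

# Sigma = (1/m) * sum_i x^(i) (x^(i))^T, written as a single matrix product.
Sigma = (X @ X.T) / m                   # shape (n, n)
</pre>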

It can be shown that <math>\textstyle u_1</math> is the top (principal) eigenvector of <math>\textstyle \Sigma</math>, and <math>\textstyle u_2</math> is the second eigenvector.

Note: If you are interested in seeing a more formal mathematical derivation/justification of this result, see the CS229 (Machine Learning) lecture notes on PCA (link at bottom of this page). You won't need to do so to follow along with this course, however.

You can use standard numerical linear algebra software to find these eigenvectors (see Implementation Notes). Concretely, let us compute the eigenvectors of <math>\textstyle \Sigma</math>, and stack the eigenvectors in columns to form the matrix <math>\textstyle U</math>:

:<math>\begin{align}
U =
\begin{bmatrix}
| & | & & | \\
u_1 & u_2 & \cdots & u_n \\
| & | & & |
\end{bmatrix}
\end{align}</math>

Here, <math>\textstyle u_1</math> is the top eigenvector, <math>\textstyle u_2</math> is the second eigenvector, and so on. Also, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the corresponding eigenvalues.
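
Continuing the sketch above (NumPy is used here only as a stand-in for the "standard numerical linear algebra software" the text mentions), one way to obtain <math>\textstyle U</math> and the eigenvalues sorted in decreasing order is:

<pre>
# Sigma is symmetric, so eigh is appropriate; it returns eigenvalues in
# ascending order, so flip both the eigenvalues and the eigenvectors.
lam, U = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam = lam[order]          # lambda_1 >= lambda_2 >= ... >= lambda_n
U = U[:, order]           # column j of U is the eigenvector u_j
</pre>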

The vectors <math>\textstyle u_1</math> and <math>\textstyle u_2</math> in our example form a new basis in which we can represent the data. Concretely, <math>\textstyle u_1^Tx</math> is the length (magnitude) of the projection of <math>\textstyle x</math> onto the vector <math>\textstyle u_1</math>. Similarly, <math>\textstyle u_2^Tx</math> is the magnitude of <math>\textstyle x</math> projected onto the vector <math>\textstyle u_2</math>.

== Rotating the Data ==

Taking the entire training set, we can compute <math>\textstyle x_{\rm rot}^{(i)} = U^Tx^{(i)}</math> for every <math>\textstyle i</math>. Plotting this transformed data <math>\textstyle x_{\rm rot}</math>, we get:

[[File:PCA-rotated.png|600px]]

If you ever need to go from the rotated vectors <math>\textstyle x_{\rm rot}</math> back to the original data <math>\textstyle x</math>, you can compute <math>\textstyle x = U x_{\rm rot}</math>, because <math>\textstyle U x_{\rm rot} =  UU^T x = x</math>.
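
As a sketch (assuming the <code>X</code> and <code>U</code> arrays from the earlier snippets), rotating the whole training set and then undoing the rotation looks like this:

<pre>
# Rotate every example: column i of X_rot is x_rot^(i) = U^T x^(i).
X_rot = U.T @ X

# Because U is orthogonal (U U^T = I), the rotation is easily undone.
X_back = U @ X_rot
assert np.allclose(X_back, X)
</pre>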

== Reducing the Data Dimension ==

We see that the principal direction of variation of the data is the first dimension <math>\textstyle x_{{\rm rot},1}</math> of this rotated data. Thus, if we want to reduce this data to one dimension, we can set

:<math>\begin{align}
\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)} \in \Re.
\end{align}</math>

More generally, if we want to reduce the data to a <math>\textstyle k</math> dimensional representation <math>\textstyle \tilde{x} \in \Re^k</math> (where <math>\textstyle k < n</math>), we would take the first <math>\textstyle k</math> components of <math>\textstyle x_{\rm rot}</math>, which correspond to the top <math>\textstyle k</math> directions of variation.
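
A minimal sketch of this truncation, reusing the arrays from the previous snippets together with a hypothetical choice of <code>k</code>:

<pre>
k = 1                      # number of components to keep (k < n)

# Keep only the top k components; equivalently, project onto the first k
# eigenvectors: column i of X_tilde holds u_1^T x^(i), ..., u_k^T x^(i).
X_tilde = U[:, :k].T @ X   # shape (k, m), the same as X_rot[:k, :]
</pre>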

Another way of explaining PCA is that <math>\textstyle x_{\rm rot}</math> is an <math>\textstyle n</math> dimensional vector, where the first few components are likely to be large (e.g., in our example, <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> takes reasonably large values for most examples <math>\textstyle i</math>), and the later components are likely to be small (e.g., in our example, <math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> was more likely to be small). What PCA does is drop the later (smaller) components of <math>\textstyle x_{\rm rot}</math> and approximate them with zeros.
In our example, this gives us the following plot of <math>\textstyle \tilde{x}</math> (using <math>\textstyle n=2, k=1</math>):

[[File:PCA-xtilde.png | 600px]]

However, since the final <math>\textstyle n-k</math> components of <math>\textstyle \tilde{x}</math> as defined above would always be zero, there is no need to keep these zeros around, and so we define <math>\textstyle \tilde{x}</math> as a <math>\textstyle k</math>-dimensional vector with just the first <math>\textstyle k</math> (non-zero) components.

This also explains why we wanted to express our data in the <math>\textstyle u_1, u_2, \ldots, u_n</math> basis: Deciding which components to keep becomes just keeping the top <math>\textstyle k</math> components. When we do this, we also say that we are "retaining the top <math>\textstyle k</math> PCA (or principal) components."

== Recovering an Approximation of the Data ==

Given <math>\textstyle \tilde{x} \in \Re^k</math>, we can pad it out with <math>\textstyle n-k</math> zeros to get our approximation to <math>\textstyle x_{\rm rot} \in \Re^n</math>. Finally, we pre-multiply by <math>\textstyle U</math> to get our approximation to <math>\textstyle x</math>. Concretely, we get

:<math>\begin{align}
\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}
= \sum_{i=1}^k u_i \tilde{x}_i.
\end{align}</math>

(In a practical implementation, rather than zero-padding <math>\textstyle \tilde{x}</math> and then multiplying by <math>\textstyle U</math>, we would just multiply <math>\textstyle \tilde{x} \in \Re^k</math> with the first <math>\textstyle k</math> columns of <math>\textstyle U</math> as in the final expression above.) Applying this to our dataset, we get the following plot for <math>\textstyle \hat{x}</math>:

[[File:PCA-xhat.png | 600px]]
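
A sketch of this reconstruction step, following the parenthetical remark above and continuing the variable names from the earlier snippets:

<pre>
# x_hat = sum_{i=1}^k u_i * x_tilde_i, computed for every example at once.
X_hat = U[:, :k] @ X_tilde     # shape (n, m): each column approximates x^(i)

# Equivalent (but wasteful) version: zero-pad X_tilde to n rows, then apply U.
X_rot_approx = np.vstack([X_tilde, np.zeros((n - k, m))])
assert np.allclose(U @ X_rot_approx, X_hat)
</pre>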

For many datasets, the lower-dimensional <math>\textstyle \tilde{x}</math> representation can be an extremely good approximation to the original, and using PCA this way can significantly speed up your algorithm while introducing very little approximation error.

== Number of components to retain ==

If <math>\textstyle k</math> is too large, then we won't be compressing the data much. Conversely, if <math>\textstyle k</math> is too small, then we might be using a very bad approximation to the data.

To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance retained''' for different values of <math>\textstyle k</math>. If <math>\textstyle k=n</math>, then we have an exact approximation to the data, and we say that 100% of the variance is retained. Conversely, if <math>\textstyle k=0</math>, then we are approximating all the data with the zero vector, and thus 0% of the variance is retained.

More generally, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the eigenvalues of <math>\textstyle \Sigma</math> (sorted in decreasing order), so that <math>\textstyle \lambda_j</math> is the eigenvalue corresponding to the eigenvector <math>\textstyle u_j</math>. Then if we retain <math>\textstyle k</math> principal components, the percentage of variance retained is given by:

:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j}.
\end{align}</math>

In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math> and <math>\textstyle \lambda_2 = 0.69</math>. Thus, by keeping only <math>\textstyle k=1</math> principal components, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>, or 91.3% of the variance.
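
Using the eigenvalues computed in the earlier sketch (again an illustration, not part of the original notes), the retained-variance fraction is just a ratio of a partial sum to the total sum:

<pre>
def variance_retained(lam, k):
    """Fraction of variance retained when keeping the top k components.

    Assumes lam holds the eigenvalues of Sigma sorted in decreasing order.
    """
    return lam[:k].sum() / lam.sum()

# For eigenvalues of roughly [7.29, 0.69], as in the 2D example in the text,
# variance_retained(lam, 1) comes out to about 0.913.
</pre>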

A more formal definition of percentage of variance retained is beyond the scope of these notes. Intuitively, however, the top principal components <math>\textstyle x_{{\rm rot},j}</math> are the ones that are more variable and that take on larger values, and for which we would incur a greater approximation error if we were to set them to zero.

In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of the variance.
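
A small sketch of this heuristic, assuming <code>lam</code> holds the eigenvalues in decreasing order as in the earlier snippets:

<pre>
# Smallest k whose cumulative eigenvalue sum covers at least 99% of the total.
ratios = np.cumsum(lam) / np.sum(lam)
k = int(np.searchsorted(ratios, 0.99) + 1)
</pre>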
