Principal Components Analysis (主成分分析)

[[File:PCA-rawdata.png|600px]]

This data has already been pre-processed so that each of the features <math>\textstyle x_1</math> and <math>\textstyle x_2</math> has about the same mean (zero) and variance.
For the purposes of visualization, we have also colored each of the points one of three colors depending on the value of <math>\textstyle x_1</math>; these colors are not used by the algorithm, and are for illustration only.

From visually examining the data, it appears that <math>\textstyle u_1</math> is the principal direction of variation of the data, and <math>\textstyle u_2</math> the secondary direction of variation:

[[File:PCA-u1.png | 600px]]

To more formally find the directions <math>\textstyle u_1</math> and <math>\textstyle u_2</math>, we first compute the matrix <math>\textstyle \Sigma</math> as follows:

:<math>\begin{align}
\Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T.
\end{align}</math>
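
As a concrete illustration (not part of the original notes), here is a minimal NumPy sketch of this computation. It assumes the <math>\textstyle m</math> training examples are stored as the columns of an <math>\textstyle n \times m</math> array <code>X</code>, a hypothetical variable name that the remaining sketches reuse:

<pre>
import numpy as np

# Hypothetical zero-mean data: n = 2 features, m = 45 examples, one per column.
n, m = 2, 45
X = np.random.randn(n, m)
X = X - X.mean(axis=1, keepdims=True)   # make each feature zero-mean

# Sigma = (1/m) * sum_i x^(i) (x^(i))^T, written as a single matrix product.
Sigma = (X @ X.T) / m                   # shape (n, n)
</pre>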

It can be shown that <math>\textstyle u_1</math> is the top (principal) eigenvector of <math>\textstyle \Sigma</math>, and <math>\textstyle u_2</math> is the second eigenvector.

Note: If you are interested in seeing a more formal mathematical derivation/justification of this result, see the CS229 (Machine Learning) lecture notes on PCA (link at bottom of this page). You won't need to do so to follow along with this course, however.

You can use standard numerical linear algebra software to find these eigenvectors (see Implementation Notes). Concretely, let us compute the eigenvectors of <math>\textstyle \Sigma</math>, and stack the eigenvectors in columns to form the matrix <math>\textstyle U</math>:

:<math>\begin{align}
U =
\begin{bmatrix}
| & | & & | \\
u_1 & u_2 & \cdots & u_n \\
| & | & & |
\end{bmatrix}
\end{align}</math>

Here, <math>\textstyle u_1</math> is the top eigenvector, <math>\textstyle u_2</math> is the second eigenvector, and so on. Also, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the corresponding eigenvalues.
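
Continuing the sketch above (NumPy is used here only as a stand-in for the "standard numerical linear algebra software" the text mentions), one way to obtain <math>\textstyle U</math> and the eigenvalues sorted in decreasing order is:

<pre>
# Sigma is symmetric, so eigh is appropriate; it returns eigenvalues in
# ascending order, so flip both the eigenvalues and the eigenvectors.
lam, U = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam = lam[order]          # lambda_1 >= lambda_2 >= ... >= lambda_n
U = U[:, order]           # column j of U is the eigenvector u_j
</pre>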

The vectors <math>\textstyle u_1</math> and <math>\textstyle u_2</math> in our example form a new basis in which we can represent the data. Concretely, <math>\textstyle u_1^Tx</math> is the length (magnitude) of the projection of <math>\textstyle x</math> onto the vector <math>\textstyle u_1</math>. Similarly, <math>\textstyle u_2^Tx</math> is the magnitude of <math>\textstyle x</math> projected onto the vector <math>\textstyle u_2</math>.

== Rotating the Data ==

Taking the entire training set, we can compute <math>\textstyle x_{\rm rot}^{(i)} = U^Tx^{(i)}</math> for every <math>\textstyle i</math>. Plotting this transformed data <math>\textstyle x_{\rm rot}</math>, we get:

[[File:PCA-rotated.png|600px]]

If you ever need to go from the rotated vectors <math>\textstyle x_{\rm rot}</math> back to the original data <math>\textstyle x</math>, you can compute <math>\textstyle x = U x_{\rm rot}</math>, because <math>\textstyle U x_{\rm rot} =  UU^T x = x</math>.
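
As a sketch (assuming the <code>X</code> and <code>U</code> arrays from the earlier snippets), rotating the whole training set and then undoing the rotation looks like this:

<pre>
# Rotate every example: column i of X_rot is x_rot^(i) = U^T x^(i).
X_rot = U.T @ X

# Because U is orthogonal (U U^T = I), the rotation is easily undone.
X_back = U @ X_rot
assert np.allclose(X_back, X)
</pre>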

== Reducing the Data Dimension ==

We see that the principal direction of variation of the data is the first dimension <math>\textstyle x_{{\rm rot},1}</math> of this rotated data. Thus, if we want to reduce this data to one dimension, we can set

:<math>\begin{align}
\tilde{x}^{(i)} = x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)} \in \Re.
\end{align}</math>

More generally, if we want to reduce the data to a <math>\textstyle k</math> dimensional representation <math>\textstyle \tilde{x} \in \Re^k</math> (where <math>\textstyle k < n</math>), we would take the first <math>\textstyle k</math> components of <math>\textstyle x_{\rm rot}</math>, which correspond to the top <math>\textstyle k</math> directions of variation.
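
A minimal sketch of this truncation, reusing the arrays from the previous snippets together with a hypothetical choice of <code>k</code>:

<pre>
k = 1                      # number of components to keep (k < n)

# Keep only the top k components; equivalently, project onto the first k
# eigenvectors: column i of X_tilde holds u_1^T x^(i), ..., u_k^T x^(i).
X_tilde = U[:, :k].T @ X   # shape (k, m), the same as X_rot[:k, :]
</pre>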

Another way of explaining PCA is that <math>\textstyle x_{\rm rot}</math> is an <math>\textstyle n</math> dimensional vector, where the first few components are likely to be large (e.g., in our example, <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> takes reasonably large values for most examples <math>\textstyle i</math>), and the later components are likely to be small (e.g., in our example, <math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> was more likely to be small). What PCA does is drop the later (smaller) components of <math>\textstyle x_{\rm rot}</math> and approximate them with zeros.
In our example, this gives us the following plot of <math>\textstyle \tilde{x}</math> (using <math>\textstyle n=2, k=1</math>):

[[File:PCA-xtilde.png | 600px]]

However, since the final <math>\textstyle n-k</math> components of <math>\textstyle \tilde{x}</math> as defined above would always be zero, there is no need to keep these zeros around, and so we define <math>\textstyle \tilde{x}</math> as a <math>\textstyle k</math>-dimensional vector with just the first <math>\textstyle k</math> (non-zero) components.

This also explains why we wanted to express our data in the <math>\textstyle u_1, u_2, \ldots, u_n</math> basis: Deciding which components to keep becomes just keeping the top <math>\textstyle k</math> components. When we do this, we also say that we are "retaining the top <math>\textstyle k</math> PCA (or principal) components."

== Recovering an Approximation of the Data ==

Given <math>\textstyle \tilde{x} \in \Re^k</math>, we can pad it out with <math>\textstyle n-k</math> zeros to get our approximation to <math>\textstyle x_{\rm rot} \in \Re^n</math>. Finally, we pre-multiply by <math>\textstyle U</math> to get our approximation to <math>\textstyle x</math>. Concretely, we get

:<math>\begin{align}
\hat{x} = U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}
= \sum_{i=1}^k u_i \tilde{x}_i.
\end{align}</math>

(In a practical implementation, rather than zero-padding <math>\textstyle \tilde{x}</math> and then multiplying by <math>\textstyle U</math>, we would just multiply <math>\textstyle \tilde{x} \in \Re^k</math> with the first <math>\textstyle k</math> columns of <math>\textstyle U</math> as in the final expression above.) Applying this to our dataset, we get the following plot for <math>\textstyle \hat{x}</math>:

[[File:PCA-xhat.png | 600px]]
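
A sketch of this reconstruction step, following the parenthetical remark above and continuing the variable names from the earlier snippets:

<pre>
# x_hat = sum_{i=1}^k u_i * x_tilde_i, computed for every example at once.
X_hat = U[:, :k] @ X_tilde     # shape (n, m): each column approximates x^(i)

# Equivalent (but wasteful) version: zero-pad X_tilde to n rows, then apply U.
X_rot_approx = np.vstack([X_tilde, np.zeros((n - k, m))])
assert np.allclose(U @ X_rot_approx, X_hat)
</pre>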

For many datasets, the lower-dimensional <math>\textstyle \tilde{x}</math> representation can be an extremely good approximation to the original, and using PCA this way can significantly speed up your algorithm while introducing very little approximation error.

== Number of components to retain ==

If <math>\textstyle k</math> is too large, then we won't be compressing the data much. Conversely, if <math>\textstyle k</math> is too small, then we might be using a very bad approximation to the data.

To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance retained''' for different values of <math>\textstyle k</math>. If <math>\textstyle k=n</math>, then we have an exact approximation to the data, and we say that 100% of the variance is retained. Conversely, if <math>\textstyle k=0</math>, then we are approximating all the data with the zero vector, and thus 0% of the variance is retained.

More generally, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the eigenvalues of <math>\textstyle \Sigma</math> (sorted in decreasing order), so that <math>\textstyle \lambda_j</math> is the eigenvalue corresponding to the eigenvector <math>\textstyle u_j</math>. Then if we retain <math>\textstyle k</math> principal components, the percentage of variance retained is given by:

:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j}.
\end{align}</math>

In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math> and <math>\textstyle \lambda_2 = 0.69</math>. Thus, by keeping only <math>\textstyle k=1</math> principal components, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>, or 91.3% of the variance.
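
Using the eigenvalues computed in the earlier sketch (again an illustration, not part of the original notes), the retained-variance fraction is just a ratio of a partial sum to the total sum:

<pre>
def variance_retained(lam, k):
    """Fraction of variance retained when keeping the top k components.

    Assumes lam holds the eigenvalues of Sigma sorted in decreasing order.
    """
    return lam[:k].sum() / lam.sum()

# For eigenvalues of roughly [7.29, 0.69], as in the 2D example in the text,
# variance_retained(lam, 1) comes out to about 0.913.
</pre>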

A more formal definition of percentage of variance retained is beyond the scope of these notes. Intuitively, however, the top principal components <math>\textstyle x_{{\rm rot},j}</math> are the ones that are more variable and that take on larger values, and for which we would incur a greater approximation error if we were to set them to zero.

In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of the variance.
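
A small sketch of this heuristic, assuming <code>lam</code> holds the eigenvalues in decreasing order as in the earlier snippets:

<pre>
# Smallest k whose cumulative eigenvalue sum covers at least 99% of the total.
ratios = np.cumsum(lam) / np.sum(lam)
k = int(np.searchsorted(ratios, 0.99) + 1)
</pre>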
