Principal Component Analysis

Revision as of 16:19, 18 March 2013 (Kandeng)

Revision as of 16:30, 18 March 2013 (Kandeng)


== Recovering an Approximation of the Data ==


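Now that we have a lower-dimensional, "compressed" representation <math>\textstyle \tilde{x} \in \Re^k</math> of the original data <math>\textstyle x \in \Re^n</math>, how do we go the other way: given <math>\textstyle \tilde{x}</math>, how do we recover the original data <math>\textstyle x</math>? From an [[#旋转数据|earlier section]], we know that converting back only requires computing <math>\textstyle x = U x_{\rm rot}</math>. Further, we can view <math>\textstyle \tilde{x}</math> as an approximation of <math>\textstyle x_{\rm rot}</math> in which the last <math>\textstyle n-k</math> elements have been set to zero. So, given <math>\textstyle \tilde{x} \in \Re^k</math>, we can append <math>\textstyle n-k</math> zeros to it to obtain an approximation to <math>\textstyle x_{\rm rot} \in \Re^n</math>, and then left-multiply by <math>\textstyle U</math> to approximately recover the original data. Concretely, we compute: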

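:<math>\begin{align}
\hat{x} &= U \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \\ 0 \\ \vdots \\ 0 \end{bmatrix} \\
&= \sum_{i=1}^k u_i \tilde{x}_i.
\end{align}</math>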
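A minimal NumPy sketch of the zero-padding reconstruction above. It assumes (for this sketch only) that the data matrix stores one zero-mean example per column and that <math>\textstyle U</math> is obtained from an SVD of the covariance matrix; the helper names <code>pca_basis</code> and <code>reconstruct_by_padding</code> and the toy dataset are hypothetical.

```python
import numpy as np

def pca_basis(x):
    """Return U whose columns are the principal directions of x.

    x has shape (n, m): one zero-mean example per column (assumed here).
    """
    m = x.shape[1]
    sigma = x @ x.T / m                  # n x n covariance matrix
    U, _, _ = np.linalg.svd(sigma)       # columns sorted by decreasing variance
    return U

def reconstruct_by_padding(U, x_tilde):
    """Approximate x from x_tilde (k x m): pad with n-k zero rows, then left-multiply by U."""
    n = U.shape[0]
    k = x_tilde.shape[0]
    padded = np.vstack([x_tilde, np.zeros((n - k, x_tilde.shape[1]))])  # approximates x_rot
    return U @ padded

rng = np.random.default_rng(0)
# Hypothetical 2-D toy data stretched along the diagonal
t = rng.normal(size=200)
x = np.vstack([t + 0.1 * rng.normal(size=200), t + 0.1 * rng.normal(size=200)])
x = x - x.mean(axis=1, keepdims=True)    # make each feature zero-mean

U = pca_basis(x)
k = 1
x_rot = U.T @ x                  # rotate into the PCA basis
x_tilde = x_rot[:k, :]           # keep only the first k components
x_hat = reconstruct_by_padding(U, x_tilde)
print(x_hat.shape)  # (2, 200)
```

With <code>k = n</code> nothing is dropped and the padding step adds no rows, so the reconstruction returns the original data up to rounding.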


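The final equality above follows from the definition of <math>\textstyle U</math> given [[#实例和数学背景|earlier]]. In a practical implementation, we would not actually pad <math>\textstyle \tilde{x}</math> with zeros and then left-multiply by <math>\textstyle U</math>, since that would mean performing many multiplications by zero. Instead, we can multiply <math>\textstyle \tilde{x} \in \Re^k</math> by the first <math>\textstyle k</math> columns of <math>\textstyle U</math>, as in the rightmost expression above, and obtain the same result. Applying this to the dataset in our example gives the following scatter plot of the reconstructed data <math>\textstyle \hat{x}</math>: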


[[File:PCA-xhat.png | 600px]]


As the figure shows, we obtain a one-dimensional approximate reconstruction of the original dataset.
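The shortcut described earlier, multiplying <math>\textstyle \tilde{x}</math> by only the first <math>\textstyle k</math> columns of <math>\textstyle U</math>, can be checked numerically. This sketch uses hypothetical random data of arbitrary size, assumes one zero-mean example per column, and compares the zero-padded product with the shortcut.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 5, 100, 2                      # hypothetical sizes for illustration
x = rng.normal(size=(n, m))
x = x - x.mean(axis=1, keepdims=True)    # zero-mean each feature

U, _, _ = np.linalg.svd(x @ x.T / m)     # principal directions as columns of U
x_tilde = U[:, :k].T @ x                 # k x m reduced representation

# Zero-padded version: builds an n x m matrix that is mostly zeros
padded = np.vstack([x_tilde, np.zeros((n - k, m))])
x_hat_padded = U @ padded

# Shortcut: multiply by the first k columns of U only
x_hat = U[:, :k] @ x_tilde

print(np.allclose(x_hat, x_hat_padded))  # True
```

The shortcut never builds the padded matrix and skips the roughly <math>(n-k)\,n\,m</math> multiply-adds by zero that the padded product would perform.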


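If you are training an autoencoder or another unsupervised feature learning algorithm, its running time will depend on the dimension of the input data. If you feed <math>\textstyle \tilde{x} \in \Re^k</math> into the algorithm instead of <math>\textstyle x</math>, it will be trained on lower-dimensional input and may run significantly faster. For many datasets, the lower-dimensional representation <math>\textstyle \tilde{x}</math> is an extremely good approximation to the original data, so using PCA in this way is well suited to them: it introduces very little approximation error while substantially speeding up your algorithm.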
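As a rough illustration of feeding <math>\textstyle \tilde{x}</math> to a learning algorithm instead of <math>\textstyle x</math>, the sketch below reduces hypothetical 50-dimensional data that lie close to a 5-dimensional subspace and measures the resulting approximation error. The training algorithm itself is not shown; <code>pca_reduce</code> is a hypothetical helper, and the actual speed-up depends on the algorithm and the data.

```python
import numpy as np

def pca_reduce(x, k):
    """Project zero-mean data x (n x m, one example per column) onto its top-k principal directions."""
    m = x.shape[1]
    U, _, _ = np.linalg.svd(x @ x.T / m)
    U_k = U[:, :k]                  # first k columns of U
    return U_k.T @ x, U_k           # x_tilde is k x m

rng = np.random.default_rng(2)
n, m, r = 50, 1000, 5               # hypothetical: 50-D data lying near a 5-D subspace
basis = rng.normal(size=(n, r))
x = basis @ rng.normal(size=(r, m)) + 0.01 * rng.normal(size=(n, m))
x = x - x.mean(axis=1, keepdims=True)

x_tilde, U_k = pca_reduce(x, k=r)   # x_tilde is what the learning algorithm would receive
x_hat = U_k @ x_tilde               # reconstruction, used only to measure the error
rel_err = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)

print(x.shape, "->", x_tilde.shape)   # (50, 1000) -> (5, 1000)
print(f"relative squared error: {rel_err:.2e}")
```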


== Number of Components to Retain ==


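How do we choose <math>\textstyle k</math>, i.e., how many PCA components should we retain? In the simple two-dimensional example above, retaining the first component seemed the natural choice. For higher-dimensional data, the decision is less trivial: if <math>\textstyle k</math> is too large, we won't compress the data much, and in the limiting case <math>\textstyle k=n</math> we are simply using the original data (only rotated into a different basis). Conversely, if <math>\textstyle k</math> is too small, the approximation of the data may be very poor.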
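To see the trade-off described above, this sketch computes the relative reconstruction error for every <math>\textstyle k</math> from 1 to <math>\textstyle n</math> on hypothetical data whose standard deviation halves from one dimension to the next. It only illustrates the trend (the error shrinks as <math>\textstyle k</math> grows and vanishes at <math>\textstyle k=n</math>); it is not a rule for picking <math>\textstyle k</math>.

```python
import numpy as np

def reconstruction_error(x, k):
    """Relative squared error of the k-component PCA reconstruction of zero-mean x (n x m)."""
    m = x.shape[1]
    U, _, _ = np.linalg.svd(x @ x.T / m)
    U_k = U[:, :k]
    x_hat = U_k @ (U_k.T @ x)        # project onto top-k directions, then map back
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(3)
n, m = 10, 500
# Hypothetical data: standard deviation halves from one dimension to the next
scales = 2.0 ** -np.arange(n)
x = scales[:, None] * rng.normal(size=(n, m))
x = x - x.mean(axis=1, keepdims=True)

errors = [reconstruction_error(x, k) for k in range(1, n + 1)]
for k, e in enumerate(errors, start=1):
    print(f"k={k:2d}  relative error={e:.4f}")
```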


From Ufldl
