Principal Components Analysis

More generally, suppose <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> are the eigenvalues of <math>\textstyle \Sigma</math> (sorted in decreasing order), so that <math>\textstyle \lambda_j</math> is the eigenvalue corresponding to the eigenvector <math>\textstyle u_j</math>. Then if we retain the top <math>\textstyle k</math> principal components, the percentage of variance retained is given by:
:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j}.
\end{align}</math>
In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math>, and <math>\textstyle \lambda_2 = 0.69</math>.  Thus, by keeping only <math>\textstyle k=1</math> principal components, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>, or 91.3% of the variance.
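As a quick numerical illustration of this calculation, here is a small NumPy sketch (not part of the original tutorial; the data matrix <code>x</code> and its shape are assumptions made for the example, with variances chosen to roughly match the 2D example above):

<pre>
import numpy as np

# Illustrative sketch only.  x is an (n, m) array whose columns are
# zero-mean training examples, roughly matching the 2D example above.
m = 100000
x = np.random.randn(2, m) * np.array([[2.70], [0.83]])   # std devs chosen so the variances are ~7.29 and ~0.69

sigma = (x @ x.T) / m                  # covariance matrix: (1/m) * sum_i x^(i) x^(i)^T
lam = np.linalg.eigvalsh(sigma)[::-1]  # eigenvalues, sorted in decreasing order

k = 1
print(lam[:k].sum() / lam.sum())       # retained variance; approximately 0.913 here
</pre>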
While a more formal definition of the percentage of variance retained is beyond the scope of this tutorial, it is possible to show that <math>\textstyle \lambda_j = \sum_{i=1}^m x_{{\rm rot},j}^2</math>.  Thus, if <math>\textstyle \lambda_j \approx 0</math>, that shows that <math>\textstyle x_{{\rm rot},j}</math> is usually near 0 anyway, and we lose relatively little by approximating it with the constant 0.  This also explains why we retain the top principal components (corresponding to the larger values of <math>\textstyle \lambda_j</math>) instead of the bottom ones.  The top principal components <math>\textstyle x_{{\rm rot},j}</math> are the ones with higher variance that take on larger values, and for which we would incur a greater approximation error if we were to set them to zero.
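This argument can also be checked numerically. The sketch below (illustrative code, not from the original text; all variable names are assumptions) zeroes the rotated coordinates whose eigenvalues are small and confirms that the resulting average squared error equals the sum of the discarded <math>\textstyle \lambda_j</math>, under the covariance convention <math>\textstyle \Sigma = \frac{1}{m} \sum_i x^{(i)} {x^{(i)}}^T</math> used earlier.

<pre>
import numpy as np

# Sketch: dropping the low-variance rotated coordinates loses very little.
# x is (n, m) with zero-mean examples in its columns; sigma = (1/m) x x^T.
n, m, k = 5, 10000, 2
x = np.random.randn(n, m) * np.linspace(3.0, 0.1, n)[:, None]

sigma = (x @ x.T) / m
lam, U = np.linalg.eigh(sigma)
order = np.argsort(lam)[::-1]          # sort components by decreasing eigenvalue
lam, U = lam[order], U[:, order]

x_rot = U.T @ x                        # rotated coordinates
x_rot[k:, :] = 0.0                     # approximate the small-lambda coordinates by the constant 0
x_hat = U @ x_rot                      # map back to the original coordinates

avg_sq_error = np.mean(np.sum((x - x_hat) ** 2, axis=0))
print(avg_sq_error, lam[k:].sum())     # these two quantities agree
</pre>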
In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of the variance.  In other words, we pick the smallest value of <math>\textstyle k</math> that satisfies

:<math>\begin{align}
\frac{\sum_{j=1}^k \lambda_j}{\sum_{j=1}^n \lambda_j} \geq 0.99.
\end{align}</math>

Depending on the application, if you are willing to incur some additional error, retaining somewhere in the 90-98% range of the variance may also be reasonable.  When you describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95% of the variance will also be a much more easily interpretable description than saying that you retained 120 (or whatever other number of) components.
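One way to implement this heuristic is sketched below (illustrative code, not from the tutorial; <code>lam</code> is assumed to hold the eigenvalues of <math>\textstyle \Sigma</math> sorted in decreasing order):

<pre>
import numpy as np

def choose_k(lam, threshold=0.99):
    """Smallest k whose cumulative eigenvalue fraction reaches the threshold."""
    fractions = np.cumsum(lam) / np.sum(lam)        # retained variance for k = 1, 2, ..., n
    return int(np.searchsorted(fractions, threshold) + 1)

lam = np.array([7.29, 0.69])   # eigenvalues from the 2D example above
print(choose_k(lam, 0.90))     # 1: k = 1 already retains 91.3% of the variance
print(choose_k(lam, 0.99))     # 2
</pre>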
== PCA on Images ==
For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math> to have a similar range of values to the others (and to have a mean close to zero).  If you've used PCA on other applications before, you may therefore know that it is necessary to separately pre-process each feature <math>\textstyle x_j</math> to have zero mean and unit variance, by separately estimating its mean and variance.  However, this isn't the pre-processing that we will apply to most types of images.  Specifically, suppose we are training our algorithm on "natural images," so that <math>\textstyle x_j</math> is the value of pixel <math>\textstyle j</math>.  By "natural images," we informally mean the type of image that a typical animal or person might see over their lifetime.
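For reference, the per-feature pre-processing mentioned above (zero mean and unit variance for each feature, used for non-image data) is the usual standardization step; a minimal sketch, with a stand-in data matrix holding one example per column:

<pre>
import numpy as np

x = np.random.randn(50, 1000)                 # stand-in data: 50 features, 1000 examples

mu = x.mean(axis=1, keepdims=True)            # mean of each feature across examples
std = x.std(axis=1, keepdims=True)            # standard deviation of each feature
x_standardized = (x - mu) / (std + 1e-8)      # zero mean and (roughly) unit variance per feature
</pre>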
Note: Usually we use images of outdoor scenes with grass, trees, etc., and cut out small (say 16x16) image patches randomly from these to train the algorithm.  But in practice most feature learning algorithms are extremely robust to the exact type of image they are trained on, so most images taken with a normal camera, so long as they aren't excessively blurry or have strange artifacts, should work.
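A sketch of that patch-sampling step (illustrative only; the function name, image sizes, and random stand-in images below are assumptions made for the example):

<pre>
import numpy as np

def sample_patches(images, num_patches, size=16, seed=0):
    """Crop random size-by-size patches from a list of 2D grayscale images."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(num_patches):
        img = images[rng.integers(len(images))]        # pick a random image
        r = rng.integers(img.shape[0] - size + 1)      # random top-left corner
        c = rng.integers(img.shape[1] - size + 1)
        patches.append(img[r:r + size, c:c + size].reshape(-1))
    return np.stack(patches, axis=1)                   # (size*size, num_patches), one patch per column

images = [np.random.rand(96, 96) for _ in range(10)]   # stand-ins for real outdoor photographs
x = sample_patches(images, num_patches=500)            # x has shape (256, 500)
</pre>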
When training on natural images, it makes little sense to estimate a separate mean and variance for each pixel, because the statistics in one part of the image should (theoretically) be the same as any other.  This property of images is called '''stationarity.'''
In detail, in order for PCA to work well, informally we require that (i) the features have approximately zero mean, and (ii) the different features have similar variances to each other.  With natural images, (ii) is already satisfied even without variance normalization, so we won't perform any variance normalization.  (If you are training on audio data, such as spectrograms, or on text data, such as bag-of-word vectors, we will usually not perform variance normalization either.)  In fact, PCA is invariant to the scaling of the data, and will return the same eigenvectors regardless of how the input is scaled.  More formally, if you multiply each feature vector <math>\textstyle x</math> by some positive number (thus scaling every feature in every training example by the same number), PCA's output eigenvectors will not change.
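This invariance is easy to verify numerically; the sketch below (illustrative code and variable names, not from the tutorial) recomputes the eigenvectors after scaling every example by the same positive constant:

<pre>
import numpy as np

def pca_eigenvectors(x):
    sigma = (x @ x.T) / x.shape[1]        # sigma = (1/m) sum_i x^(i) x^(i)^T
    lam, U = np.linalg.eigh(sigma)
    return U[:, np.argsort(lam)[::-1]]    # columns sorted by decreasing eigenvalue

x = np.random.default_rng(0).standard_normal((3, 1000))
U1 = pca_eigenvectors(x)
U2 = pca_eigenvectors(23.0 * x)           # every feature of every example scaled by the same number

print(np.allclose(np.abs(U1), np.abs(U2)))   # True: the eigenvectors agree (up to sign)
</pre>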
So, we won't use variance normalization.  The only normalization we need to perform then is mean normalization, to ensure that the features have a mean around zero.  Depending on the application, very often we are not interested in how bright the overall input image is.  For example, in an object recognition task, the overall brightness of the image does not affect what objects appear in it.  More formally, we are not interested in the mean intensity value of an image patch; thus, we can subtract out this value, as a form of mean normalization.
Concretely, if <math>\textstyle x^{(i)} \in \Re^{n}</math> are the (grayscale) intensity values of a 16x16 image patch (<math>\textstyle n=256</math>), we might normalize the intensity of each image <math>\textstyle x^{(i)}</math> as follows:
<math>\mu^{(i)} := \frac{1}{n} \sum_{j=1}^n x^{(i)}_j</math>

<math>x^{(i)}_j := x^{(i)}_j - \mu^{(i)}</math>, for all <math>\textstyle j</math>
Bear in mind that the two steps above need to be performed separately for each image patch <math>\textstyle x^{(i)}</math>, and that <math>\textstyle \mu^{(i)}</math> here is the mean intensity of the image <math>\textstyle x^{(i)}</math>.  In particular, this is not the same thing as estimating a mean value separately for each pixel <math>\textstyle x_j</math>.
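In code, the two normalization steps above might look like the following sketch (NumPy, with one patch per column; the array <code>x</code> is a stand-in for real patch data):

<pre>
import numpy as np

x = np.random.rand(256, 1000)              # stand-in: 256 pixel intensities per column, one 16x16 patch per column

mu = x.mean(axis=0, keepdims=True)         # mu^(i): the mean intensity of each patch, shape (1, 1000)
x = x - mu                                 # subtract each patch's own mean intensity

# Note: the average is taken over the 256 pixels within each patch (axis=0).
# Averaging over axis=1 instead would estimate a separate mean per pixel,
# which is not what is described here.
</pre>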
If you are training your algorithm on images other than natural images (for example, images of handwritten characters, or images of single isolated objects centered against a white background), other types of normalization might be worth considering, and the best choice may be application dependent.  But when training on natural images, using the per-image mean normalization method as given in the equations above would be a reasonable default.
== References ==
