白化

:【原文】：
:Introduction
 
We have used PCA to reduce the dimension of the data. There is a closely related preprocessing step called whitening (or, in some other literatures, sphering) which is needed for some algorithms. If we are training on images, the raw input is redundant, since adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithms sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.

:【初译】：
:介绍

我们使用过 PCA 来降低数据维度. 还有一个与之非常相关的预处理步骤叫 白化 (一些文献中也称sphering) ，一些算法中需要这个步骤. 当在图像数据上进行训练时, 由于相邻像素值高度相关，原始的输入具有冗余性。白化的目标是降低输入的冗余性；更形式化地说，我们需要的是使我们的学习算法可以有一个供训练的输入使得：(i) 特征之间相关性更低 (ii) 特征都有相同的方差。

:【一校】：
我们已经了解了如何使用PCA降低数据维度。在一些算法中还需要一个与之相关的预处理步骤，这个预处理过程称为白化（一些文献中也叫shpering）。举例来说，假设训练数据是图像，由于图像中相邻像素之间具有很强的相关性，如果用原始图像数据作为输入的话，输入是冗余的。白化的目的就是降低输入的冗余性；更正式的说，我们希望通过白化过程使得学习算法的输入具有如下性质：(i)特征之间相关性较低；(ii)所有特征具有相同的方差。

:【原文】：
:2D example

We will first describe whitening using our previous 2D example. We will then describe how this can be combined with smoothing, and finally how to combine this with PCA. 
How can we make our input features uncorrelated with each other? We had already done this when computing <math>x_{rot}^{(i)}=U^Tx^{(i)}</math>. Repeating our previous figure, our plot for <math>x_{rot}</math> was: 


:【初译】：
:2D 的例子

首先我们将通过之前的 2D 例子描述白化。然后描述其与smoothing的结合, 最后讨论如何与PCA结合。
我们如何消除输入特征之间的相关性? 在计算<math>x_{rot}^{(i)}=U^Tx^{(i)}</math>时我们其实已经完成了。回顾之前的图表, 在坐标系中绘出<math>x_{rot}</math>: 

:【一校】：
下面我们先用前文的2D例子描述白化的主要思想，然后分别介绍如何将白化与平滑和PCA相结合。
在前文计算<math>x_{rot}^{(i)}=U^Tx^{(i)}</math>时我们实际上已经消除了输入特征<math>x^{(i)}</math>之间的相关性。得到的新特征<math>x_{rot}</math>的分布如下图所示：

[[File:PCA-rotated.png | 600px]]

:【原文】：
The covariance matrix of this data is given by:

<math>\begin{align}
\begin{bmatrix}
7.29 & 0  \\
0 & 0.69
\end{bmatrix}.
\end{align}</math>

(Note: Technically, many of the
statements in this section about the "covariance" will be true only if the data
has zero mean.  In the rest of this section, we will take this assumption as
implicit in our statements.  However, even if the data's mean isn't exactly zero, 
the intuitions we're presenting here still hold true, and so this isn't something
that you should worry about.)

:【初译】：
数据的协方差矩阵如下: 

(注: 严格地讲, 这部分许多关于“协方差”的陈述仅当数据以0为均值时成立。接下来，我们将这一假设作为隐含条件。然而，即使数据不以0为均值，我们在这里提出的仍然保持正确，因此读者无需担心。) 

:【一校】：
数据的协方差矩阵如下: 

(注: 严格地讲, 这部分许多关于“协方差”的陈述仅当数据均值为0时成立。下文的论述都隐式地假定这一条件成立。不过即使数据均值不为0，下文的说法仍然成立。)