Self-Taught Learning

Revision as of 07:35, 16 March 2013 (view source)

Kandeng (Talk | contribs)

(→Overview)

← Older edit

Revision as of 08:49, 16 March 2013 (view source)

Kandeng (Talk | contribs)

(→Learning features)

Newer edit →


From Ufldl


@@ Line 17: / Line 17: @@
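 These ideas are probably most powerful in settings with plenty of unlabeled data and only a small amount of labeled data. Even when only labeled data is available (in which case feature learning is usually carried out while ignoring the class labels of the training data), they can still give very good results.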
-== Learning features ==
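+== Feature learning ==
-'''[First translation]''': Feature learning
-'''[First review]''': Feature learning
+We have already seen how an autoencoder can be used to learn features from unlabeled data. Concretely, suppose we have an unlabeled training set <math>\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}</math> (the subscript <math>\textstyle u</math> stands for "unlabeled"). We can then train a sparse autoencoder on this data (possibly after whitening or other appropriate pre-processing).
-'''[Original]'''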
-We have already seen how an autoencoder can be used to learn features from
-unlabeled data.  Concretely, suppose we have an unlabeled
-training set <math>\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}</math>
-with <math>\textstyle m_u</math> unlabeled examples.  (The subscript "u" stands for
-"unlabeled.")  We can then train a sparse autoencoder on this data
-(perhaps with appropriate whitening or other pre-processing):
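-'''[First translation]'''
-We have seen how an autoencoder neural network can be used to learn features from unlabeled data. Concretely, suppose we have a training set <math>\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}</math> of <math>\textstyle m_u</math> unlabeled examples (the subscript u stands for "unlabeled"). We now use them to train a sparse autoencoder neural network. (Suitable whitening and other pre-processing may be applied.)
-'''[First review]'''
-We have already seen how to use an autoencoder neural network to learn features from unlabeled data. Concretely, suppose we have an unlabeled training data set <math>\textstyle \{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}</math> (the subscript u stands for "without class labels"). We now use it to train a sparse autoencoder neural network (suitable whitening and other pre-processing may be used).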
 [[File:STL_SparseAE.png|350px]]
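-'''[Original]'''
+Using the trained model parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math>, we can compute the hidden-unit activations <math>\textstyle a</math> for any input <math>\textstyle x</math>. As noted earlier, <math>\textstyle a</math> is often a better feature representation than the raw input <math>\textstyle x</math>. The neural network in the figure below depicts how the features (the activations <math>\textstyle a</math>) are computed.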
-Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model,
-given any new input <math>\textstyle x</math>, we can now compute the corresponding vector of
-activations <math>\textstyle a</math> of the hidden units.  As we saw previously, this often gives a
-better representation of the input than the original raw input <math>\textstyle x</math>.  We can also
-visualize the algorithm for computing the features/activations <math>\textstyle a</math> as the following
-neural network:
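-'''[First translation]'''
-Given the trained model parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math>, for any new input <math>\textstyle x</math> we can compute the corresponding vector of hidden-unit activations <math>\textstyle a</math>. As we saw earlier, this often gives a better representation than the raw input <math>\textstyle x</math>. The following neural network diagram visualizes the computation of the features/activations <math>\textstyle a</math>:
-'''[First review]'''
-Using the trained model parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math>, we can compute the corresponding vector of hidden-unit activations <math>\textstyle a</math> for any input <math>\textstyle x</math>. As noted earlier, this yields a better feature representation than the raw input <math>\textstyle x</math>. The neural network in the figure below depicts the computation of the feature/activation vector <math>\textstyle a</math>.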
 [[File:STL_SparseAE_Features.png|300px]]
-'''[Original]'''
+This is simply the sparse autoencoder obtained earlier, with its final layer removed.
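As a rough illustration of what "removing the final layer" means in practice, the sketch below computes only the hidden-layer activations and never touches <math>\textstyle W^{(2)}, b^{(2)}</math>. It assumes sigmoid hidden units, which this section does not state, and the parameter values and the function name `extract_features` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(x, W1, b1):
    """Hidden-unit activations a = f(W1 x + b1).

    Only the first-layer parameters are used; the decoder parameters
    W2 and b2 play no role once the autoencoder has been trained.
    """
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

# Hypothetical trained parameters: 3 inputs, 2 hidden units.
W1 = [[0.5, -0.5, 0.0],
      [1.0, 1.0, 1.0]]
b1 = [0.0, -1.0]

x = [1.0, 1.0, 0.0]
a = extract_features(x, W1, b1)  # one activation per hidden unit
```

The length of `a` equals the number of hidden units, so the learned representation can be smaller or larger than the raw input.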
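-This is just the sparse autoencoder that we previously had, with the final layer removed.
-'''[First translation]'''
-This is the sparse autoencoder neural network obtained earlier with the final layer removed.
-'''[First review]'''
+Now suppose we have a labeled training set of size <math>\textstyle m_l</math>, <math>\textstyle \{ (x_l^{(1)}, y^{(1)}),
-This is simply the sparse autoencoder neural network obtained earlier, with its final layer removed.
+(x_l^{(2)}, y^{(2)}), \ldots (x_l^{(m_l)}, y^{(m_l)}) \}</math> (the subscript <math>\textstyle l</math> stands for "labeled"). We can now find a better feature representation for the inputs. For example, we can feed <math>\textstyle x_l^{(1)}</math> to the sparse autoencoder and obtain the hidden-unit activations <math>\textstyle a_l^{(1)}</math>. We can then use <math>\textstyle a_l^{(1)}</math> directly in place of the raw input <math>\textstyle x_l^{(1)}</math>. Alternatively, we can combine the two and use the new vector <math>\textstyle (x_l^{(1)}, a_l^{(1)})</math> in place of <math>\textstyle x_l^{(1)}</math>.
-'''[Original]'''
+After this transformation, the training set becomes <math>\textstyle \{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots (a_l^{(m_l)}, y^{(m_l)})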
-Now, suppose we have a labeled training set <math>\textstyle \{ (x_l^{(1)}, y^{(1)}),
-(x_l^{(2)}, y^{(2)}), \ldots (x_l^{(m_l)}, y^{(m_l)}) \}</math> of <math>\textstyle m_l</math> examples.
-(The subscript "l" stands for "labeled.")
-We can now find a better representation for the inputs.  In particular, rather
-than representing the first training example as <math>\textstyle x_l^{(1)}</math>, we can feed
-<math>\textstyle x_l^{(1)}</math> as the input to our autoencoder, and obtain the corresponding
-vector of activations <math>\textstyle a_l^{(1)}</math>.  To represent this example, we can either
-just '''replace''' the original feature vector with <math>\textstyle a_l^{(1)}</math>.
-Alternatively, we can '''concatenate''' the two feature vectors together,
-getting a representation <math>\textstyle (x_l^{(1)}, a_l^{(1)})</math>.
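-'''[First translation]'''
-Now, suppose we have a labeled training set of size <math>\textstyle m_l</math>, <math>\textstyle \{ (x_l^{(1)}, y^{(1)}),
-(x_l^{(2)}, y^{(2)}), \ldots (x_l^{(m_l)}, y^{(m_l)}) \}</math> (the subscript l stands for "labeled"). We can find a better representation of the inputs. Rather than representing the first training example as <math>\textstyle x_l^{(1)}</math>, we feed <math>\textstyle x_l^{(1)}</math> to the autoencoder neural network as input and obtain the corresponding vector of activations <math>\textstyle a_l^{(1)}</math>. To represent this example, we replace the original feature vector <math>\textstyle x_l^{(1)}</math> with <math>\textstyle a_l^{(1)}</math>. Alternatively, we merge the two feature vectors to get the representation <math>\textstyle (x_l^{(1)}, a_l^{(1)})</math>.
-'''[First review]'''
-First review: suppose we have a labeled training data set of size <math>\textstyle m_l</math>, <math>\textstyle \{ (x_l^{(1)}, y^{(1)}),
-(x_l^{(2)}, y^{(2)}), \ldots (x_l^{(m_l)}, y^{(m_l)}) \}</math> (the subscript l stands for "labeled"). A better feature representation can be found for the inputs. Instead of the raw feature representation, we can feed <math>\textstyle x_l^{(1)}</math> to the sparse autoencoder neural network and obtain the vector of hidden-unit activations <math>\textstyle a_l^{(1)}</math>. We can then use <math>\textstyle a_l^{(1)}</math> directly in place of the raw data <math>\textstyle x_l^{(1)}</math>, or combine the two and use the new vector <math>\textstyle (x_l^{(1)}, a_l^{(1)})</math>.
-'''[Original]'''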
-Thus, our training set now becomes
-<math>\textstyle \{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots (a_l^{(m_l)}, y^{(m_l)})
-\}</math> (if we use the replacement representation, and use <math>\textstyle a_l^{(i)}</math> to represent the
-<math>\textstyle i</math>-th training example), or <math>\textstyle \{
-((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots,
-((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (if we use the concatenated
-representation).  In practice, the concatenated representation often works
-better; but for memory or computational reasons, we will sometimes use
-the replacement representation as well.
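-'''[First translation]'''
-Thus, the training set now becomes <math>\textstyle \{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots (a_l^{(m_l)}, y^{(m_l)})
-\}</math> (using the replacement representation above, with <math>\textstyle a_l^{(i)}</math> representing the <math>\textstyle i</math>-th training example). The training set can also be written as <math>\textstyle \{
-((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots,
-((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (using the concatenated representation above). In practice, the concatenated representation often works better. However, because of memory or computational considerations, the replacement representation is sometimes needed.
-'''[First review]'''
-After this transformation, the training data set becomes <math>\textstyle \{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots (a_l^{(m_l)}, y^{(m_l)})
 \}</math> or <math>\textstyle \{
 ((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots,
-((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (determined by whether <math>\textstyle a_l^{(1)}</math> replaces <math>\textstyle x_l^{(1)}</math> or the two are combined). In practice, combining <math>\textstyle a_l^{(1)}</math> and <math>\textstyle x_l^{(1)}</math> usually performs better. However, given the cost in memory and computation, the replacement can also be used.
+((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (depending on whether <math>\textstyle a_l^{(1)}</math> replaces <math>\textstyle x_l^{(1)}</math> or the two are combined). In practice, combining <math>\textstyle a_l^{(1)}</math> and <math>\textstyle x_l^{(1)}</math> usually performs better. However, given the cost in memory and computation, the replacement can also be used.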
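The two ways of rebuilding the labeled training set can be sketched as follows. This is a minimal illustration rather than the tutorial's own code: the helper names and the toy values are hypothetical, and the activations are assumed to have been computed already by the trained autoencoder.

```python
def replacement_set(features, labels):
    """Training set {(a_l^(i), y^(i))}: activations replace the raw inputs."""
    return [(list(a), y) for a, y in zip(features, labels)]

def concatenated_set(inputs, features, labels):
    """Training set {((x_l^(i), a_l^(i)), y^(i))}: raw input followed by activations."""
    return [(list(x) + list(a), y)
            for x, a, y in zip(inputs, features, labels)]

# Hypothetical labeled data: 3 examples, 2 raw inputs, 1 hidden unit.
x_l = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
a_l = [[0.2], [0.7], [0.9]]   # assumed precomputed activations
y = [0, 1, 1]

replaced = replacement_set(a_l, y)
concatenated = concatenated_set(x_l, a_l, y)
```

Each concatenated vector has as many entries as the raw input plus the hidden layer, which is where the extra memory and computation cost comes from. Note that the i-th input is always paired with its own activations <math>\textstyle a_l^{(i)}</math>.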
-'''[Original]'''
-Finally, we can train a supervised learning algorithm such as an SVM, logistic
-regression, etc. to obtain a function that makes predictions on the <math>\textstyle y</math> values.
-Given a test example <math>\textstyle x_{\rm test}</math>, we would then follow the same procedure:
-First, feed it to the autoencoder to get <math>\textstyle a_{\rm test}</math>.  Then, feed
-either <math>\textstyle a_{\rm test}</math> or <math>\textstyle (x_{\rm test}, a_{\rm test})</math> to the trained classifier to get a prediction.
-'''[First translation]'''
-Finally, we can train with a supervised learning algorithm such as an SVM, logistic regression, and so on, to obtain predictions of the <math>\textstyle y</math> values. For a test example <math>\textstyle x_{\rm test}</math>, we follow this procedure: first, feed it to the autoencoder neural network to get <math>\textstyle a_{\rm test}</math>. Then feed <math>\textstyle a_{\rm test}</math> or <math>\textstyle (x_{\rm test}, a_{\rm test})</math> to the classifier to get a prediction.
-'''[First review]'''
-Finally, we can train a supervised learning algorithm (e.g. SVM, logistic regression), obtaining a discriminant function that predicts the <math>\textstyle y</math> values. Prediction proceeds as follows: given a test example <math>\textstyle x_{\rm test}</math>, repeat the earlier procedure and feed it to the sparse autoencoder neural network to get <math>\textstyle a_{\rm test}</math>. Then feed <math>\textstyle a_{\rm test}</math> or (<math>\textstyle (x_{\rm test}, a_{\rm test})</math>) to the trained classifier to get a prediction.
+Finally, we can train a supervised learning algorithm (for example, an SVM or logistic regression) to obtain a discriminant function that predicts the <math>\textstyle y</math> values. Prediction works as follows: given a test example <math>\textstyle x_{\rm test}</math>, repeat the earlier procedure by feeding it to the sparse autoencoder to obtain <math>\textstyle a_{\rm test}</math>. Then feed <math>\textstyle a_{\rm test}</math> (or <math>\textstyle (x_{\rm test}, a_{\rm test})</math>) to the classifier to get a prediction.
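The whole procedure (encode, train a classifier, apply the same encoding at test time) can be sketched end to end. The code below is only an illustration under stated assumptions: sigmoid hidden units, hypothetical encoder parameters `W1` and `b1` standing in for a trained sparse autoencoder, a toy labeled set, and a small logistic regression fitted by stochastic gradient descent in place of whichever supervised learner is actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W1, b1):
    """Hidden-layer activations a for input x (sigmoid units assumed)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def represent(x, W1, b1, concatenate=True):
    """Return (x, a) if concatenate is True, otherwise a alone."""
    a = encode(x, W1, b1)
    return list(x) + a if concatenate else a

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Predict class 1 when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else 0

# Hypothetical encoder parameters (2 inputs, 1 hidden unit) and toy labeled data.
W1, b1 = [[1.0, 1.0]], [-1.0]
x_l = [[0.0, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
y_l = [0, 0, 1, 1]

# Train on the concatenated representation (x, a).
train_X = [represent(x, W1, b1) for x in x_l]
w, b = train_logistic(train_X, y_l)

# Test examples go through exactly the same encoding before classification.
x_tests = [[0.1, 0.05], [0.95, 0.9]]
predictions = [predict(represent(x, W1, b1), w, b) for x in x_tests]
```

Passing `concatenate=False` to `represent` at both training and test time gives the replacement representation instead; the two stages must use the same choice, or the classifier will see inputs of the wrong length.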
 == On pre-processing the data ==