From Self-Taught Learning to Deep Networks

Initial translation: Sina Weibo, @幸福数据挖掘者  http://weibo.com/u/2275505165?topnav=1&wvr=5

First review: Sina Weibo, @ztyan http://weibo.com/ztyan

Wiki upload: Sina Weibo, @幸福数据挖掘者  http://weibo.com/u/2275505165?topnav=1&wvr=5


【Original】
In the previous section, you used an autoencoder to learn features that were then fed as input to a softmax or logistic regression classifier. In that method, the features were learned using only unlabeled data. In this section, we describe how you can '''fine-tune''' and further improve the learned features using labeled data. When you have a large amount of labeled training data, this can significantly improve your classifier's performance.

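【Initial translation】
In the previous section, we used an autoencoder to learn features that were fed as input to a softmax or logistic regression classifier. In that method, the features were learned using only unlabeled data. In this section, we describe '''fine-tuning''', a method that uses labeled data to further improve the learned features. If you have a large amount of labeled data, fine-tuning can significantly improve the classifier's performance.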

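【First review】
In the previous section, we used an autoencoder to learn features that were fed as input to a softmax regression or logistic regression classifier. In that method, the features were learned using only unlabeled data. In this section, we describe '''fine-tuning''', a method that uses labeled data to further improve the learned features. If you have a large amount of labeled data, fine-tuning can significantly improve the classifier's performance.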


【Original】
In self-taught learning, we first trained a sparse autoencoder on the unlabeled data. Then, given a new example <math>x</math>, we used the hidden layer to extract features <math>a</math>. This is illustrated in the following diagram:

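【Initial translation】
In self-taught learning, we first train a sparse autoencoder on unlabeled data. Then, given a new example <math>x</math>, we extract the features <math>a</math> through the hidden layer. This process is illustrated below: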

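【First review】
In self-taught learning, we first train a sparse autoencoder on unlabeled data. Then, given a new example <math>x</math>, we extract the features <math>a</math> through the hidden layer. This process is illustrated in the following diagram: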
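
The feature-extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: it assumes the hidden layer uses a sigmoid activation, and the weights <code>W1</code>, the biases <code>b1</code>, and the example <code>x</code> are hypothetical stand-ins for a trained sparse autoencoder and a real input.

```python
import math

def sigmoid(z):
    """Logistic function, assumed here as the hidden-layer activation."""
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(W1, b1, x):
    """Return hidden-layer activations a = sigmoid(W1 x + b1) for one example x."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

# Hypothetical parameters: 2 hidden units, 3 input features.
# In practice W1 and b1 would come from training the sparse autoencoder.
W1 = [[0.5, -0.25, 0.0],
      [0.2, 0.1, 0.1]]
b1 = [0.0, 0.1]

x = [1.0, 2.0, 3.0]               # a new example
a = extract_features(W1, b1, x)   # learned features for x
print(a)
```

Only the encoder half of the autoencoder (input to hidden layer) is used here; the decoder weights play no role once the features have been learned.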


【Original】
We are interested in solving a classification task, where our goal is to predict labels <math>\textstyle y</math>.  We have a labeled training set <math>\textstyle \{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)},y^{(m_l)}) \}</math> of <math>\textstyle m_l</math> labeled examples. We showed previously that we can replace the original features <math>\textstyle x^{(i)}</math> with features <math>\textstyle a^{(i)}</math> computed by the sparse autoencoder (the "replacement" representation).  This gives us a training set <math>\textstyle \{(a^{(1)},y^{(1)}), \ldots, (a^{(m_l)}, y^{(m_l)}) \}</math>.  Finally, we train a logistic classifier to map from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y^{(i)}</math>. To illustrate this step, similar to [[Neural Networks|our earlier notes]], we can draw our logistic regression unit (shown in orange) as follows:

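【Initial translation】
We are interested in solving a classification task whose goal is to predict the example class <math>\textstyle y</math>. We have a labeled data set <math>\textstyle \{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)},y^{(m_l)}) \}</math> containing <math>\textstyle m_l</math> labeled examples. We showed previously that the features <math>\textstyle a^{(i)}</math> computed by the sparse autoencoder (the "replacement" representation) can replace the original features <math>\textstyle x^{(i)}</math>. This gives us the training set <math>\textstyle \{(a^{(1)},y^{(1)}), \ldots, (a^{(m_l)}, y^{(m_l)}) \}</math>. Finally, we train a logistic classifier that maps from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y^{(i)}</math>. To illustrate this step, as in our earlier notes, the logistic regression unit (in orange) can be drawn as follows: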

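【First review】
We are interested in solving a classification task whose goal is to predict the label <math>\textstyle y</math>. We have a labeled training set <math>\textstyle \{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)},y^{(m_l)}) \}</math> containing <math>\textstyle m_l</math> labeled examples. We showed previously that the features <math>\textstyle a^{(i)}</math> computed by the sparse autoencoder (the "replacement" representation) can replace the original features. This gives us the training set <math>\textstyle \{(a^{(1)},y^{(1)}), \ldots, (a^{(m_l)}, y^{(m_l)}) \}</math>. Finally, we train a logistic classifier that maps from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y^{(i)}</math>. To illustrate this step, as in our earlier notes, the logistic regression unit (shown in orange) can be drawn as follows: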
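
The two steps in this passage, building the "replacement" training set and then fitting a logistic classifier on it, might look like the following sketch. Everything concrete in it is an assumption made for illustration: the encoder weights <code>W1</code> and <code>b1</code> stand in for a trained sparse autoencoder, the four labeled examples are toy data, and plain batch gradient descent is used as the optimizer, which the text above does not prescribe.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(W1, b1, x):
    """Hidden-layer activations a = sigmoid(W1 x + b1) (sigmoid is an assumption)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on (a, y) pairs with batch gradient descent."""
    m, n = len(features), len(features[0])
    theta, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_theta, grad_bias = [0.0] * n, 0.0
        for a, y in zip(features, labels):
            p = sigmoid(sum(t * ai for t, ai in zip(theta, a)) + bias)
            err = p - y                      # gradient of log-loss w.r.t. the logit
            for j in range(n):
                grad_theta[j] += err * a[j]
            grad_bias += err
        theta = [t - lr * g / m for t, g in zip(theta, grad_theta)]
        bias -= lr * grad_bias / m
    return theta, bias

def predict(theta, bias, a):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(t * ai for t, ai in zip(theta, a)) + bias) >= 0.5 else 0

# Hypothetical encoder parameters (stand-ins for a trained sparse autoencoder).
W1 = [[2.0, 0.0], [0.0, 2.0]]
b1 = [0.0, 0.0]

# Toy labeled set {(x_l^(i), y^(i))}.
X_labeled = [[-1.0, -1.0], [-2.0, -0.5], [1.0, 1.0], [2.0, 0.5]]
y_labeled = [0, 0, 1, 1]

# Replacement representation: each x_l^(i) is swapped for its features a^(i).
A = [extract_features(W1, b1, x) for x in X_labeled]

theta, bias = train_logistic(A, y_labeled)
predictions = [predict(theta, bias, a) for a in A]
print(predictions)
```

Note that the encoder weights stay fixed while the classifier is trained: only <code>theta</code> and <code>bias</code> are updated, which is the self-taught learning setup this section starts from, before any fine-tuning.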

【