Softmax Regression (Softmax回归)
== Introduction ==

In these notes, we describe the '''Softmax regression''' model. This model generalizes logistic regression to classification problems where the class label <math>y</math> can take on more than two possible values. This will be useful for such problems as MNIST digit classification, where the goal is to distinguish between 10 different numerical digits. Softmax regression is a supervised learning algorithm, but we will later be using it in conjunction with our deep learning/unsupervised feature learning methods.
+ | |||
+ | |||
+ | '''译文''': | ||
+ | |||
+ | 在本节中,我们介绍Softmax回归模型,该模型是logistic回归模型在多分类问题上的泛化,在多分类问题中,类标签y可以取两个以上的值。 Softmax回归模型可以直接应用于 MNIST 手写数字分类问题等多分类问题。Softmax回归是有监督的,不过我们接下来也会介绍它与深度学习/无监督学习方法的结合。 | ||
+ | (译者注: MNIST 是一个手写数字识别库,由 NYU 的Yann LeCun 等人维护。 http://yann.lecun.com/exdb/mnist/ ) | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | 在本章中,我们介绍Softmax回归模型。该模型将logistic回归模型一般化,以用来解决类型标签y的可能取值多于两种的分类问题。Softmax回归模型对于诸如MNIST手写数字分类等问题是十分有用的,该问题的目的是辨识10个不同的单个数字。Softmax回归是一种有监督学习算法,但是我们接下来要将它与我们的深度学习/无监督特征学习方法结合起来使用。 | ||
+ | (译者注:MNIST是一个手写数字识别库,由NYU的Yann LeCun等人维护。http://yann.lecun.com/exdb/mnist/) | ||
+ | |||
Recall that in logistic regression, we had a training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> of <math>m</math> labeled examples, where the input features are <math>x^{(i)} \in \Re^{n+1}</math>. (In this set of notes, we will use the notational convention of letting the feature vectors <math>x</math> be <math>n+1</math> dimensional, with <math>x_0 = 1</math> corresponding to the intercept term.) With logistic regression, we were in the binary classification setting, so the labels were <math>y^{(i)} \in \{0,1\}</math>. Our hypothesis took the form:

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
+ | |||
+ | |||
+ | |||
+ | '''译文''': | ||
+ | 回顾一下 logistic 回归,我们的训练集为<math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> | ||
+ | ,其中 m为样本数,<math>x^{(i)} \in \Re^{n+1}</math>为特征。 | ||
+ | 由于 logistic 回归是针对二分类问题的,因此类标 <math>y^{(i)} \in \{0,1\}</math>。假设函数如下: | ||
+ | |||
+ | '''一审''': | ||
+ | 回想一下在 logistic 回归中,我们拥有一个包含 m 个被标记的样本的训练集 <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>,其中输入特征值 <math>x^{(i)} \in \Re^{n+1}</math>。(在本章中,我们对出现的符号进行如下约定:特征向量 x 的维度为n+1 ,其中 x0=1对应 截距项 。)因为在Logistic 回归中,我们要解决的是 二元分类 问题,因此 类型标记 <math>y^{(i)} \in \{0,1\}</math>。 估值函数 如下: | ||
+ | <math>\begin{align} | ||
+ | h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)}, | ||
+ | \end{align}</math> | ||
+ | |||
+ | |||

and the model parameters <math>\theta</math> were trained to minimize the cost function

<math>
\begin{align}
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) \right]
\end{align}
</math>
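
As a concrete reference, here is a minimal NumPy sketch of the hypothesis and this cost, written directly from the two formulas above. The names (<code>h_logistic</code>, <code>logistic_cost</code>, <code>X</code>, <code>y</code>) are our own illustration, not part of the original notes; <code>X</code> is assumed to be the <math>m \times (n+1)</math> design matrix with a leading column of ones, and <code>y</code> the vector of 0/1 labels:

<source lang="python">
import numpy as np

def h_logistic(theta, x):
    # Logistic hypothesis h_theta(x) = 1 / (1 + exp(-theta^T x)).
    # theta, x: arrays of shape (n+1,), with x[0] == 1 as the intercept term.
    return 1.0 / (1.0 + np.exp(-theta @ x))

def logistic_cost(theta, X, y):
    # J(theta): average negative log-likelihood over the m training examples.
    # X: shape (m, n+1) with X[:, 0] == 1; y: shape (m,) with entries in {0, 1}.
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every example at once
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
</source>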
+ | |||
+ | |||
+ | '''译文''': | ||
+ | |||
+ | |||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

In the softmax regression setting, we are interested in multi-class classification (as opposed to only binary classification), and so the label <math>y</math> can take on <math>k</math> different values, rather than only two. Thus, in our training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>, we now have that <math>y^{(i)} \in \{1, 2, \ldots, k\}</math>. (Note that our convention will be to index the classes starting from 1, rather than from 0.) For example, in the MNIST digit recognition task, we would have <math>k=10</math> different classes.
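
As a small illustration of the 1-indexed convention: if a data source happens to deliver MNIST labels 0-indexed (an assumption about the source, not something the notes state), shifting them into <math>\{1, \ldots, 10\}</math> is a one-liner:

<source lang="python">
y = raw_labels + 1  # map digits 0..9 to class labels 1..10 (raw_labels: NumPy array of digits)
</source>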
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

Given a test input <math>x</math>, we want our hypothesis to estimate the probability <math>p(y=j | x)</math> for each value of <math>j = 1, \ldots, k</math>. I.e., we want to estimate the probability of the class label taking on each of the <math>k</math> different possible values. Thus, our hypothesis will output a <math>k</math>-dimensional vector (whose elements sum to 1) giving us our <math>k</math> estimated probabilities. Concretely, our hypothesis <math>h_{\theta}(x)</math> takes the form:

<math>
\begin{align}
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 | x^{(i)}; \theta) \\
p(y^{(i)} = 2 | x^{(i)}; \theta) \\
\vdots \\
p(y^{(i)} = k | x^{(i)}; \theta)
\end{bmatrix}
=
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix}
\end{align}
</math>
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

Here <math>\theta_1, \theta_2, \ldots, \theta_k \in \Re^{n+1}</math> are the parameters of our model. Notice that the term <math>\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }</math> normalizes the distribution, so that it sums to one.
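
A minimal NumPy sketch of this hypothesis, assuming the per-class parameter vectors are held in a Python list (the names <code>h_softmax</code> and <code>thetas</code> are ours):

<source lang="python">
def h_softmax(thetas, x):
    # thetas: list of k parameter vectors theta_1, ..., theta_k, each of shape (n+1,).
    # x: input of shape (n+1,) with x[0] == 1 for the intercept.
    scores = np.array([theta @ x for theta in thetas])
    scores -= scores.max()  # shifting all scores by a constant leaves the softmax unchanged; avoids overflow
    e = np.exp(scores)
    return e / e.sum()      # the normalization term from the text; the k entries sum to one
</source>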
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

For convenience, we will also write <math>\theta</math> to denote all the parameters of our model. When you implement softmax regression, it is usually convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that

<math>
\theta = \begin{bmatrix}
\mbox{---} \theta_1^T \mbox{---} \\
\mbox{---} \theta_2^T \mbox{---} \\
\vdots \\
\mbox{---} \theta_k^T \mbox{---} \\
\end{bmatrix}
</math>
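
Continuing the earlier sketch, this stacked representation lets all <math>k</math> scores come out of a single matrix-vector product (again, <code>Theta</code> and <code>thetas</code> are our illustrative names):

<source lang="python">
Theta = np.vstack(thetas)   # shape (k, n+1); row j holds theta_{j+1}^T
scores = Theta @ x          # all k inner products theta_j^T x in one product
</source>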

== 2 ==