Softmax Regression

'''Initial translation''': @knighterzjy, '''First review''': @GuitarFang
== Introduction ==

'''Original''':

In these notes, we describe the '''Softmax regression''' model. This model generalizes logistic regression to classification problems where the class label <math>y</math> can take on more than two possible values. This will be useful for such problems as MNIST digit classification, where the goal is to distinguish between 10 different numerical digits. Softmax regression is a supervised learning algorithm, but we will later be using it in conjunction with our deep learning/unsupervised feature learning methods.

'''Translation''':

In this section we introduce the Softmax regression model, which is the generalization of the logistic regression model to multi-class classification problems, that is, problems in which the class label y can take on more than two values. The Softmax regression model can be applied directly to multi-class problems such as MNIST handwritten digit classification. Softmax regression is supervised, but later we will also describe how it can be combined with deep learning/unsupervised learning methods.

(Translator's note: MNIST is a handwritten digit recognition dataset maintained by Yann LeCun and others at NYU. http://yann.lecun.com/exdb/mnist/ )

'''First review''':

In this chapter we introduce the Softmax regression model. This model generalizes logistic regression in order to solve classification problems in which the class label y can take more than two possible values. Softmax regression is very useful for problems such as MNIST handwritten digit classification, where the goal is to recognize 10 different individual digits. Softmax regression is a supervised learning algorithm, but we will later use it in conjunction with our deep learning/unsupervised feature learning methods.

(Translator's note: MNIST is a handwritten digit recognition dataset maintained by Yann LeCun and others at NYU. http://yann.lecun.com/exdb/mnist/ )

'''Original''':

Recall that in logistic regression, we had a training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> of <math>m</math> labeled examples, where the input features are <math>x^{(i)} \in \Re^{n+1}</math>. (In this set of notes, we will use the notational convention of letting the feature vectors <math>x</math> be <math>n+1</math> dimensional, with <math>x_0 = 1</math> corresponding to the intercept term.) With logistic regression, we were in the binary classification setting, so the labels were <math>y^{(i)} \in \{0,1\}</math>. Our hypothesis took the form:

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>

'''Translation''':

Recall that in logistic regression our training set is <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>, where m is the number of examples and <math>x^{(i)} \in \Re^{n+1}</math> are the features. Since logistic regression is aimed at binary classification, the class labels are <math>y^{(i)} \in \{0,1\}</math>. The hypothesis function is as follows:

'''First review''':

Recall that in logistic regression we have a training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> of m labeled examples, where the input features are <math>x^{(i)} \in \Re^{n+1}</math>. (In this chapter we adopt the following notational convention: the feature vector x is (n+1)-dimensional, with x_0 = 1 corresponding to the intercept term.) Because logistic regression solves a binary classification problem, the class labels are <math>y^{(i)} \in \{0,1\}</math>. The hypothesis function is:

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
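
(Note: the short NumPy sketch below is an editorial illustration, not part of the original tutorial. It shows how the logistic hypothesis above could be evaluated; the names <code>theta</code> and <code>x</code> are assumed, with <code>x[0] = 1</code> playing the role of the intercept term.)

<source lang="python">
import numpy as np

def h_logistic(theta, x):
    """Logistic hypothesis h_theta(x) = 1 / (1 + exp(-theta^T x)).

    theta : parameter vector of length n+1
    x     : feature vector of length n+1, with x[0] == 1 (intercept term)
    """
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

# Tiny usage example with made-up numbers.
theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, 0.7])    # x[0] = 1 is the intercept entry
print(h_logistic(theta, x))      # prints a value strictly between 0 and 1
</source>
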
'''Original''':

and the model parameters <math>\theta</math> were trained to minimize the cost function

<math>
\begin{align}
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) \right]
\end{align}
</math>
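
(Note: as with the previous sketch, the code below is only an editorial illustration of the cost function <math>J(\theta)</math>, not part of the original text. <code>X</code> is assumed to be the m-by-(n+1) design matrix whose first column is all ones, and <code>y</code> a vector of labels in {0, 1}.)

<source lang="python">
import numpy as np

def logistic_cost(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]."""
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))   # h_theta(x^(i)) for every example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
</source>
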
'''Translation''':

'''First review''':

'''Original''':

In the softmax regression setting, we are interested in multi-class classification (as opposed to only binary classification), and so the label <math>y</math> can take on <math>k</math> different values, rather than only two. Thus, in our training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>, we now have that <math>y^{(i)} \in \{1, 2, \ldots, k\}</math>. (Note that our convention will be to index the classes starting from 1, rather than from 0.) For example, in the MNIST digit recognition task, we would have <math>k=10</math> different classes.

'''Translation''':

'''First review''':

'''Original''':

Given a test input <math>x</math>, we want our hypothesis to estimate the probability <math>p(y=j | x)</math> for each value of <math>j = 1, \ldots, k</math>. I.e., we want to estimate the probability of the class label taking on each of the <math>k</math> different possible values. Thus, our hypothesis will output a <math>k</math>-dimensional vector (whose elements sum to 1) giving us our <math>k</math> estimated probabilities. Concretely, our hypothesis <math>h_{\theta}(x)</math> takes the form:

<math>
\begin{align}
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 | x^{(i)}; \theta) \\
p(y^{(i)} = 2 | x^{(i)}; \theta) \\
\vdots \\
p(y^{(i)} = k | x^{(i)}; \theta)
\end{bmatrix}
=
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix}
\end{align}
</math>
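
(Note: the sketch below is an editorial illustration of the hypothesis above, not part of the original tutorial. <code>Theta</code> is assumed to be a k-by-(n+1) array whose rows are <math>\theta_1, \ldots, \theta_k</math>, a representation introduced a little further on; subtracting the maximum score before exponentiating is a common implementation detail that leaves the result unchanged and is not discussed in the text.)

<source lang="python">
import numpy as np

def h_softmax(Theta, x):
    """Return the k probabilities p(y = j | x; theta), j = 1..k."""
    scores = Theta.dot(x)                  # theta_j^T x for every class j
    scores = scores - scores.max()         # optional: improves numerical behaviour
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # normalize so the entries sum to one

# Usage with made-up numbers: k = 3 classes, n+1 = 3 features (x[0] = 1).
Theta = np.array([[0.1,  0.2, -0.3],
                  [0.0, -0.5,  0.8],
                  [0.4,  0.1,  0.1]])
x = np.array([1.0, 0.6, -0.2])
p = h_softmax(Theta, x)
print(p, p.sum())                          # probabilities summing to 1
</source>
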
'''Translation''':

'''First review''':

'''Original''':

Here <math>\theta_1, \theta_2, \ldots, \theta_k \in \Re^{n+1}</math> are the parameters of our model. Notice that the term <math>\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }</math> normalizes the distribution, so that it sums to one.

'''Translation''':

'''First review''':

'''Original''':

For convenience, we will also write <math>\theta</math> to denote all the parameters of our model. When you implement softmax regression, it is usually convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that

<math>
\theta = \begin{bmatrix}
\mbox{---} \theta_1^T \mbox{---} \\
\mbox{---} \theta_2^T \mbox{---} \\
\vdots \\
\mbox{---} \theta_k^T \mbox{---} \\
\end{bmatrix}
</math>
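
(Note: one reason this row-stacked representation is convenient, illustrated by the editorial sketch below, is that all k inner products <math>\theta_j^T x</math> can then be computed with a single matrix-vector product. The variable names are assumptions.)

<source lang="python">
import numpy as np

k, n = 10, 784                    # e.g. MNIST: 10 classes, 784 pixel features
theta_rows = [np.zeros(n + 1) for _ in range(k)]   # theta_1, ..., theta_k

# Stack theta_1^T, ..., theta_k^T as the rows of a k-by-(n+1) matrix.
Theta = np.vstack(theta_rows)     # shape (k, n+1)

x = np.ones(n + 1)                # some input with x[0] = 1 (intercept term)
scores = Theta.dot(x)             # all k values theta_j^T x at once, shape (k,)
</source>
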
== 2 ==
