Softmax Regression (Softmax回归)
== Introduction ==

In these notes, we describe the '''Softmax regression''' model. This model generalizes logistic regression to classification problems where the class label <math>y</math> can take on more than two possible values. This will be useful for such problems as MNIST digit classification, where the goal is to distinguish between 10 different numerical digits. Softmax regression is a supervised learning algorithm, but we will later be using it in conjunction with our deep learning/unsupervised feature learning methods.
+ | |||
+ | |||
+ | '''译文''': | ||
+ | |||
+ | 在本节中,我们介绍Softmax回归模型,该模型是logistic回归模型在多分类问题上的泛化,在多分类问题中,类标签y可以取两个以上的值。 Softmax回归模型可以直接应用于 MNIST 手写数字分类问题等多分类问题。Softmax回归是有监督的,不过我们接下来也会介绍它与深度学习/无监督学习方法的结合。 | ||
+ | (译者注: MNIST 是一个手写数字识别库,由 NYU 的Yann LeCun 等人维护。 http://yann.lecun.com/exdb/mnist/ ) | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | 在本章中,我们介绍Softmax回归模型。该模型将logistic回归模型一般化,以用来解决类型标签y的可能取值多于两种的分类问题。Softmax回归模型对于诸如MNIST手写数字分类等问题是十分有用的,该问题的目的是辨识10个不同的单个数字。Softmax回归是一种有监督学习算法,但是我们接下来要将它与我们的深度学习/无监督特征学习方法结合起来使用。 | ||
+ | (译者注:MNIST是一个手写数字识别库,由NYU的Yann LeCun等人维护。http://yann.lecun.com/exdb/mnist/) | ||
+ | |||
Recall that in logistic regression, we had a training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> of <math>m</math> labeled examples, where the input features are <math>x^{(i)} \in \Re^{n+1}</math>. (In this set of notes, we will use the notational convention of letting the feature vectors <math>x</math> be <math>n+1</math> dimensional, with <math>x_0 = 1</math> corresponding to the intercept term.) With logistic regression, we were in the binary classification setting, so the labels were <math>y^{(i)} \in \{0,1\}</math>. Our hypothesis took the form:

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
+ | |||
+ | |||
+ | |||
+ | '''译文''': | ||
+ | 回顾一下 logistic 回归,我们的训练集为<math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math> | ||
+ | ,其中 m为样本数,<math>x^{(i)} \in \Re^{n+1}</math>为特征。 | ||
+ | 由于 logistic 回归是针对二分类问题的,因此类标 <math>y^{(i)} \in \{0,1\}</math>。假设函数如下: | ||
+ | |||
+ | '''一审''': | ||
+ | 回想一下在 logistic 回归中,我们拥有一个包含 m 个被标记的样本的训练集 <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>,其中输入特征值 <math>x^{(i)} \in \Re^{n+1}</math>。(在本章中,我们对出现的符号进行如下约定:特征向量 x 的维度为n+1 ,其中 x0=1对应 截距项 。)因为在Logistic 回归中,我们要解决的是 二元分类 问题,因此 类型标记 <math>y^{(i)} \in \{0,1\}</math>。 估值函数 如下: | ||
+ | <math>\begin{align} | ||
+ | h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)}, | ||
+ | \end{align}</math> | ||
+ | |||
+ | |||

and the model parameters <math>\theta</math> were trained to minimize the cost function

<math>
\begin{align}
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) \right]
\end{align}
</math>
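
As a concrete reference, here is a minimal NumPy sketch of the hypothesis and this cost, written directly from the two formulas above. The names (<code>h_logistic</code>, <code>logistic_cost</code>, <code>X</code>, <code>y</code>) are our own illustration, not part of the original notes; <code>X</code> is assumed to be the <math>m \times (n+1)</math> design matrix with a leading column of ones, and <code>y</code> the vector of 0/1 labels:

<source lang="python">
import numpy as np

def h_logistic(theta, x):
    # Logistic hypothesis h_theta(x) = 1 / (1 + exp(-theta^T x)).
    # theta, x: arrays of shape (n+1,), with x[0] == 1 as the intercept term.
    return 1.0 / (1.0 + np.exp(-theta @ x))

def logistic_cost(theta, X, y):
    # J(theta): average negative log-likelihood over the m training examples.
    # X: shape (m, n+1) with X[:, 0] == 1; y: shape (m,) with entries in {0, 1}.
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every example at once
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
</source>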
+ | |||
+ | |||
+ | '''译文''': | ||
+ | |||
+ | |||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

In the softmax regression setting, we are interested in multi-class classification (as opposed to only binary classification), and so the label <math>y</math> can take on <math>k</math> different values, rather than only two. Thus, in our training set <math>\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}</math>, we now have that <math>y^{(i)} \in \{1, 2, \ldots, k\}</math>. (Note that our convention will be to index the classes starting from 1, rather than from 0.) For example, in the MNIST digit recognition task, we would have <math>k=10</math> different classes.
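
As a small illustration of the 1-indexed convention: if a data source happens to deliver MNIST labels 0-indexed (an assumption about the source, not something the notes state), shifting them into <math>\{1, \ldots, 10\}</math> is a one-liner:

<source lang="python">
y = raw_labels + 1  # map digits 0..9 to class labels 1..10 (raw_labels: NumPy array of digits)
</source>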
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

Given a test input <math>x</math>, we want our hypothesis to estimate the probability <math>p(y=j | x)</math> for each value of <math>j = 1, \ldots, k</math>. I.e., we want to estimate the probability of the class label taking on each of the <math>k</math> different possible values. Thus, our hypothesis will output a <math>k</math>-dimensional vector (whose elements sum to 1) giving us our <math>k</math> estimated probabilities. Concretely, our hypothesis <math>h_{\theta}(x)</math> takes the form:

<math>
\begin{align}
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 | x^{(i)}; \theta) \\
p(y^{(i)} = 2 | x^{(i)}; \theta) \\
\vdots \\
p(y^{(i)} = k | x^{(i)}; \theta)
\end{bmatrix}
=
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix}
\end{align}
</math>
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

Here <math>\theta_1, \theta_2, \ldots, \theta_k \in \Re^{n+1}</math> are the parameters of our model. Notice that the term <math>\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }</math> normalizes the distribution, so that it sums to one.
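
A minimal NumPy sketch of this hypothesis, assuming the per-class parameter vectors are held in a Python list (the names <code>h_softmax</code> and <code>thetas</code> are ours):

<source lang="python">
def h_softmax(thetas, x):
    # thetas: list of k parameter vectors theta_1, ..., theta_k, each of shape (n+1,).
    # x: input of shape (n+1,) with x[0] == 1 for the intercept.
    scores = np.array([theta @ x for theta in thetas])
    scores -= scores.max()  # shifting all scores by a constant leaves the softmax unchanged; avoids overflow
    e = np.exp(scores)
    return e / e.sum()      # the normalization term from the text; the k entries sum to one
</source>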
+ | |||
+ | '''译文''': | ||
+ | |||
+ | '''一审''': | ||
+ | |||
+ | '''原文''': | ||

For convenience, we will also write <math>\theta</math> to denote all the parameters of our model. When you implement softmax regression, it is usually convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that

<math>
\theta = \begin{bmatrix}
\mbox{---} \theta_1^T \mbox{---} \\
\mbox{---} \theta_2^T \mbox{---} \\
\vdots \\
\mbox{---} \theta_k^T \mbox{---} \\
\end{bmatrix}
</math>
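
Continuing the earlier sketch, this stacked representation lets all <math>k</math> scores come out of a single matrix-vector product (again, <code>Theta</code> and <code>thetas</code> are our illustrative names):

<source lang="python">
Theta = np.vstack(thetas)   # shape (k, n+1); row j holds theta_{j+1}^T
scores = Theta @ x          # all k inner products theta_j^T x in one product
</source>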

== 2 ==