Softmax Regression
== Introduction ==

'''Softmax regression''', also known as '''multinomial logistic regression''', is a generalisation of logistic regression to problems where there are more than two classes.
Our hypothesis <math>h_\theta(x)</math> outputs a vector of estimated probabilities, one entry per class:

<math>
\begin{align}
h_\theta(x) &= \begin{bmatrix}
P(y = 1 | x; \theta) \\
P(y = 2 | x; \theta) \\
\vdots \\
P(y = n | x; \theta)
\end{bmatrix} \\
&= \frac{1}{\sum_{j=1}^{n}{e^{ \theta_j^T x }}}
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x } \\
\vdots \\
e^{ \theta_n^T x }
\end{bmatrix}
\end{align}
</math>
where <math>\theta_1, \theta_2, \ldots, \theta_n</math> are each <math>k</math>-dimensional column vectors that constitute the parameters of our hypothesis.
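To make the hypothesis concrete, here is a minimal NumPy sketch. The function name and the convention of storing <math>\theta_1, \ldots, \theta_n</math> as the columns of a <math>k \times n</math> matrix <code>theta</code> are our own illustration, not notation from this page.

<source lang="python">
import numpy as np

def softmax_hypothesis(theta, x):
    """Vector h_theta(x) of class probabilities.

    theta : (k, n) array whose columns are theta_1, ..., theta_n
    x     : (k,) input vector
    """
    scores = theta.T @ x                 # theta_j^T x for each class j
    scores = scores - scores.max()       # shift for numerical stability; probabilities unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum() # normalise so the entries sum to 1
</source>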
Given a training set of <math>m</math> examples, the log-likelihood is <math>\ell(\theta) = \sum_{i=1}^{m} \ln P(y^{(i)} | x^{(i)}; \theta)</math>, where each term expands to

<math>
\begin{align}
\ln P(y^{(i)} | x^{(i)}; \theta) &= \ln \frac{e^{ \theta_{y^{(i)}}^T x^{(i)} }}{\sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}} \\
&= \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}
\end{align}
</math>
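The same quantity in code, under the same assumed <math>k \times n</math> parameter layout and with 0-based class labels; the log-sum-exp rearrangement simply applies the second line of the derivation in a numerically stable way:

<source lang="python">
import numpy as np

def log_likelihood(theta, X, y):
    """Sum of ln P(y_i | x_i; theta) over the training set.

    X : (m, k) matrix of inputs, one example per row
    y : (m,) integer class labels in {0, ..., n-1}
    """
    scores = X @ theta                         # (m, n): entry (i, j) is theta_j^T x_i
    max_s = scores.max(axis=1, keepdims=True)  # stabilise the log-sum-exp term
    log_norm = max_s[:, 0] + np.log(np.exp(scores - max_s).sum(axis=1))
    return (scores[np.arange(len(y)), y] - log_norm).sum()
</source>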
To find <math>\theta</math> such that <math>\ell(\theta)</math> is maximised, we first find the derivative of <math>\ell(\theta)</math> with respect to the parameter vector <math>\theta_c</math> of each class <math>c</math>. For a single training example,

<math>
\begin{align}
\frac{\partial}{\partial \theta_c} \ln P(y^{(i)} | x^{(i)}; \theta) &= \frac{\partial}{\partial \theta_c} \left( \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right) \\
&= I_{ \{ y^{(i)} = c\} } x^{(i)} - \frac{e^{ \theta_c^T x^{(i)} }}{\sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}} \, x^{(i)} \\
&= \left( I_{ \{ y^{(i)} = c\} } - P(y^{(i)} = c | x^{(i)}) \right) x^{(i)}
\end{align}
</math>

where <math>I_{ \{ \cdot \} }</math> is the indicator function, equal to 1 when its argument holds and 0 otherwise. The derivative of <math>\ell(\theta)</math> itself is the sum of these terms over all <math>m</math> examples.
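Stacking these per-class derivatives gives the whole gradient in one matrix expression; a sketch under the same assumptions as above:

<source lang="python">
import numpy as np

def gradient(theta, X, y):
    """Gradient of ell(theta); column c is sum_i (I{y_i = c} - P(y_i = c | x_i)) x_i."""
    scores = X @ theta
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise before exponentiating
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=1, keepdims=True)     # probs[i, c] = P(y_i = c | x_i)
    indicator = np.zeros_like(probs)
    indicator[np.arange(len(y)), y] = 1.0                # indicator[i, c] = I{y_i = c}
    return X.T @ (indicator - probs)                     # (k, n) matrix of partial derivatives
</source>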
+ | |||
+ | With this, we can now find a set of parameters that maximises <math>\ell(\theta)</math>, for instance by using gradient ascent. |
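For instance, a toy gradient-ascent loop built on the <code>gradient</code> sketch above; the learning rate and iteration count are arbitrary placeholders, not values recommended by this page:

<source lang="python">
def fit(X, y, n_classes, learning_rate=0.1, n_iters=500):
    """Maximise ell(theta) by repeatedly stepping along its gradient."""
    theta = np.zeros((X.shape[1], n_classes))        # one k-dimensional column per class
    for _ in range(n_iters):
        theta += learning_rate * gradient(theta, X, y)
    return theta
</source>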