Softmax Regression
We will modify the cost function by adding a weight decay term
<math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>
which penalizes large values of the parameters. Our cost function is now
<math>
\begin{align}
J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }} \right]
+ \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
\end{align}
</math>
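For concreteness, here is a minimal NumPy sketch of this regularized cost (and of the corresponding gradient used by the optimization step discussed below). The function name <code>softmax_cost</code> and the variable names <code>Theta</code>, <code>X</code>, <code>y</code>, and <code>lam</code> are our own choices, not part of the tutorial, and the code assumes each <math>x^{(i)}</math> already contains the intercept term <math>x_0 = 1</math>.

<pre>
import numpy as np

def softmax_cost(Theta, X, y, lam):
    """Weight-decayed softmax regression cost J(theta) and its gradient.

    Theta : (k, n+1) array; row j is theta_j, including the intercept
            entry, matching the j = 0..n sum in the weight decay term.
    X     : (m, n+1) design matrix whose first column is all ones.
    y     : (m,) integer labels in {0, ..., k-1}.
    lam   : weight decay parameter lambda.
    """
    m = X.shape[0]
    scores = X @ Theta.T                          # theta_j^T x^(i), shape (m, k)
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax probabilities

    indicators = np.zeros_like(probs)
    indicators[np.arange(m), y] = 1.0             # 1{ y^(i) = j }

    # Cross-entropy term plus the (lambda/2) * sum theta_ij^2 penalty.
    cost = -np.sum(indicators * np.log(probs)) / m + 0.5 * lam * np.sum(Theta ** 2)
    # Gradient: one row per theta_j, as queried by gradient descent or L-BFGS.
    grad = -(indicators - probs).T @ X / m + lam * Theta
    return cost, grad
</pre>

Returning the cost and gradient together is convenient because off-the-shelf optimizers query both quantities at every iteration.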
With this weight decay term (for any <math>\lambda > 0</math>), the cost function <math>J(\theta)</math> is now strictly convex and has a unique solution, so iterative optimization algorithms such as gradient descent or L-BFGS are guaranteed
to converge to the global minimum.
To apply an optimization algorithm, we also need the derivative of this
new definition of <math>J(\theta)</math>. One can show that the derivative is:
<math>