Softmax Regression

From Ufldl

With this, we can now find a set of parameters that maximises <math>\ell(\theta)</math>, for instance by using gradient ascent.
=== Weight decay ===
When using softmax in practice, you might find that the weights sometimes balloon up to very large values. This can cause numerical difficulties during training, and can cause problems when the trained weights are reused in other settings (for example, in a stacked autoencoder).

Why should the weights balloon up? You can check for yourself that if our current parameters <math>\theta</math> classify the examples perfectly, then multiplying each of the parameters by a constant greater than 1 increases the log-likelihood of the data under the parameters, because it pushes the predicted probability of each correct class closer to 1. The log-likelihood then has no finite maximiser, so an optimiser can keep growing the weights indefinitely.
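The scaling argument can be checked numerically. The following is a minimal sketch, not part of the original exercise code: the function name <code>log_likelihood</code>, the layout of <code>theta</code> as one weight row per class, and the two-point toy data set are all assumptions made for illustration.

```python
import math

def log_likelihood(theta, xs, ys):
    """Sum over examples of log P(y | x; theta) for a softmax model.

    theta is a list with one weight row per class (an assumed layout).
    """
    total = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(t_j * x_j for t_j, x_j in zip(row, x)) for row in theta]
        m = max(scores)  # subtract the max before exponentiating, for stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[y] - log_norm
    return total

# Toy data (hypothetical): two separable points that theta classifies correctly.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [0, 1]
theta = [[1.0, -1.0], [-1.0, 1.0]]

# Scale every parameter by c and watch the log-likelihood rise towards 0.
for c in (1, 10, 100):
    scaled = [[c * v for v in row] for row in theta]
    print(c, log_likelihood(scaled, xs, ys))
```

The printed log-likelihood moves closer to zero as <code>c</code> grows, which is why nothing in the unregularised objective stops the weights from growing.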

To combat this, it may be useful to include a weight decay term, which penalises large weights and so keeps them small.

The weight decay term takes the form:

<math>
\begin{align}
w(\theta) = \frac{\lambda}{2} \sum_{i}{ \sum_{j}{ \theta_{ij}^2 } }
\end{align}
</math>
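As a small worked instance of this formula, with <math>\lambda = 0.1</math> and parameters 1, 2, 3, 4 the term is <math>\tfrac{0.1}{2}(1 + 4 + 9 + 16) = 1.5</math>. A minimal sketch of the same computation follows; the name <code>weight_decay</code> and the list-of-rows layout for <code>theta</code> are assumptions for illustration only.

```python
def weight_decay(theta, lam):
    """Return (lam / 2) * the sum of the squared entries of theta.

    theta is a list of rows of parameters (an assumed layout).
    """
    return 0.5 * lam * sum(t_ij ** 2 for row in theta for t_ij in row)

# Hypothetical 2x2 parameter matrix matching the worked arithmetic above.
theta = [[1.0, 2.0], [3.0, 4.0]]
print(weight_decay(theta, lam=0.1))
```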

This is combined with the log-likelihood function to give a cost function, <math>J(\theta)</math>, which we want to '''minimise''' (observe that we have '''negated the log-likelihood''' so that minimising the cost function corresponds to maximising the log-likelihood):

<math>
\begin{align}
J(\theta) = -\ell(\theta) + w(\theta)
\end{align}
</math>
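To see what the extra term does, the sketch below evaluates this cost on a toy problem while scaling the parameters. It is an illustration under assumptions: the names <code>log_likelihood</code> and <code>cost</code>, the one-row-per-class layout of <code>theta</code>, the data, and the value <code>lam=0.01</code> are all hypothetical choices.

```python
import math

def log_likelihood(theta, xs, ys):
    """Log-likelihood of a softmax model; theta holds one weight row per class."""
    total = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(t_j * x_j for t_j, x_j in zip(row, x)) for row in theta]
        m = max(scores)  # subtract the max before exponentiating, for stability
        total += scores[y] - (m + math.log(sum(math.exp(s - m) for s in scores)))
    return total

def cost(theta, xs, ys, lam):
    """J(theta) = -log_likelihood(theta) + (lam / 2) * sum of squared weights."""
    decay = 0.5 * lam * sum(v * v for row in theta for v in row)
    return -log_likelihood(theta, xs, ys) + decay

# Toy data (hypothetical): theta classifies both points correctly.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [0, 1]
theta = [[1.0, -1.0], [-1.0, 1.0]]

# Compare the cost without (lam=0) and with (lam=0.01) weight decay.
for c in (1, 10, 100):
    scaled = [[c * v for v in row] for row in theta]
    print(c, cost(scaled, xs, ys, lam=0.0), cost(scaled, xs, ys, lam=0.01))
```

With <code>lam=0.0</code> the cost keeps falling as the parameters are scaled up, whereas with a positive <code>lam</code> the quadratic penalty eventually dominates and the cost rises, so very large weights are no longer favoured.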

The gradient of the cost function with respect to the parameters <math>\theta_k</math> of class <math>k</math> must account for both the negated log-likelihood and the weight decay term. Summing over the training examples <math>i</math>:

<math>
\begin{align}
\frac{\partial J(\theta)}{\partial \theta_k}
&= -\sum_{i} x^{(i)} ( I_{ \{ y^{(i)} = k\} }  - P(y^{(i)} = k | x^{(i)}) ) + \lambda \theta_k
\end{align}
</math>
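A gradient like this is easy to get wrong by a sign, so it is worth comparing it against finite differences. The sketch below assumes the cost <math>J(\theta) = -\ell(\theta) + w(\theta)</math>, whose gradient for class <math>k</math> is the negated sum over examples of <math>x^{(i)} ( I_{ \{ y^{(i)} = k\} } - P(y^{(i)} = k | x^{(i)}) )</math> plus <math>\lambda \theta_k</math>. All function names, the data, and the value of <code>lam</code> are hypothetical.

```python
import math

def softmax_probs(theta, x):
    """Class probabilities P(y = k | x) for each weight row theta[k]."""
    scores = [sum(t_j * x_j for t_j, x_j in zip(row, x)) for row in theta]
    m = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cost(theta, xs, ys, lam):
    """Negated log-likelihood plus (lam / 2) * sum of squared weights."""
    nll = -sum(math.log(softmax_probs(theta, x)[y]) for x, y in zip(xs, ys))
    return nll + 0.5 * lam * sum(v * v for row in theta for v in row)

def cost_gradient(theta, xs, ys, lam):
    """Analytic gradient of cost() with respect to every theta[k][j]."""
    grad = [[lam * v for v in row] for row in theta]  # weight decay part
    for x, y in zip(xs, ys):
        p = softmax_probs(theta, x)
        for k in range(len(theta)):
            coeff = (1.0 if y == k else 0.0) - p[k]
            for j, x_j in enumerate(x):
                grad[k][j] -= x_j * coeff  # minus sign from the negated likelihood
    return grad

# Hypothetical toy data and parameters.
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [0, 1, 1]
theta = [[0.2, -0.1], [0.0, 0.3]]
lam = 0.1

# Central finite differences as an independent check of the analytic gradient.
grad = cost_gradient(theta, xs, ys, lam)
eps = 1e-6
max_err = 0.0
for k in range(len(theta)):
    for j in range(len(theta[k])):
        plus = [row[:] for row in theta]
        minus = [row[:] for row in theta]
        plus[k][j] += eps
        minus[k][j] -= eps
        numeric = (cost(plus, xs, ys, lam) - cost(minus, xs, ys, lam)) / (2 * eps)
        max_err = max(max_err, abs(numeric - grad[k][j]))
print(max_err)
```

If the analytic and numerical gradients agree to within a small tolerance, the sign conventions in the code match the cost being minimised.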

Minimising <math>J(\theta)</math> now trades off maximising the log-likelihood against keeping the weights small, with <math>\lambda</math> controlling the balance between the two.

== Parameters ==

Revision as of 05:38, 13 April 2011
