Softmax Regression
=== Weight Regularization ===
When using softmax regression in practice, it is important to use weight regularization. In particular, if there exists a linear separator that perfectly classifies all the data points, then the softmax objective is unbounded (given any <math>\theta</math> that separates the data perfectly, one can always scale <math>\theta</math> to be larger and obtain a better objective value). With weight regularization, one penalizes the weights for being large and thus avoids these degenerate situations.
Weight regularization is also important as it often results in models that generalize better. In particular, one can view weight regularization as placing a (Gaussian) prior on <math>\theta</math> so as to prefer <math>\theta</math> with smaller values.
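Concretely, one common way to regularize is to add a weight decay term to the cost function, where <math>\lambda > 0</math> is a weight decay parameter (the symbol <math>\lambda</math> and the index conventions below are illustrative, assuming <math>k</math> classes, <math>m</math> training examples, and <math>n+1</math> input features including the intercept). With this penalty, the regularized cost takes the form

<math>
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{ y^{(i)} = j \right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2.
</math>

From the Bayesian viewpoint above, minimizing this penalized cost corresponds to MAP estimation under a zero-mean Gaussian prior on <math>\theta</math>.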
Minimizing <math>J(\theta)</math> now performs regularized softmax regression.
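As a rough illustration (not code from this tutorial), the following NumPy sketch computes a regularized softmax cost and its gradient; the function name <code>softmax_cost_and_grad</code>, the array shapes, and the hyperparameter <code>lam</code> are assumptions made for the example.

<pre>
import numpy as np

def softmax_cost_and_grad(theta, X, y, lam):
    # theta: (k, n) weight matrix, one row per class
    # X:     (m, n) design matrix, one example per row (add a bias column if desired)
    # y:     (m,)   integer class labels in {0, ..., k-1}
    # lam:   weight decay strength (the lambda above)
    m = X.shape[0]
    scores = X.dot(theta.T)                       # (m, k) class scores
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)     # softmax probabilities

    # Cross-entropy of the true classes plus the weight decay penalty
    cost = -np.mean(np.log(probs[np.arange(m), y])) + 0.5 * lam * np.sum(theta ** 2)

    # Gradient: (probs - indicator) averaged over examples, plus lam * theta
    indicator = np.zeros_like(probs)
    indicator[np.arange(m), y] = 1.0
    grad = (probs - indicator).T.dot(X) / m + lam * theta
    return cost, grad
</pre>

The returned cost and gradient can then be handed to an off-the-shelf optimizer (for example, <code>scipy.optimize.minimize</code>) to fit <math>\theta</math>.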
- | |||
== Parameterization ==