Softmax Regression
Note that <math>\nabla_{\theta_j} J(\theta)</math> is itself a vector, so that its <math>l</math>-th element is <math>\frac{\partial J(\theta)}{\partial \theta_{jl}}</math>, the partial derivative of <math>J(\theta)</math> with respect to the <math>l</math>-th element of <math>\theta_j</math>.
Armed with this formula for the derivative, one can then plug it into an algorithm such as gradient descent, and have it minimize <math>J(\theta)</math>. For example, with the standard implementation of gradient descent, on each iteration we would perform the update <math>\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta)</math> (for each <math>j=1,\ldots,k</math>).
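
As a concrete sketch of this update in code (the function name, learning rate <math>\alpha</math>, and iteration count below are illustrative assumptions, not part of the text):

<pre>
import numpy as np

def softmax_gradient_descent(X, y, k, alpha=0.1, num_iters=500):
    # X: (m, n) matrix of training inputs, one example x^(i) per row
    # y: length-m integer vector of labels in {0, ..., k-1}
    # Returns theta as a (k, n) matrix whose j-th row is theta_j.
    m, n = X.shape
    theta = np.zeros((k, n))
    for _ in range(num_iters):
        scores = X @ theta.T                          # (m, k)
        scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)     # p(y^(i) = j | x^(i); theta)
        indicator = np.eye(k)[y]                      # 1{y^(i) = j}
        # nabla_{theta_j} J = -(1/m) sum_i x^(i) (1{y^(i)=j} - p(y^(i)=j | x^(i); theta))
        grad = -(indicator - probs).T @ X / m         # (k, n); row j is the gradient wrt theta_j
        theta -= alpha * grad                         # theta_j := theta_j - alpha * nabla_{theta_j} J
    return theta
</pre>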
When implementing softmax regression, we will typically use a modified version of the cost function described above; specifically, one that incorporates weight decay. We describe the motivation and details below.
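
As a preview, a minimal sketch of one common weight-decay variant, which adds a quadratic penalty <math>\frac{\lambda}{2} \sum_{i,j} \theta_{ij}^2</math> to the cost (the helper name and the decay strength <math>\lambda</math> here are illustrative; the motivation and exact form are given below):

<pre>
import numpy as np

def softmax_cost_and_grad(theta, X, y, lam):
    # Sketch of a weight-decayed cost: the quadratic penalty
    # (lambda/2) * sum_ij theta_ij^2 is added to the cost, which
    # contributes an extra lambda * theta_j term to each gradient.
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T
    scores -= scores.max(axis=1, keepdims=True)       # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    data_cost = -log_probs[np.arange(m), y].mean()    # -(1/m) sum_i log p(y^(i) | x^(i); theta)
    cost = data_cost + 0.5 * lam * np.sum(theta ** 2)
    probs = np.exp(log_probs)
    indicator = np.eye(k)[y]
    grad = -(indicator - probs).T @ X / m + lam * theta
    return cost, grad
</pre>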
== Properties of softmax regression parameterization ==