Softmax Regression

the partial derivative of <math>J(\theta)</math> with respect to the <math>l</math>-th element of <math>\theta_j</math>.
Armed with this formula for the derivative, one can then plug it into an algorithm such as gradient descent, and have it
minimize <math>J(\theta)</math>.  For example, with the standard implementation of gradient descent, on each iteration
we would perform the update <math>\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta)</math> (for each <math>j=1,\ldots,k</math>).
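
As a minimal sketch of what this update might look like in code (NumPy is used here purely for illustration and is not part of the tutorial; the function name, the arrangement of <math>\theta</math> as a <math>k \times n</math> matrix whose <math>j</math>-th row is <math>\theta_j</math>, and the use of the unregularized cost are all assumptions):

<pre>
import numpy as np

def softmax_gradient_step(theta, X, y, alpha):
    # Hypothetical helper (not from the tutorial): one batch gradient descent
    # update for the unregularized softmax cost J(theta).
    #   theta : (k, n) array, row j holds theta_j
    #   X     : (m, n) array of m training examples x^(i)
    #   y     : (m,)   array of labels in {0, ..., k-1}
    #   alpha : learning rate
    m = X.shape[0]
    scores = X @ theta.T                          # (m, k) values theta_j' * x^(i)
    scores -= scores.max(axis=1, keepdims=True)   # subtract max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)     # p(y^(i) = j | x^(i); theta)
    indicators = np.zeros_like(probs)
    indicators[np.arange(m), y] = 1.0             # 1{ y^(i) = j }
    grad = -(indicators - probs).T @ X / m        # row j is nabla_{theta_j} J(theta)
    return theta - alpha * grad                   # theta_j := theta_j - alpha * grad_j
</pre>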
When implementing softmax regression, we will typically use a modified version of the cost function described above;
specifically, one that incorporates weight decay.  We describe the motivation and details below.
 
== Properties of softmax regression parameterization ==
