Softmax Regression
Note that <math>\nabla_{\theta_j} J(\theta)</math> is itself a vector, so that its <math>l</math>-th element is <math>\frac{\partial J(\theta)}{\partial \theta_{jl}}</math>, the partial derivative of <math>J(\theta)</math> with respect to the <math>l</math>-th element of <math>\theta_j</math>.
Armed with this formula for the derivative, one can then plug it into an algorithm such as gradient descent, and have it minimize <math>J(\theta)</math>. For example, with the standard implementation of gradient descent, on each iteration we would perform the update <math>\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta)</math> (for each <math>j=1,\ldots,k</math>).
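
As a concrete sketch of this update in code (the function name, learning rate <math>\alpha</math>, and iteration count below are illustrative assumptions, not part of the text):

<pre>
import numpy as np

def softmax_gradient_descent(X, y, k, alpha=0.1, num_iters=500):
    # X: (m, n) matrix of training inputs, one example x^(i) per row
    # y: length-m integer vector of labels in {0, ..., k-1}
    # Returns theta as a (k, n) matrix whose j-th row is theta_j.
    m, n = X.shape
    theta = np.zeros((k, n))
    for _ in range(num_iters):
        scores = X @ theta.T                          # (m, k)
        scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)     # p(y^(i) = j | x^(i); theta)
        indicator = np.eye(k)[y]                      # 1{y^(i) = j}
        # nabla_{theta_j} J = -(1/m) sum_i x^(i) (1{y^(i)=j} - p(y^(i)=j | x^(i); theta))
        grad = -(indicator - probs).T @ X / m         # (k, n); row j is the gradient wrt theta_j
        theta -= alpha * grad                         # theta_j := theta_j - alpha * nabla_{theta_j} J
    return theta
</pre>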
When implementing softmax regression, we will typically use a modified version of the cost function described above; specifically, one that incorporates weight decay. We describe the motivation and details below.
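
As a preview, a minimal sketch of one common weight-decay variant, which adds a quadratic penalty <math>\frac{\lambda}{2} \sum_{i,j} \theta_{ij}^2</math> to the cost (the helper name and the decay strength <math>\lambda</math> here are illustrative; the motivation and exact form are given below):

<pre>
import numpy as np

def softmax_cost_and_grad(theta, X, y, lam):
    # Sketch of a weight-decayed cost: the quadratic penalty
    # (lambda/2) * sum_ij theta_ij^2 is added to the cost, which
    # contributes an extra lambda * theta_j term to each gradient.
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T
    scores -= scores.max(axis=1, keepdims=True)       # for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    data_cost = -log_probs[np.arange(m), y].mean()    # -(1/m) sum_i log p(y^(i) | x^(i); theta)
    cost = data_cost + 0.5 * lam * np.sum(theta ** 2)
    probs = np.exp(log_probs)
    indicator = np.eye(k)[y]
    grad = -(indicator - probs).T @ X / m + lam * theta
    return cost, grad
</pre>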
== Properties of softmax regression parameterization ==