Softmax Regression
== Introduction ==

'''Softmax regression''', also known as '''multinomial logistic regression''', is a generalisation of logistic regression to problems where there are more than two classes.
Our hypothesis <math>h_\theta(x)</math> outputs a vector of estimated probabilities, one entry per class:

<math>
\begin{align}
h_\theta(x) &= \begin{bmatrix}
P(y = 1 | x; \theta) \\
P(y = 2 | x; \theta) \\
\vdots \\
P(y = n | x; \theta)
\end{bmatrix} \\
&= \frac{1}{\sum_{j=1}^{n}{e^{ \theta_j^T x }}}
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x } \\
\vdots \\
e^{ \theta_n^T x }
\end{bmatrix}
\end{align}
</math>
where <math>\theta_1, \theta_2, \ldots, \theta_n</math> are each <math>k</math>-dimensional column vectors that constitute the parameters of our hypothesis.
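To make the hypothesis concrete, here is a minimal NumPy sketch. The function name and the convention of storing <math>\theta_1, \ldots, \theta_n</math> as the columns of a <math>k \times n</math> matrix <code>theta</code> are our own illustration, not notation from this page.

<source lang="python">
import numpy as np

def softmax_hypothesis(theta, x):
    """Vector h_theta(x) of class probabilities.

    theta : (k, n) array whose columns are theta_1, ..., theta_n
    x     : (k,) input vector
    """
    scores = theta.T @ x                 # theta_j^T x for each class j
    scores = scores - scores.max()       # shift for numerical stability; probabilities unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum() # normalise so the entries sum to 1
</source>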
Given a training set of <math>m</math> examples, the log-likelihood is <math>\ell(\theta) = \sum_{i=1}^{m} \ln P(y^{(i)} | x^{(i)}; \theta)</math>, where each term expands to

<math>
\begin{align}
\ln P(y^{(i)} | x^{(i)}; \theta) &= \ln \frac{e^{ \theta_{y^{(i)}}^T x^{(i)} }}{\sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}} \\
&= \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}
\end{align}
</math>
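The same quantity in code, under the same assumed <math>k \times n</math> parameter layout and with 0-based class labels; the log-sum-exp rearrangement simply applies the second line of the derivation in a numerically stable way:

<source lang="python">
import numpy as np

def log_likelihood(theta, X, y):
    """Sum of ln P(y_i | x_i; theta) over the training set.

    X : (m, k) matrix of inputs, one example per row
    y : (m,) integer class labels in {0, ..., n-1}
    """
    scores = X @ theta                         # (m, n): entry (i, j) is theta_j^T x_i
    max_s = scores.max(axis=1, keepdims=True)  # stabilise the log-sum-exp term
    log_norm = max_s[:, 0] + np.log(np.exp(scores - max_s).sum(axis=1))
    return (scores[np.arange(len(y)), y] - log_norm).sum()
</source>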
To find <math>\theta</math> such that <math>\ell(\theta)</math> is maximised, we first find the derivative of <math>\ell(\theta)</math> with respect to the parameter vector <math>\theta_c</math> of each class <math>c</math>. For a single training example,

<math>
\begin{align}
\frac{\partial}{\partial \theta_c} \ln P(y^{(i)} | x^{(i)}; \theta) &= \frac{\partial}{\partial \theta_c} \left( \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right) \\
&= I_{ \{ y^{(i)} = c\} } x^{(i)} - \frac{e^{ \theta_c^T x^{(i)} }}{\sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }}} \, x^{(i)} \\
&= \left( I_{ \{ y^{(i)} = c\} } - P(y^{(i)} = c | x^{(i)}) \right) x^{(i)}
\end{align}
</math>

where <math>I_{ \{ \cdot \} }</math> is the indicator function, equal to 1 when its argument holds and 0 otherwise. The derivative of <math>\ell(\theta)</math> itself is the sum of these terms over all <math>m</math> examples.
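Stacking these per-class derivatives gives the whole gradient in one matrix expression; a sketch under the same assumptions as above:

<source lang="python">
import numpy as np

def gradient(theta, X, y):
    """Gradient of ell(theta); column c is sum_i (I{y_i = c} - P(y_i = c | x_i)) x_i."""
    scores = X @ theta
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise before exponentiating
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=1, keepdims=True)     # probs[i, c] = P(y_i = c | x_i)
    indicator = np.zeros_like(probs)
    indicator[np.arange(len(y)), y] = 1.0                # indicator[i, c] = I{y_i = c}
    return X.T @ (indicator - probs)                     # (k, n) matrix of partial derivatives
</source>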
+ | |||
+ | With this, we can now find a set of parameters that maximises <math>\ell(\theta)</math>, for instance by using gradient ascent. |
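For instance, a toy gradient-ascent loop built on the <code>gradient</code> sketch above; the learning rate and iteration count are arbitrary placeholders, not values recommended by this page:

<source lang="python">
def fit(X, y, n_classes, learning_rate=0.1, n_iters=500):
    """Maximise ell(theta) by repeatedly stepping along its gradient."""
    theta = np.zeros((X.shape[1], n_classes))        # one k-dimensional column per class
    for _ in range(n_iters):
        theta += learning_rate * gradient(theta, X, y)
    return theta
</source>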