Softmax Regression

Weight Decay

We will modify the cost function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>
which penalizes large values of the parameters.  Our cost function is now
<math>
\begin{align}
J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
               + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
\end{align}
</math>
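As a concrete illustration, here is a minimal NumPy sketch of this weight-decayed cost.  The function name <code>softmax_cost</code>, the <math>k \times (n+1)</math> parameter matrix layout, and the 0-indexed integer labels are conventions chosen for this example, not part of the tutorial:

<pre>
import numpy as np

def softmax_cost(theta, X, y, lam):
    """Weight-decayed softmax regression cost J(theta).

    theta : (k, n+1) parameter matrix, one row theta_j per class (intercept included)
    X     : (m, n+1) design matrix whose first column is the intercept term x_0 = 1
    y     : (m,) integer labels in {0, ..., k-1}
    lam   : weight decay parameter lambda
    """
    m = X.shape[0]
    scores = X @ theta.T                                 # scores[i, j] = theta_j^T x^(i)
    scores -= scores.max(axis=1, keepdims=True)          # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    data_term = -log_probs[np.arange(m), y].mean()       # -(1/m) sum_i log p(y^(i) | x^(i))
    decay_term = 0.5 * lam * np.sum(theta ** 2)          # (lambda/2) sum_{i,j} theta_{ij}^2
    return data_term + decay_term
</pre>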
With this weight decay term (for any <math>\lambda > 0</math>), the cost function <math>J(\theta)</math> is strictly convex and has a unique minimum, so an iterative optimization algorithm such as gradient descent or L-BFGS is guaranteed to converge to the global minimum.
To apply an optimization algorithm, we also need the derivative of this new definition of <math>J(\theta)</math>.  One can show that the derivative is:
<math>
\begin{align}
\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} \left( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) \right) \right]  } + \lambda \theta_j
\end{align}
</math>
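Continuing the NumPy sketch above, the gradient can be computed in the same vectorized fashion; row <math>j</math> of the returned array is <math>\nabla_{\theta_j} J(\theta)</math> (again, names and shapes are our own conventions for this example):

<pre>
def softmax_grad(theta, X, y, lam):
    """Gradient of the weight-decayed cost; same (k, n+1) shape as theta."""
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)            # p(y^(i) = j | x^(i); theta)
    indicator = np.zeros((m, k))
    indicator[np.arange(m), y] = 1.0                     # 1{ y^(i) = j }
    grad = -(indicator - probs).T @ X / m                # data term of nabla_{theta_j} J
    grad += lam * theta                                  # weight decay term lambda * theta_j
    return grad
</pre>

A numerical gradient check (comparing a few entries of this gradient against finite differences of the cost) is a cheap way to verify such an implementation before handing it to an optimizer.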

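Putting the two sketches together, the following is one possible way to minimize <math>J(\theta)</math> with an off-the-shelf L-BFGS routine (here SciPy's <code>minimize</code>); the toy data and the value of <math>\lambda</math> are purely illustrative:

<pre>
from scipy.optimize import minimize

# Illustrative toy problem: m examples, n features, k classes.
rng = np.random.default_rng(0)
m, n, k, lam = 500, 8, 3, 1e-4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # intercept column x_0 = 1
y = rng.integers(0, k, size=m)

theta0 = np.zeros(k * (n + 1))                               # minimize() works on flat vectors
res = minimize(
    lambda t: softmax_cost(t.reshape(k, n + 1), X, y, lam),
    theta0,
    jac=lambda t: softmax_grad(t.reshape(k, n + 1), X, y, lam).ravel(),
    method="L-BFGS-B",
)
theta_hat = res.x.reshape(k, n + 1)
</pre>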