Softmax Regression

From Ufldl

(Introduction)
Line 160:
<math>
\begin{align}
-
\nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  }
+
\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  }
\end{align}
</math>
Line 258:
<math>
\begin{align}
-
\nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
+
\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
\end{align}
</math>
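The revised gradient (the "+" version, with the <math>\frac{1}{m}</math> factor and the weight decay term <math>\lambda \theta_j</math>) can be sketched in plain Python as follows. This is only an illustrative sketch, not code from the tutorial: the function names `softmax_probs` and `softmax_gradient` are hypothetical, classes are assumed to be numbered from 0 rather than 1, and `theta` is assumed to be a list of k weight vectors. Setting `lam=0.0` gives the unregularized gradient from the earlier hunk.

```python
import math

def softmax_probs(theta, x):
    """Return p(y = j | x; theta) for every class j (theta: list of k weight vectors)."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)  # subtracting the max leaves the probabilities unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_gradient(theta, X, y, lam=0.0):
    """Gradient of J w.r.t. each theta_j: -(1/m) * sum_i x_i (1{y_i = j} - p_j) + lam * theta_j."""
    m, k, n = len(X), len(theta), len(theta[0])
    grad = [[0.0] * n for _ in range(k)]
    for x_i, y_i in zip(X, y):
        p = softmax_probs(theta, x_i)
        for j in range(k):
            # indicator 1{y_i = j} minus the predicted probability of class j
            coeff = (1.0 if y_i == j else 0.0) - p[j]
            for d in range(n):
                grad[j][d] -= x_i[d] * coeff / m
    for j in range(k):
        for d in range(n):
            grad[j][d] += lam * theta[j][d]  # weight decay term
    return grad
```

With `lam=0.0` the per-class gradients sum to the zero vector, because both the indicators and the predicted probabilities sum to 1 over the classes for every example; this makes a convenient sanity check for an implementation.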
Line 343:
-
Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector $\theta'$, we find
+
Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector <math>\theta'</math>, we find
that softmax regression predicts the probability of one of the classes as
<math>\frac{1}{ 1  + e^{ (\theta')^T x^{(i)} } }</math>,
Line 365:
then you can set <math>k=5</math> in softmax regression, and also have a fifth, "none of the above," class.)
-
If however your categories are has_vocals, dance, sountrack, pop, then the
+
If however your categories are has_vocals, dance, soundtrack, pop, then the
classes are not mutually exclusive; for example, there can be a piece of pop
-
music that comes from a sountrack and in addition has vocals.  In this case, it
+
music that comes from a soundtrack and in addition has vocals.  In this case, it
would be more appropriate to build 4 binary logistic regression classifiers.
This way, for each new musical piece, your algorithm can separately decide whether

Revision as of 18:17, 10 May 2011
