Softmax Regression
From Ufldl
(→Introduction) |
Line 160: | Line 160: | ||
<math> | <math> | ||
\begin{align} | \begin{align} | ||
- | \nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] } | + | \nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] } |
\end{align} | \end{align} | ||
</math> | </math> | ||
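A short worked step may help show where the <math>\frac{1}{m}</math> factor comes from. The cost function is not shown in this excerpt, so the sketch below assumes it is the average negative log-likelihood over the <math>m</math> training examples, with <math>k</math> classes:

<math>
\begin{align}
J(\theta) &= - \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{ y^{(i)} = j\} \log p(y^{(i)} = j | x^{(i)}; \theta), \qquad
p(y^{(i)} = j | x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \\
\nabla_{\theta_j} \log p(y^{(i)} = c | x^{(i)}; \theta) &= x^{(i)} \left( 1\{ c = j\} - p(y^{(i)} = j | x^{(i)}; \theta) \right) \\
\nabla_{\theta_j} J(\theta) &= - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] }
\end{align}
</math>

Under that assumption, the averaging factor in the cost carries through unchanged to the gradient, since differentiation is linear in the sum over examples.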
Line 258: | Line 258: | ||
<math> | <math> | ||
\begin{align} | \begin{align} | ||
- | \nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] } + \lambda \theta_j | + | \nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) ) \right] } + \lambda \theta_j |
\end{align} | \end{align} | ||
</math> | </math> | ||
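The regularized gradient can be sketched in code as follows. This is a minimal illustration in plain Python, not a reference implementation: the function names (`softmax_probs`, `softmax_cost`, `softmax_grad`) and the list-of-lists layout for <math>\theta</math> are hypothetical, and the cost function is an assumption (average negative log-likelihood plus a weight decay term <math>\frac{\lambda}{2} \sum_{j} \|\theta_j\|^2</math>), chosen because its gradient matches the formula with the <math>\frac{1}{m}</math> factor and the <math>\lambda \theta_j</math> term.

```python
import math


def softmax_probs(theta, x):
    """Return p(y = j | x; theta) for every class j.

    theta is a list of k parameter vectors; x is a single feature vector.
    """
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)  # subtracting the max score avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cost(theta, X, y, lam):
    """Assumed cost: average negative log-likelihood plus weight decay."""
    m = len(X)
    data_term = -sum(
        math.log(softmax_probs(theta, x)[label]) for x, label in zip(X, y)
    ) / m
    decay_term = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return data_term + decay_term


def softmax_grad(theta, X, y, lam):
    """Gradient with respect to each theta_j, including the 1/m factor."""
    m, k, n = len(X), len(theta), len(X[0])
    grad = [[0.0] * n for _ in range(k)]
    for x, label in zip(X, y):
        probs = softmax_probs(theta, x)
        for j in range(k):
            # coeff is 1{y = j} - p(y = j | x; theta)
            coeff = (1.0 if label == j else 0.0) - probs[j]
            for d in range(n):
                grad[j][d] -= coeff * x[d] / m
    # weight decay contributes lambda * theta_j to each row
    for j in range(k):
        for d in range(n):
            grad[j][d] += lam * theta[j][d]
    return grad
```

A finite-difference comparison between `softmax_grad` and `softmax_cost` is a quick way to confirm that the scaling of the data term and the weight decay term agree; with the <math>\frac{1}{m}</math> factor missing from the gradient, such a check would fail against an averaged cost.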
Line 343: | Line 343: | ||
- | Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector | + | Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector <math>\theta'</math>, we find |
that softmax regression predicts the probability of one of the classes as | that softmax regression predicts the probability of one of the classes as | ||
<math>\frac{1}{ 1 + e^{ (\theta')^T x^{(i)} } }</math>, | <math>\frac{1}{ 1 + e^{ (\theta')^T x^{(i)} } }</math>, | ||
Line 365: | Line 365: | ||
then you can set <math>k=5</math> in softmax regression, and also have a fifth, "none of the above," class.) | then you can set <math>k=5</math> in softmax regression, and also have a fifth, "none of the above," class.) | ||
- | If however your categories are has_vocals, dance, | + | If however your categories are has_vocals, dance, soundtrack, pop, then the |
classes are not mutually exclusive; for example, there can be a piece of pop | classes are not mutually exclusive; for example, there can be a piece of pop | ||
- | music that comes from a | + | music that comes from a soundtrack and in addition has vocals. In this case, it |
would be more appropriate to build 4 binary logistic regression classifiers. | would be more appropriate to build 4 binary logistic regression classifiers. | ||
This way, for each new musical piece, your algorithm can separately decide whether | This way, for each new musical piece, your algorithm can separately decide whether |