Softmax Regression

Revision as of 18:11, 10 May 2011 (view source)

Jngiam (Talk | contribs)

(→Introduction)

Revision as of 18:17, 10 May 2011 (view source)

Jngiam (Talk | contribs)

From Ufldl

@@ Line 160: / Line 160: @@
 <math>
 \begin{align}
-\nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  }
+\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  }
 \end{align}
 </math>
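The revised line in this hunk scales the summed gradient by 1/m, so it becomes an average over the m training examples. A minimal sketch of that averaged gradient follows, in plain Python. The function names `softmax_probs` and `softmax_gradient`, the list-of-lists layout for theta, and the 0-based class labels are assumptions made for illustration and do not come from the wiki page.

```python
import math


def softmax_probs(theta, x):
    """Return p(y = j | x; theta) for every class j.

    theta is a list of k parameter vectors (one per class); x is one feature vector.
    """
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)  # subtract the largest score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_gradient(theta, xs, ys):
    """Averaged gradient: -(1/m) * sum_i x_i * (1{y_i = j} - p(y_i = j | x_i; theta)).

    Labels in ys are assumed to be 0-based class indices.
    """
    m, k, n = len(xs), len(theta), len(xs[0])
    grad = [[0.0] * n for _ in range(k)]
    for x, y in zip(xs, ys):
        probs = softmax_probs(theta, x)
        for j in range(k):
            indicator = 1.0 if y == j else 0.0
            coeff = indicator - probs[j]
            for d in range(n):
                grad[j][d] -= coeff * x[d] / m
    return grad
```

With the 1/m factor, repeating the same training example does not change the gradient, whereas the pre-revision summed form would scale with the number of copies.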
@@ Line 258: / Line 258: @@
 <math>
 \begin{align}
-\nabla_{\theta_j} J(\theta) = -\sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
+\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
 \end{align}
 </math>
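One way to sanity-check the revised weight-decay gradient is a finite-difference comparison against the cost it is supposed to differentiate. The sketch below assumes the cost is the average negative log-likelihood plus (lambda/2) times the sum of squared parameters. That form is inferred from the `+ \lambda \theta_j` term and is not shown in this diff. All function names are hypothetical, and class labels are assumed to be 0-based.

```python
import math


def probs(theta, x):
    """Softmax probabilities for one example; theta holds one weight row per class."""
    scores = [sum(t * v for t, v in zip(row, x)) for row in theta]
    top = max(scores)  # stabilise the exponentials
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost(theta, xs, ys, lam):
    """Assumed cost: mean negative log-likelihood + (lam / 2) * sum of squared weights."""
    m = len(xs)
    nll = -sum(math.log(probs(theta, x)[y]) for x, y in zip(xs, ys)) / m
    decay = 0.5 * lam * sum(w * w for row in theta for w in row)
    return nll + decay


def analytic_grad(theta, xs, ys, lam):
    """Gradient from the revised formula: averaged data term plus lam * theta_j."""
    m = len(xs)
    grad = [[lam * w for w in row] for row in theta]
    for x, y in zip(xs, ys):
        p = probs(theta, x)
        for j, row in enumerate(grad):
            coeff = (1.0 if y == j else 0.0) - p[j]
            for d, v in enumerate(x):
                row[d] -= coeff * v / m
    return grad


def numeric_grad(theta, xs, ys, lam, eps=1e-6):
    """Central-difference approximation of the gradient of cost()."""
    grad = []
    for j, row in enumerate(theta):
        grad_row = []
        for d in range(len(row)):
            up = [r[:] for r in theta]
            down = [r[:] for r in theta]
            up[j][d] += eps
            down[j][d] -= eps
            diff = cost(up, xs, ys, lam) - cost(down, xs, ys, lam)
            grad_row.append(diff / (2 * eps))
        grad.append(grad_row)
    return grad
```

If the two gradients agree to several decimal places on a small random problem, the 1/m factor in the gradient is consistent with a cost that averages over the training set.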
@@ Line 343: / Line 343: @@
-Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector $\theta'$, we find
+Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector <math>\theta'</math>, we find
 that softmax regression predicts the probability of one of the classes as
 <math>\frac{1}{ 1  + e^{ (\theta')^T x^{(i)} } }</math>,
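The two-class reduction referred to in this hunk can be checked numerically. The sketch below compares a two-class softmax with the logistic form that uses the single vector theta' = theta_2 - theta_1. The function names and the example values in the usage note are made up for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def softmax_two_class(theta1, theta2, x):
    """Return (p(y=1 | x), p(y=2 | x)) from a two-class softmax."""
    s1, s2 = dot(theta1, x), dot(theta2, x)
    top = max(s1, s2)  # stabilise the exponentials
    e1, e2 = math.exp(s1 - top), math.exp(s2 - top)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def logistic_form(theta1, theta2, x):
    """Same probabilities written with theta' = theta2 - theta1."""
    theta_prime = [b - a for a, b in zip(theta1, theta2)]
    p1 = 1.0 / (1.0 + math.exp(dot(theta_prime, x)))
    return p1, 1.0 - p1
```

Calling both functions with, say, `theta1 = [0.2, -1.0]`, `theta2 = [1.5, 0.3]` and `x = [0.7, -0.4]` should return the same pair of probabilities up to floating-point rounding.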
@@ Line 365: / Line 365: @@
 then you can set <math>k=5</math> in softmax regression, and also have a fifth, "none of the above," class.)
-If however your categories are has_vocals, dance, sountrack, pop, then the
+If however your categories are has_vocals, dance, soundtrack, pop, then the
 classes are not mutually exclusive; for example, there can be a piece of pop
-music that comes from a sountrack and in addition has vocals.  In this case, it
+music that comes from a soundtrack and in addition has vocals.  In this case, it
 would be more appropriate to build 4 binary logistic regression classifiers.
 This way, for each new musical piece, your algorithm can separately decide whether