Exercise:Softmax Regression

=== Step 2: Implement softmaxCost ===
In <tt>softmaxCost.m</tt>, implement code to compute the softmax cost function <math>J(\theta)</math>. Remember to include the weight decay term in the cost as well. Your code should also compute the appropriate gradients, as well as the predictions for the input data (which will be used in the cross-validation step later).  

It is important to vectorize your code so that it runs quickly. We also provide several implementation tips below:
For a classification problem with <math>k</math> classes, the hypothesis for an example <math>x^{(i)}</math> is the vector of estimated class probabilities:
<math>
\begin{align}
h(x^{(i)}) =
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix}
\end{align}
</math>
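
Since the same normalization applies to every class, the hypothesis can be evaluated for all examples at once. Below is a minimal Octave/MATLAB sketch, assuming <tt>theta</tt> holds one row of parameters per class (<tt>k</tt>-by-<tt>n</tt>) and <tt>data</tt> holds one example per column (<tt>n</tt>-by-<tt>m</tt>); the variable names are illustrative, not those of the starter code:

<pre>
% Illustrative sketch, not the official solution.
M = theta * data;                          % k-by-m; entry (j,i) is theta_j' * x^(i)
expM = exp(M);                             % element-wise exponentials
h = bsxfun(@rdivide, expM, sum(expM, 1));  % divide each column by its sum
</pre>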
One practical issue: when the products <math>\theta_j^T x^{(i)}</math> are large, the exponentials <math>e^{\theta_j^T x^{(i)}}</math> may overflow. To prevent this, you can subtract some large constant <math>\alpha</math> (for instance, <math>\alpha = \max_j \theta_j^T x^{(i)}</math>) from each exponent before exponentiating; as the following derivation shows, this does not change the value of the hypothesis:
<math>
\begin{align}
h(x^{(i)}) &=
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix} \\
&=
\frac{ e^{-\alpha} }{ e^{-\alpha} \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix} \\
&=
\frac{ 1 }{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} - \alpha }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} - \alpha } \\
e^{ \theta_2^T x^{(i)} - \alpha } \\
\vdots \\
e^{ \theta_k^T x^{(i)} - \alpha } \\
\end{bmatrix}
\end{align}
</math>
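
In code, this trick amounts to subtracting the column-wise maximum before exponentiating. The following hedged sketch then assembles the cost (including weight decay) and gradient from the normalized probabilities; names such as <tt>groundTruth</tt>, <tt>lambda</tt>, and <tt>thetagrad</tt> are our assumptions, following the equations above rather than a prescribed implementation:

<pre>
% Illustrative sketch: labels is a vector of class indices in 1..k,
% data is n-by-m, theta is k-by-n, lambda is the weight decay parameter.
m = size(data, 2);
groundTruth = full(sparse(labels, 1:m, 1));   % k-by-m indicator matrix
M = theta * data;                             % k-by-m
M = bsxfun(@minus, M, max(M, [], 1));         % subtract alpha = column max
expM = exp(M);
h = bsxfun(@rdivide, expM, sum(expM, 1));     % class probabilities, as before
cost = -sum(sum(groundTruth .* log(h))) / m ...
       + (lambda / 2) * sum(theta(:) .^ 2);   % log-likelihood term + weight decay
thetagrad = -(groundTruth - h) * data' / m + lambda * theta;
</pre>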
