Exercise:Softmax Regression

Subtracting the maximum of each column before exponentiating prevents numerical overflow and does not change the value of the hypothesis:

  M = bsxfun(@minus, M, max(M));
<tt>max(M)</tt> yields a row vector with each element giving the maximum value in that column. <tt>bsxfun</tt> (short for binary singleton expansion function) applies minus along each row of <tt>M</tt>, hence subtracting the maximum of each column from every element in the column.  
You may also find <tt>bsxfun</tt> useful in computing your predictions - if you have a matrix <tt>M</tt> containing the <math>e^{\theta_j^T x^{(i)}}</math> terms, such that <tt>M(r, c)</tt> contains the <math>e^{\theta_r^T x^{(c)}}</math> term, you can use the following code to compute the hypothesis (by dividing every element in each column by the column's sum):
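A minimal sketch of that computation (assuming <tt>M</tt> is laid out as above, with one column per training example):

  % Sketch: divide every element of each column by that column's sum, so that
  % column c holds the hypothesis for example x^(c).
  % Assumes M(r, c) already holds exp(theta_r' * x^(c)).
  M = bsxfun(@rdivide, M, sum(M));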
=== Step 3: Gradient checking ===
Once you have written the softmax cost function, you should check your gradients numerically. In general, whenever implementing any learning algorithm, you should always check your gradients numerically before proceeding to train the model. The norm of the difference between the numerical gradient and your analytical gradient should be small, on the order of <math>10^{-9}</math>.  
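For example, the check could look like the following sketch (<tt>softmaxCost</tt> and <tt>computeNumericalGradient</tt> follow the names used in the starter code, but your copy may differ):

  % Sketch of a numerical gradient check; function and variable names are
  % assumptions based on the exercise starter code.
  [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);
  numGrad = computeNumericalGradient( @(p) softmaxCost(p, numClasses, inputSize, ...
                                                       lambda, inputData, labels), theta);
  diff = norm(numGrad - grad) / norm(numGrad + grad);
  disp(diff);   % should be small, e.g. on the order of 1e-9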
'''Implementation tip - faster gradient checking''' - when debugging, you can speed up gradient checking by reducing the number of parameters your model uses. In this case, we have included code for reducing the size of the input data, using the first 8 pixels of the images instead of the full 28x28 images. This code can be used by setting the variable <tt>DEBUG</tt> to true, as described in step 1 of the code.
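One way this reduction might look (a sketch only; the variable names <tt>inputSize</tt> and <tt>inputData</tt> are assumptions about the starter code):

  DEBUG = true;                        % enable the reduced-size data while gradient checking
  if DEBUG
      inputSize = 8;                   % use only the first 8 pixels of each image
      inputData = inputData(1:8, :);   % instead of the full 28x28 = 784 pixels
  end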
