Exercise: Implement deep networks for digit classification

===Overview===

In this exercise, you will use a stacked autoencoder for digit classification. This exercise is very similar to the self-taught learning exercise, in which we trained a digit classifier using an autoencoder layer followed by a softmax layer. The only difference is that this exercise uses two autoencoder layers instead of one.

The code you have already implemented will allow you to stack various layers and perform layer-wise training. However, to perform fine-tuning, you will also need to implement backpropagation through the stacked model. We will see that fine-tuning significantly improves the model's performance.

In the file <tt>stacked_ae_exercise.zip</tt>, we have provided some starter code [http://ufldl.stanford.edu/wiki/resources/sparseae_exercise.zip]. You will need to edit <tt>stackedAECost.m</tt>. You should also read <tt>stackedAETrain.m</tt> and ensure that you understand the steps.

=== Step 0: Initialize constants and parameters ===

Open <tt>stackedAETrain.m</tt>. In this step, we set the meta-parameters to the same values that were used in the previous exercise, which should produce reasonable results. You may modify the meta-parameters if you wish.

=== Step 1: Train the data on the first stacked autoencoder ===

Train the first autoencoder on the training images to obtain its parameters. This step is identical to the corresponding step in the sparse autoencoder and self-taught learning exercises, so if you have implemented <tt>sparseAutoencoderCost.m</tt> correctly, this step should run without any modifications.

=== Step 2: Train the data on the second stacked autoencoder ===

Run the training set through the first autoencoder to obtain the hidden unit activations, then train the second autoencoder on these activations. Since this is just a standard autoencoder applied to a different input, it should run in the same way as the first.
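As a minimal sketch of the feedforward pass described above, the following computes the hidden unit activations of a first autoencoder. The sizes, the random parameters and the variable names (<tt>W1</tt>, <tt>b1</tt>, <tt>features1</tt>) are illustrative assumptions, not the contents of the starter code.

```matlab
% Hypothetical sizes and random parameters, for illustration only.
inputSize  = 4;   % dimension of each input example
hiddenSize = 3;   % number of hidden units in the first autoencoder
m          = 5;   % number of training examples

W1   = 0.1 * randn(hiddenSize, inputSize);  % encoder weights (assumed already trained)
b1   = zeros(hiddenSize, 1);                % encoder biases
data = rand(inputSize, m);                  % one example per column

sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hidden unit activations of the first autoencoder. These become the
% training data for the second autoencoder.
features1 = sigmoid(W1 * data + repmat(b1, 1, m));
```

Each column of <tt>features1</tt> is the first-layer representation of the corresponding input example.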

'''Note:''' This step assumes that you have changed the method signature of <tt>sparseAutoencoderCost</tt> from
<tt>function [cost, grad] = sparseAutoencoderCost(...)</tt> to <tt>function [cost, grad, activation] = sparseAutoencoderCost(...)</tt> in the [[Exercise:Self-Taught_Learning|previous assignment]].

=== Step 3: Implement fine-tuning ===

To implement fine-tuning, we need to consider all three layers as a single model. Implement <tt>stackedAECost.m</tt> to return the cost, gradient and predictions of the model. The cost function should be defined as the negative log-likelihood plus a weight decay term. The gradient should be computed using backpropagation as discussed earlier. The predictions should consist of the activations of the output layer of the softmax model.
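The following is one possible sketch of the forward and backward passes through two sigmoid layers and a softmax layer. It is not the reference solution: all sizes, parameter values and variable names are hypothetical, and it assumes that weight decay is applied to the softmax weights only.

```matlab
% Hypothetical sizes and parameters, for illustration only.
inputSize = 4; h1 = 3; h2 = 3; numClasses = 2; m = 6; lambda = 1e-4;
W1 = 0.1 * randn(h1, inputSize);  b1 = zeros(h1, 1);
W2 = 0.1 * randn(h2, h1);         b2 = zeros(h2, 1);
softmaxTheta = 0.1 * randn(numClasses, h2);
data   = rand(inputSize, m);      % one example per column
labels = [1 2 1 2 1 2];           % class index of each example

% One-hot label matrix: groundTruth(c, i) = 1 if example i has class c.
groundTruth = full(sparse(labels, 1:m, 1, numClasses, m));
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Forward pass through both autoencoder layers and the softmax layer.
a1 = sigmoid(W1 * data + repmat(b1, 1, m));
a2 = sigmoid(W2 * a1 + repmat(b2, 1, m));
z3 = softmaxTheta * a2;
z3 = z3 - repmat(max(z3, [], 1), numClasses, 1);   % for numerical stability
probs = exp(z3) ./ repmat(sum(exp(z3), 1), numClasses, 1);

% Backward pass. The derivative of the negative log-likelihood with
% respect to the softmax input is (probs - groundTruth).
delta3 = probs - groundTruth;
softmaxThetaGrad = delta3 * a2' + lambda * softmaxTheta;

% Propagate the error through the sigmoid layers; the derivative of the
% sigmoid is a .* (1 - a).
delta2 = (softmaxTheta' * delta3) .* a2 .* (1 - a2);
W2grad = delta2 * a1';
b2grad = sum(delta2, 2);

delta1 = (W2' * delta2) .* a1 .* (1 - a1);
W1grad = delta1 * data';
b1grad = sum(delta1, 2);
```

Here the columns of <tt>probs</tt> play the role of the model's predictions. The gradients are summed over examples rather than averaged; an implementation that averages the cost over examples would divide the corresponding gradients by <tt>m</tt>.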

To help you check that your implementation is correct, you can use the <tt>stackedAECheck.m</tt> script. The first part of the script runs the same input on your combined-model function, and on your separate autoencoder and softmax functions, and checks that they return the same cost and predictions. The second part of the script checks that the numerical gradient of the function is the same as your computed analytic gradient. If these two checks pass, you will have implemented fine-tuning correctly.
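To illustrate the idea behind the second check, here is a generic sketch of a central-difference gradient check on a toy objective with a known gradient. It does not reproduce the contents of <tt>stackedAECheck.m</tt>; the objective <tt>J</tt>, the step size <tt>epsilon</tt> and the variable names are assumptions chosen for illustration.

```matlab
% Toy objective with a known gradient: J(theta) = 0.5 * sum(theta.^2),
% whose analytic gradient is theta itself.
J = @(theta) 0.5 * sum(theta .^ 2);
theta = [0.5; -1.2; 2.0];
analyticGrad = theta;

% Central-difference approximation, one coordinate at a time.
epsilon = 1e-4;
numGrad = zeros(size(theta));
for k = 1:numel(theta)
    e = zeros(size(theta));
    e(k) = epsilon;
    numGrad(k) = (J(theta + e) - J(theta - e)) / (2 * epsilon);
end

% Relative difference between the two gradients; a very small value
% suggests the analytic gradient is correct.
relDiff = norm(numGrad - analyticGrad) / norm(numGrad + analyticGrad);
disp(relDiff);
```

To apply the same idea to the stacked model, <tt>J</tt> would be replaced by a function that evaluates your cost for an unrolled parameter vector.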

'''Note:''' Recall that the cost function is given by:

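<math>
\begin{align}
J(\theta) &= -\ell(\theta) + w(\theta) \\
w(\theta) &= \frac{\lambda}{2} \sum_{i}{ \sum_{j}{ \theta_{ij}^2 } } \\
\ell(\theta) &= \sum_{i=1}^{m} \left( \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right)
\end{align}
</math>

Here the sum in <math>\ell(\theta)</math> runs over the <math>m</math> training examples.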

When adding the weight decay term to the cost, only the weights for the topmost (softmax) layer need to be considered. Doing so does not impact the results adversely, but simplifies the implementation significantly.
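As a rough sketch, the cost above can be computed as follows, summing the log-likelihood over all training examples and applying weight decay to the softmax weights only. The matrix <tt>a</tt> stands in for the activations of the last autoencoder layer; it and all other names and values here are hypothetical.

```matlab
% Hypothetical sizes and values, for illustration only.
numClasses = 3; hiddenSize = 4; m = 5; lambda = 1e-4;
softmaxTheta = 0.01 * randn(numClasses, hiddenSize);
a      = rand(hiddenSize, m);   % stand-in for last hidden layer activations
labels = [1 2 3 1 2];           % class index of each example

% One-hot label matrix: groundTruth(c, i) = 1 if example i has class c.
groundTruth = full(sparse(labels, 1:m, 1, numClasses, m));

% Log of the softmax probabilities, computed stably by subtracting the
% column-wise maximum before exponentiating.
z = softmaxTheta * a;
z = z - repmat(max(z, [], 1), numClasses, 1);
logProbs = z - repmat(log(sum(exp(z), 1)), numClasses, 1);

% Log-likelihood: pick out the log-probability of the correct class
% for each example and sum over examples.
logLikelihood = sum(sum(groundTruth .* logProbs));

% Weight decay on the softmax weights only.
weightDecay = (lambda / 2) * sum(softmaxTheta(:) .^ 2);

cost = -logLikelihood + weightDecay;
```

Subtracting the column-wise maximum does not change the softmax probabilities, but it avoids overflow in <tt>exp</tt> when the inputs are large.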

=== Step 4: Test the model ===
After completing these steps, running the entire script in <tt>stackedAETrain.m</tt> will perform layer-wise training of the stacked autoencoder, fine-tune the model, and measure its performance on the test set. If you have done all the steps correctly, you should get an accuracy of about X percent.
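For reference, test accuracy can be computed from the model's predictions roughly as follows. The probability matrix and labels below are made-up values for illustration, not output from the exercise.

```matlab
% Made-up class probabilities, one column per test example
% (rows correspond to classes 1..3).
probs = [0.7 0.2 0.1 0.6;
         0.1 0.5 0.3 0.3;
         0.2 0.3 0.6 0.1];
testLabels = [1 2 3 2];   % made-up true class of each example

% Predicted class is the row with the highest probability in each column.
[~, pred] = max(probs, [], 1);

% Fraction of examples whose predicted class matches the true class.
accuracy = mean(pred == testLabels);
fprintf('Test accuracy: %0.2f%%\n', accuracy * 100);
```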