'''Implementation tip:''' Once your backpropagation implementation computes the derivatives correctly (as verified by gradient checking in Step 3), make sure you are not running gradient checking on every step when you use it with L-BFGS to optimize <math>J_{\rm sparse}(W,b)</math>. Backpropagation computes the derivatives of <math>J_{\rm sparse}(W,b)</math> fairly efficiently; additionally computing the gradient numerically on every step would slow your program down significantly.
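One way to follow this tip is to guard the gradient check behind a flag and run it once, before the optimizer starts, rather than inside the cost function. The sketch below is only an illustration under stated assumptions: <code>cost_and_grad</code> is a toy quadratic standing in for <math>J_{\rm sparse}(W,b)</math> and its backpropagation gradient, the <code>DEBUG</code> flag and all function names are hypothetical, and a plain gradient-descent loop stands in for L-BFGS.

```python
import math


def cost_and_grad(theta):
    """Toy stand-in for J_sparse(W, b): returns (cost, analytic gradient)."""
    cost = sum((t - 1.0) ** 2 for t in theta)
    grad = [2.0 * (t - 1.0) for t in theta]
    return cost, grad


def numerical_grad(f, theta, eps=1e-4):
    """Central-difference gradient: two cost evaluations per parameter."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus)[0] - f(minus)[0]) / (2.0 * eps))
    return grad


def relative_error(a, b):
    """norm(a - b) / norm(a + b), a common measure of gradient agreement."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b)))
    return diff / total if total > 0 else diff


DEBUG = True  # hypothetical flag: set to False once the check has passed
theta = [0.5, -2.0, 3.0]

# The expensive numerical check runs once, outside the optimization loop.
if DEBUG:
    _, analytic = cost_and_grad(theta)
    numeric = numerical_grad(cost_and_grad, theta)
    assert relative_error(analytic, numeric) < 1e-6

# Optimization loop (gradient descent as a stand-in for L-BFGS):
# only the analytic gradient is evaluated on each step.
for _ in range(200):
    _, grad = cost_and_grad(theta)
    theta = [t - 0.1 * g for t, g in zip(theta, grad)]

final_cost, _ = cost_and_grad(theta)
```

The numerical check needs two cost evaluations per parameter, so for a network with thousands of weights it is far more expensive than a single backpropagation pass; that is why it belongs outside the loop.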

===Step 5: Visualization===

Our implementation took around 5 minutes to run on a fast computer.

In case you end up needing to try out multiple implementations or

different parameter values, be sure to budget enough time for debugging