Exercise:Sparse Autoencoder
==Download Related Reading==
* [http://nlp.stanford.edu/~socherr/sparseAutoencoder_2011new.pdf sparseae_reading.pdf]
* [http://www.stanford.edu/class/cs294a/cs294a_2011-assignment.pdf sparseae_exercise.pdf]
==Sparse autoencoder implementation==

In this problem set, you will implement the sparse autoencoder algorithm, and show how it discovers that edges are a good representation for natural images. (Images provided by Bruno Olshausen.) The sparse autoencoder algorithm is described in the lecture notes found on the course website.

In the file [http://ufldl.stanford.edu/wiki/resources/sparseae_exercise.zip sparseae_exercise.zip], we have provided some starter code in Matlab. You should write your code at the places indicated in the files ("<tt>YOUR CODE HERE</tt>"). You have to complete the following files:
Specifically, in this exercise you will implement a sparse autoencoder, trained on 8×8 image patches using the L-BFGS optimization algorithm.

'''A note on the software:''' The provided .zip file includes a subdirectory <tt>minFunc</tt> containing third-party software that implements L-BFGS and is licensed under a Creative Commons Attribution Non-Commercial license. If you need to use this software for commercial purposes, you can download and use a different function (fminlbfgs) that serves the same purpose but runs about 3x slower on this exercise (and is thus less recommended). You can read more about this on the [[Fminlbfgs_Details]] page.
===Step 1: Generate training set===
an 8×8 image patch from the selected image, and convert the image patch (either in row-major order or column-major order; it doesn't matter) into a 64-dimensional vector to get a training example <math>x \in \Re^{64}.</math>

Complete the code in <tt>sampleIMAGES.m</tt>. Your code should sample 10000 image patches.
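As a rough illustration, the sampling loop in <tt>sampleIMAGES.m</tt> might look like the following sketch (the variable names other than <tt>IMAGES</tt> are our own, and it assumes the starter code's <tt>IMAGES</tt> array holds ten 512×512 whitened natural images; defer to the actual starter code for specifics):

<pre>
% Sketch of the patch-sampling step (hypothetical variable names).
patchsize  = 8;
numpatches = 10000;
patches    = zeros(patchsize*patchsize, numpatches);

load IMAGES;   % assumed to load a 512x512x10 array named IMAGES
for i = 1:numpatches
    img = randi(size(IMAGES, 3));                 % pick a random image
    r   = randi(size(IMAGES, 1) - patchsize + 1); % random top-left corner
    c   = randi(size(IMAGES, 2) - patchsize + 1);
    patch = IMAGES(r:r+patchsize-1, c:c+patchsize-1, img);
    patches(:, i) = patch(:);   % column-major flatten into R^64
end
</pre>

Each column of <tt>patches</tt> is then one training example <math>x \in \Re^{64}</math>.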
We will use the L-BFGS algorithm. This is provided to you in a function called <tt>minFunc</tt> (code provided by Mark Schmidt) included in the starter code. (For the purpose of this assignment, you only need to call minFunc with the default parameters. You do not need to know how L-BFGS works.) We have already provided code in <tt>train.m</tt>
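For illustration, a call to minFunc in the style of <tt>train.m</tt> might look like this sketch (the option values and the argument list of <tt>sparseAutoencoderCost</tt> follow the assignment's naming conventions, but treat them as assumptions and defer to the actual starter code):

<pre>
% Sketch: minimize J_sparse(W,b) with minFunc's L-BFGS.
% theta is the initial parameter vector (e.g. from initializeParameters).
options.Method  = 'lbfgs';  % use the L-BFGS update
options.maxIter = 400;      % cap the number of iterations
options.display = 'on';

% sparseAutoencoderCost must return both the cost and the gradient.
[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, ...
    visibleSize, hiddenSize, lambda, sparsityParam, beta, patches), ...
    theta, options);
</pre>

The key requirement is that the cost function hands minFunc both <math>J_{\rm sparse}(W,b)</math> and its gradient, which is exactly what your backpropagation code computes.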
should work, but feel free to play with different settings of the parameters as well.

'''Implementational tip:''' Once you have your backpropagation implementation correctly computing the derivatives (as verified using gradient checking in Step 3), and you are using it with L-BFGS to optimize <math>J_{\rm sparse}(W,b)</math>, make sure you are not doing gradient checking on every step. Backpropagation can compute the derivatives of <math>J_{\rm sparse}(W,b)</math> fairly efficiently, and additionally computing the gradient numerically on every step would slow down your program significantly.
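One way to follow this tip is a debug flag that runs the (slow) numerical check once, before the full optimization, and leaves it off afterwards. A sketch, assuming the <tt>computeNumericalGradient</tt> function from Step 3 and the assignment's usual cost-function arguments:

<pre>
% Sketch: verify backprop once, then run L-BFGS without the check.
DEBUG = false;   % set true only while verifying backprop in Step 3
if DEBUG
    J = @(p) sparseAutoencoderCost(p, visibleSize, hiddenSize, ...
        lambda, sparsityParam, beta, patches);
    numgrad = computeNumericalGradient(J, theta);  % slow numerical gradient
    [cost, grad] = J(theta);                       % fast backprop gradient
    % Relative difference should be very small (e.g. on the order of 1e-9).
    disp(norm(numgrad - grad) / norm(numgrad + grad));
end
</pre>

Running the check on a reduced problem (fewer patches, smaller hidden layer) keeps even the one-time verification fast.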
===Step 5: Visualization===
Our implementation took around 5 minutes to run on a fast computer. In case you end up needing to try out multiple implementations or different parameter values, be sure to budget enough time for debugging
[[Category:Exercises]]

{{Sparse_Autoencoder}}