Exercise:Convolution and Pooling

From Ufldl

Jump to: navigation, search
Line 13: Line 13:
You will also need:
You will also need:
-
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]
+
* <tt>sparseAutoencoderLinearCost.m</tt> (and related functions) from [[Exercise:Learning_color_features_with_Sparse_Autoencoders]]
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''

=== Step 1: Learn color features ===
Learn a set of color features by working through [[Exercise:Learning_color_features_with_Sparse_Autoencoders]]; we will be using these features in the next steps. You should learn 400 features, and they should look like this:

[[File:cnn_Features_Good.png|480px]]
=== Step 2: Convolution and pooling ===
Now that you have learned features for small patches, you will convolve these learned features with the large images, and pool the convolved features for use in a classifier later.
==== Step 2a: Convolution ====
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.
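To make the pieces concrete, here is a minimal sketch of how <tt>cnnConvolve</tt> might be structured. The argument list, dimension ordering, and the assumption that each row of <tt>W</tt> stacks the three color channels one after another are ours, not necessarily the starter code's:

<pre>
function convolvedFeatures = cnnConvolve(patchDim, numFeatures, images, W, b, ZCAWhite, meanPatch)
% Sketch of cnnConvolve. Assumes images is imageDim x imageDim x 3 x numImages
% and W is numFeatures x (patchDim^2 * 3).

numImages = size(images, 4);
imageDim  = size(images, 1);
convDim   = imageDim - patchDim + 1;

% Fold the preprocessing into the weights: convolve with WT and use
% (b - WT * meanPatch) as the effective bias, as derived above.
WT = W * ZCAWhite;
bT = b - WT * meanPatch;

convolvedFeatures = zeros(numFeatures, numImages, convDim, convDim);

for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    convolvedImage = zeros(convDim, convDim);
    for channel = 1:3
      % This feature's weights for this channel, as a patchDim x patchDim filter
      offset  = (channel - 1) * patchDim^2;
      feature = reshape(WT(featureNum, offset+1 : offset+patchDim^2), patchDim, patchDim);
      % conv2 flips its kernel, so flip the filter first to get the
      % cross-correlation used for feature extraction
      feature = rot90(feature, 2);
      convolvedImage = convolvedImage + conv2(images(:, :, channel, imageNum), feature, 'valid');
    end
    % Add the effective bias and apply the sigmoid nonlinearity
    convolvedFeatures(featureNum, imageNum, :, :) = 1 ./ (1 + exp(-(convolvedImage + bT(featureNum))));
  end
end
</pre>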
==== Step 2b: Checking ====
We have provided some code for you to check that you have done the convolution correctly. The code randomly checks the convolved values for a number of (feature, row, column) tuples by computing the feature activations for the selected features and patches directly using the sparse autoencoder.
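Conceptually, the check amounts to something like the following hypothetical fragment (the provided code does this for you; all variable names here are placeholders):

<pre>
% Recompute one activation directly and compare it with the convolved value.
% (r, c) is a random patch position; featureNum and imageNum are random indices.
patch = images(r : r+patchDim-1, c : c+patchDim-1, :, imageNum);
patch = patch(:);                          % flatten, channels stacked as in training
patch = ZCAWhite * (patch - meanPatch);    % apply the same preprocessing as training
act   = 1 ./ (1 + exp(-(W * patch + b)));  % all feature activations for this patch
err   = abs(act(featureNum) - convolvedFeatures(featureNum, imageNum, r, c));
assert(err < 1e-9, 'Convolved feature does not match direct activation');
</pre>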
==== Step 2c: Pooling ====
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.
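As a rough sketch, mean pooling over disjoint regions might look like this. We assume mean pooling and the dimension ordering below; check the starter code's conventions before copying anything:

<pre>
function pooledFeatures = cnnPool(poolDim, convolvedFeatures)
% Sketch of cnnPool: mean-pool each feature map over disjoint
% poolDim x poolDim regions. Assumes convolvedFeatures is
% numFeatures x numImages x convolvedDim x convolvedDim.

numFeatures  = size(convolvedFeatures, 1);
numImages    = size(convolvedFeatures, 2);
convolvedDim = size(convolvedFeatures, 3);
numRegions   = floor(convolvedDim / poolDim);

pooledFeatures = zeros(numFeatures, numImages, numRegions, numRegions);

for i = 1:numRegions
  for j = 1:numRegions
    % Average over the (i, j)-th poolDim x poolDim block of every feature map
    rows = (i-1)*poolDim+1 : i*poolDim;
    cols = (j-1)*poolDim+1 : j*poolDim;
    block = convolvedFeatures(:, :, rows, cols);
    pooledFeatures(:, :, i, j) = mean(mean(block, 4), 3);
  end
end
</pre>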
=== Step 3: Use pooled features for classification ===
Once you have implemented pooling, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.
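Schematically, the reshaping and training call look something like this; the <tt>softmaxTrain</tt> signature follows the softmax exercise, while the lambda value and label variable are illustrative placeholders:

<pre>
% Reshape pooled features so that each image becomes one column vector
numTrainImages = size(pooledFeaturesTrain, 2);
softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]);   % features x row x col x images
softmaxX = reshape(softmaxX, [], numTrainImages);
softmaxY = trainLabels;                               % one class label per image

options.maxIter = 500;
lambda = 1e-4;                                        % weight decay (illustrative value)
softmaxModel = softmaxTrain(size(softmaxX, 1), numClasses, lambda, ...
                            softmaxX, softmaxY, options);
</pre>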
=== Step 4: Test classifier ===
Now that you have a trained softmax classifier, you can see how well it performs on the test set. This section contains code that will load the test set (a smaller part of the STL10 dataset: 3200 rescaled 64x64 images from 4 different classes) and obtain the pooled, convolved features for the images using the functions <tt>cnnConvolve</tt> and <tt>cnnPool</tt> which you wrote earlier, together with the preprocessing matrices <tt>ZCAWhite</tt> and <tt>meanImage</tt> computed earlier when preprocessing the training images. These pooled features are then run through the softmax classifier, and the accuracy of the predictions is computed. You should expect to get an accuracy of around 77-78%.
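Putting it together, the test-time pipeline is roughly the following (variable names assumed, mirroring the training code; <tt>softmaxPredict</tt> is from the softmax exercise):

<pre>
% Convolve and pool the test images with the same features and preprocessing
convolvedTest = cnnConvolve(patchDim, numFeatures, testImages, W, b, ...
                            ZCAWhite, meanImage);
pooledTest = cnnPool(poolDim, convolvedTest);

% Reshape as before: one column per test image
numTestImages = size(pooledTest, 2);
testX = reshape(permute(pooledTest, [1 3 4 2]), [], numTestImages);

% Classify and measure accuracy against the test labels
pred = softmaxPredict(softmaxModel, testX);
acc  = mean(pred(:) == testLabels(:));
fprintf('Accuracy: %.2f%%\n', acc * 100);
</pre>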
