Exercise:Convolution and Pooling

== Convolution and Pooling ==
In this exercise you will use the features you learned on 8x8 patches sampled from images of the STL-10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] to classify images from a reduced STL-10 dataset, applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL-10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).

In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files.

For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.
=== Dependencies ===
The following additional files are required for this exercise:

* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]

You will also need:

* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]

''If you have not completed the exercises listed above, we strongly suggest you complete them first.''
=== Step 1: Load learned features ===

In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that were previously saved. To verify that the features are good, the visualized features should look like the following:

[[File:CNN_Features_Good.png|300px]]
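If you saved your learned features to disk, a minimal sketch of loading and visualizing them might look like the following. The file name <tt>STL10Features.mat</tt>, the parameter stacking <tt>[W1; W2; b1; b2]</tt>, and the <tt>displayColorNetwork</tt> helper are assumptions carried over from the linear decoder exercise, so adjust them to match what you actually saved:

<syntaxhighlight lang="matlab">
% Minimal sketch of loading previously saved features (file name, parameter
% stacking, and helper function are assumptions from the earlier exercise).
load('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');

visibleSize = 8 * 8 * 3;    % 8x8 RGB patches
hiddenSize  = 400;          % number of learned features (illustrative value)

% Unpack the input-to-hidden weights and biases from the parameter vector
W = reshape(optTheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize + 1 : 2*hiddenSize*visibleSize + hiddenSize);

% Visualize to compare against the figure above (helper assumed from the
% linear decoder exercise)
displayColorNetwork((W * ZCAWhite)');
</syntaxhighlight>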
=== Step 2: Implement and test convolution and pooling ===

In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL-10 images.
==== Step 2a: Implement convolution ====

Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.

First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; this is as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, one naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can be very slow. Hence, we usually use MATLAB's built-in convolution functions, which are well optimized.

Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resulting values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.
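For instance, assuming <tt>im</tt> holds one 64x64 channel of an image and <tt>feature</tt> holds one 8x8 filter (both hypothetical variable names), a single ''valid'' convolution might look like this; the reason the filter must be flipped first is explained in the implementation tip further below:

<syntaxhighlight lang="matlab">
% Illustrative sketch only: convolve one image channel with one feature.
% With a 64x64 channel and an 8x8 filter, the 'valid' option returns a
% (64-8+1) x (64-8+1) = 57x57 response map.
response = conv2(im, flipud(fliplr(feature)), 'valid');
</syntaxhighlight>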
However, there are two important points to note in using <tt>conv2</tt>.

First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and (color) channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>. Inside the three nested for-loops, you will perform a <tt>conv2</tt> 2-D convolution, using the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel, and the image matrix for the <tt>imageNum</tt>-th image.

Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:
<div style="border:1px solid black; padding: 5px">

'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt>
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and <tt>W</tt> is a 3x3 matrix as below:
<math>
W =
\begin{pmatrix}
 1 & 2 & 3 \\
 4 & 5 & 6 \\
 7 & 8 & 9 \\
\end{pmatrix}
</math>
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:
<math>
\begin{pmatrix}
 9 & 8 & 7 \\
 6 & 5 & 4 \\
 3 & 2 & 1 \\
\end{pmatrix}
</math>
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as shown below:

<syntaxhighlight lang="matlab">
% Flip W for use in conv2
W = flipud(fliplr(W));
</syntaxhighlight>

</div>
Next, to each of the <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature.

However, there is one additional complication. If we had not done any preprocessing of the input patches, you could just follow the procedure as described above, apply the sigmoid function to obtain the convolved features, and be done. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.

In particular, you did the following to the patches:
<ol>
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.
</ol>
These same two steps must also be applied to the input image patches.

Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt>, before finally applying the sigmoid function.
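The following is a minimal sketch of how the inner loops of <tt>cnnConvolve</tt> might fit together. It is illustrative only: the variable names (<tt>images</tt>, <tt>numFeatures</tt>, <tt>patchDim</tt>, and so on) and the assumption that each patch was vectorized channel by channel in column-major order may differ from the starter code, so adapt it to the conventions used there.

<syntaxhighlight lang="matlab">
% Minimal sketch of cnnConvolve's core (illustrative names and layout).
WT    = W * ZCAWhite;              % convolve with W*T instead of W
bMean = b - WT * meanPatch;        % adjusted bias, b - W*T*xbar

convDim = imageDim - patchDim + 1; % e.g. 64 - 8 + 1 = 57
convolvedFeatures = zeros(numFeatures, numImages, convDim, convDim);

for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    convolvedImage = zeros(convDim, convDim);
    for channel = 1:3
      % Take the 8x8 piece of this feature for this channel and flip it,
      % so that conv2's own flipping restores the intended orientation.
      offset  = (channel - 1) * patchDim * patchDim;
      feature = reshape(WT(featureNum, offset+1:offset+patchDim*patchDim), ...
                        patchDim, patchDim);
      feature = flipud(fliplr(feature));
      im = squeeze(images(:, :, channel, imageNum));
      % 'valid' keeps only patches lying entirely inside the image
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');
    end
    % Add the adjusted bias and apply the sigmoid nonlinearity
    convolvedFeatures(featureNum, imageNum, :, :) = ...
        1 ./ (1 + exp(-(convolvedImage + bMean(featureNum))));
  end
end
</syntaxhighlight>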
==== Step 2b: Check your convolution ====

We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples and compares your convolved values against the feature activations computed directly on the corresponding patches using <tt>feedForwardAutoencoder</tt>.

==== Step 2c: Pooling ====

Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>. You should implement ''mean'' pooling (i.e., averaging over feature responses) for this part.
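The following is a minimal sketch of mean pooling over non-overlapping <tt>poolDim</tt> x <tt>poolDim</tt> regions. The variable names and the dimension ordering of <tt>convolvedFeatures</tt> are assumptions; match them to the starter code.

<syntaxhighlight lang="matlab">
% Minimal sketch of mean pooling (illustrative names and dimension order).
numFeatures = size(convolvedFeatures, 1);
numImages   = size(convolvedFeatures, 2);
convDim     = size(convolvedFeatures, 3);
resultDim   = floor(convDim / poolDim);

pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);

for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    for r = 1:resultDim
      for c = 1:resultDim
        % Average the convolved feature over one poolDim x poolDim region
        rows  = (r-1)*poolDim + 1 : r*poolDim;
        cols  = (c-1)*poolDim + 1 : c*poolDim;
        patch = convolvedFeatures(featureNum, imageNum, rows, cols);
        pooledFeatures(featureNum, imageNum, r, c) = mean(patch(:));
      end
    end
  end
end
</syntaxhighlight>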
==== Step 2d: Check your pooling ====

We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.

=== Step 3: Convolve and pool with the dataset ===

In this step, you will convolve each of the features you learned with the full 64x64 images from the STL-10 dataset to obtain the convolved features for both the training and test sets. You will then pool the convolved features to obtain the pooled features for both the training and test sets. The pooled features for the training set will be used to train your classifier, which you can then test on the test set.

Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.
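A minimal sketch of that batched loop is given below. It is illustrative only: the provided starter code already does this for you, and the exact argument order of <tt>cnnConvolve</tt> and <tt>cnnPool</tt> shown here is an assumption.

<syntaxhighlight lang="matlab">
% Minimal sketch of convolving and pooling 50 features at a time
% (illustrative; the provided starter code already does this for you).
stepSize = 50;
assert(mod(hiddenSize, stepSize) == 0, 'stepSize should divide hiddenSize');

resultDim = floor((imageDim - patchDim + 1) / poolDim);
pooledFeaturesTrain = zeros(hiddenSize, numTrainImages, resultDim, resultDim);

for convPart = 1:(hiddenSize / stepSize)
  featureStart = (convPart - 1) * stepSize + 1;
  featureEnd   = convPart * stepSize;

  % Convolve and pool only this block of features, then free the memory
  Wt = W(featureStart:featureEnd, :);
  bt = b(featureStart:featureEnd);
  convolvedThis = cnnConvolve(patchDim, stepSize, trainImages, Wt, bt, ...
                              ZCAWhite, meanPatch);
  pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = ...
      cnnPool(poolDim, convolvedThis);
  clear convolvedThis;
end
</syntaxhighlight>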
=== Step 4: Use pooled features for classification ===

In this step, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take a few minutes.
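A minimal sketch of how the pooled features might be reshaped and passed to <tt>softmaxTrain</tt> is shown below. The variable names and the assumed <tt>softmaxTrain(inputSize, numClasses, lambda, data, labels, options)</tt> signature come from the earlier softmax exercise and may differ from the starter code:

<syntaxhighlight lang="matlab">
% Minimal sketch: reshape pooled features into one column per image and
% train a softmax classifier on them (illustrative variable names).
numTrainImages = size(pooledFeaturesTrain, 2);
numClasses     = 4;       % airplane, car, cat, dog
softmaxLambda  = 1e-4;    % weight decay parameter (illustrative value)

softmaxX = permute(pooledFeaturesTrain, [1 3 4 2]);   % put images last
softmaxX = reshape(softmaxX, ...
                   numel(pooledFeaturesTrain) / numTrainImages, ...
                   numTrainImages);
softmaxY = trainLabels;

options = struct;
options.maxIter = 500;
softmaxModel = softmaxTrain(size(softmaxX, 1), numClasses, softmaxLambda, ...
                            softmaxX, softmaxY, options);
</syntaxhighlight>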
=== Step 5: Test classifier ===

Now that you have a trained softmax classifier, you can see how well it performs on the test set. The pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 80%.
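A minimal sketch of this evaluation follows, assuming the <tt>softmaxPredict(softmaxModel, data)</tt> helper from the softmax exercise and the same reshaping as used for the training set:

<syntaxhighlight lang="matlab">
% Minimal sketch: evaluate the trained classifier on the test set
% (illustrative variable names).
numTestImages = size(pooledFeaturesTest, 2);

softmaxXTest = permute(pooledFeaturesTest, [1 3 4 2]);
softmaxXTest = reshape(softmaxXTest, ...
                       numel(pooledFeaturesTest) / numTestImages, ...
                       numTestImages);

pred = softmaxPredict(softmaxModel, softmaxXTest);
acc  = mean(pred(:) == testLabels(:));
fprintf('Accuracy: %.2f%%\n', acc * 100);
</syntaxhighlight>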
