http://deeplearning.stanford.edu/wiki/index.php?title=Special:Contributions/Jngiam&feed=atom&limit=50&target=Jngiam&year=&month=Ufldl - User contributions [en]2021-01-23T05:38:50ZFrom UfldlMediaWiki 1.16.2http://deeplearning.stanford.edu/wiki/index.php/UFLDL_TutorialUFLDL Tutorial2011-10-20T01:28:15Z<p>Jngiam: Protected "UFLDL Tutorial" ([edit=sysop] (indefinite) [move=sysop] (indefinite))</p>
<hr />
<div>'''Description:''' This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, get to see them work for yourself, and learn how to apply/adapt these ideas to new problems.<br />
<br />
This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent). If you are not familiar with these ideas, we suggest you go to this [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning Machine Learning course] and complete<br />
sections II, III, IV (up to Logistic Regression) first. <br />
<br />
<br />
'''Sparse Autoencoder'''<br />
* [[Neural Networks]]<br />
* [[Backpropagation Algorithm]]<br />
* [[Gradient checking and advanced optimization]]<br />
* [[Autoencoders and Sparsity]]<br />
* [[Visualizing a Trained Autoencoder]]<br />
* [[Sparse Autoencoder Notation Summary]] <br />
* [[Exercise:Sparse Autoencoder]]<br />
<br />
<br />
'''Vectorized implementation'''<br />
* [[Vectorization]]<br />
* [[Logistic Regression Vectorization Example]]<br />
* [[Neural Network Vectorization]]<br />
* [[Exercise:Vectorization]]<br />
<br />
<br />
'''Preprocessing: PCA and Whitening'''<br />
* [[PCA]]<br />
* [[Whitening]]<br />
* [[Implementing PCA/Whitening]]<br />
* [[Exercise:PCA in 2D]]<br />
* [[Exercise:PCA and Whitening]]<br />
<br />
<br />
'''Softmax Regression'''<br />
* [[Softmax Regression]]<br />
* [[Exercise:Softmax Regression]]<br />
<br />
<br />
'''Self-Taught Learning and Unsupervised Feature Learning''' <br />
* [[Self-Taught Learning]]<br />
* [[Exercise:Self-Taught Learning]]<br />
<br />
<br />
'''Building Deep Networks for Classification'''<br />
* [[Self-Taught Learning to Deep Networks | From Self-Taught Learning to Deep Networks]]<br />
* [[Deep Networks: Overview]]<br />
* [[Stacked Autoencoders]]<br />
* [[Fine-tuning Stacked AEs]]<br />
* [[Exercise: Implement deep networks for digit classification]]<br />
<br />
<br />
'''Linear Decoders with Autoencoders'''<br />
* [[Linear Decoders]]<br />
* [[Exercise:Learning color features with Sparse Autoencoders]]<br />
<br />
<br />
'''Working with Large Images'''<br />
* [[Feature extraction using convolution]]<br />
* [[Pooling]]<br />
* [[Exercise:Convolution and Pooling]]<br />
<br />
----<br />
'''Note''': The sections above this line are stable. The sections below are still under construction, and may change without notice. Feel free to browse around, however, and feedback/suggestions are welcome. <br />
<br />
'''Miscellaneous'''<br />
* [[MATLAB Modules]]<br />
* [[Style Guide]]<br />
* [[Useful Links]]<br />
<br />
'''Miscellaneous Topics'''<br />
* [[Data Preprocessing]]<br />
* [[Deriving gradients using the backpropagation idea]]<br />
<br />
'''Advanced Topics''':<br />
<br />
'''Sparse Coding'''<br />
* [[Sparse Coding]]<br />
* [[Sparse Coding: Autoencoder Interpretation]]<br />
* [[Exercise:Sparse Coding]]<br />
<br />
'''ICA Style Models'''<br />
* [[Independent Component Analysis]]<br />
* [[Exercise:Independent Component Analysis]]<br />
<br />
'''Others'''<br />
* [[Convolutional training]] <br />
* [[Restricted Boltzmann Machines]]<br />
* [[Deep Belief Networks]]<br />
* [[Denoising Autoencoders]]<br />
* [[K-means]]<br />
* [[Spatial pyramids / Multiscale]]<br />
* [[Slow Feature Analysis]]<br />
* [[Tiled Convolution Networks]]<br />
<br />
----<br />
<br />
Material contributed by: Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, Caroline Suen</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Main_PageMain Page2011-10-20T01:28:07Z<p>Jngiam: Protected "Main Page" ([edit=sysop] (indefinite) [move=sysop] (indefinite))</p>
<hr />
<div>'''Unsupervised Feature Learning and Deep Learning'''<br />
<br />
You probably want the [[UFLDL Tutorial]]. <br />
<br />
Or maybe you want the [[UFLDL Recommended Readings]].<br />
<br />
</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/MediaWiki:Copyrightwarning2MediaWiki:Copyrightwarning22011-08-16T17:10:43Z<p>Jngiam: Created page with "Please note that all contributions to {{SITENAME}} may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not..."</p>
<hr />
<div>Please note that all contributions to {{SITENAME}} may be edited, altered, or removed by other contributors.<br />
If you do not want your writing to be edited mercilessly, then do not submit it here.<br /><br />
By submitting text or other materials to this Wiki, you are asserting that, and promising us that, you wrote this yourself, or copied it from a public domain or similar free resource. Further, by submitting text or other materials to this Wiki, in consideration for having your text incorporated into the Wiki and thus potentially having others be exposed to content provided by you--which you acknowledge is valuable consideration--you agree to assign and hereby do assign all copyright, title and interest in these materials to the Stanford authors of this Wiki. Do not submit copyrighted work without permission.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/MediaWiki:CopyrightwarningMediaWiki:Copyrightwarning2011-08-16T17:09:59Z<p>Jngiam: Created page with "Please note that all contributions to {{SITENAME}} are considered to be released under the $2 (see $1 for details). If you do not want your writing to be edited mercilessly and r..."</p>
<hr />
<div>Please note that all contributions to {{SITENAME}} are considered to be released under the $2 (see $1 for details).<br />
If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.<br /><br />
By submitting text or other materials to this Wiki, you are asserting that, and promising us that, you wrote this yourself, or copied it from a public domain or similar free resource. Further, by submitting text or other materials to this Wiki, in consideration for having your text incorporated into the Wiki and thus potentially having others be exposed to content provided by you--which you acknowledge is valuable consideration--you agree to assign and hereby do assign all copyright, title and interest in these materials to the Stanford authors of this Wiki. Do not submit copyrighted work without permission.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Independent_Component_AnalysisIndependent Component Analysis2011-06-19T08:22:54Z<p>Jngiam: /* Introduction */</p>
<hr />
<div>== Introduction ==<br />
<br />
If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).<br />
<br />
Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are '''sparse'''; and secondly, our basis is an '''orthonormal''' basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:<br />
<br />
:<math><br />
J(W) = \lVert Wx \rVert_1 <br />
</math><br />
<br />
This objective function is equivalent to the sparsity penalty on the features <math>s</math> in sparse coding, since <math>Wx</math> is precisely the features that represent the data. Adding in the orthonormality constraint gives us the full optimization problem for independent component analysis:<br />
<br />
:<math><br />
\begin{array}{rcl}<br />
{\rm minimize} & \lVert Wx \rVert_1 \\<br />
{\rm s.t.} & WW^T = I \\<br />
\end{array} <br />
</math><br />
<br />
As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize the objective using gradient descent: every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint). <br />
<br />
In practice, optimizing for the objective function while enforcing the orthonormality constraint (as described in the [[Independent Component Analysis#Orthonormal ICA | Orthonormal ICA]] section below) is feasible but slow. Hence, the use of orthonormal ICA is limited to situations where it is important to obtain an orthonormal basis ([[TODO]]: what situations).<br />
<br />
== Orthonormal ICA ==<br />
<br />
The orthonormal ICA objective is:<br />
:<math><br />
\begin{array}{rcl}<br />
{\rm minimize} & \lVert Wx \rVert_1 \\<br />
{\rm s.t.} & WW^T = I \\<br />
\end{array} <br />
</math><br />
<br />
Observe that the constraint <math>WW^T = I</math> implies two other constraints. <br />
<br />
Firstly, since we are learning an orthonormal basis, the number of basis vectors we learn can be at most the dimension of the input. In particular, this means that we cannot learn over-complete bases as we usually do in [[Sparse Coding: Autoencoder Interpretation | sparse coding]]. <br />
<br />
Secondly, the data must be [[Whitening | ZCA whitened]] with no regularization (that is, with <math>\epsilon</math> set to 0). ([[TODO]] Why must this be so?)<br />
<br />
Hence, before we even begin to optimize for the orthonormal ICA objective, we must ensure that our data has been '''whitened''', and that we are learning an '''under-complete''' basis. <br />
<br />
Following that, to optimize for the objective, we can use gradient descent, interspersing gradient descent steps with projection steps to enforce the orthonormality constraint. Hence, the procedure will be as follows:<br />
<br />
Repeat until done:<br />
<ol><br />
<li><math>W \leftarrow W - \alpha \nabla_W \lVert Wx \rVert_1</math><br />
<li><math>W \leftarrow \operatorname{proj}_U W</math> where <math>U</math> is the space of matrices satisfying <math>WW^T = I</math><br />
</ol><br />
<br />
In practice, the learning rate <math>\alpha</math> is varied using a line-search algorithm to speed up the descent, and the projection step is achieved by setting <math>W \leftarrow (WW^T)^{-\frac{1}{2}} W</math>, which can actually be seen as ZCA whitening ([[TODO]] explain how it is like ZCA whitening).<br />
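For concreteness, the alternating procedure above can be sketched numerically. The sketch below is in Python/NumPy rather than the tutorial's MATLAB, and it uses a hypothetical fixed step size <tt>alpha</tt> in place of the line search; the projection step is the symmetric orthogonalization <math>W \leftarrow (WW^T)^{-\frac{1}{2}} W</math> described above, and the data is assumed to be already whitened:<br />

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 8, 4, 500                  # input dim, basis size (under-complete), examples
x = rng.standard_normal((n, m))      # stand-in data, assumed already ZCA-whitened

W = rng.standard_normal((k, n))
alpha = 1e-3                         # hypothetical fixed step size (a line search in practice)

for _ in range(100):
    # Gradient step: subgradient of ||Wx||_1 with respect to W is sign(Wx) x^T
    W -= alpha * np.sign(W @ x) @ x.T
    # Projection step: W <- (W W^T)^{-1/2} W, enforcing W W^T = I
    WWt = W @ W.T
    evals, evecs = np.linalg.eigh(WWt)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    W = inv_sqrt @ W

# After each projection, the rows of W form an orthonormal set
orthonormality_error = np.max(np.abs(W @ W.T - np.eye(k)))
```

After the loop, <math>WW^T</math> equals the identity up to numerical precision, whatever the gradient steps did in between.<br />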
<br />
== Topographic ICA ==<br />
<br />
Just like [[Sparse Coding: Autoencoder Interpretation | sparse coding]], independent component analysis can be modified to give a topographic variant by adding a topographic cost term.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Useful_LinksUseful Links2011-06-09T17:18:01Z<p>Jngiam: </p>
<hr />
<div>[http://www.stat.berkeley.edu/~spector/matlab.pdf Matlab Guide]<br />
<br />
[http://www.mathworks.com/matlabcentral/fx_files/5685/2/matopt.zip Writing Fast MATLAB Code (by Pascal Getreuer)]<br />
<br />
[http://www.psi.toronto.edu/matrix/calculus.html Matrix Calculus Reference]<br />
<br />
[http://www.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf The Matrix Cookbook]<br />
<br />
[http://www.math.duke.edu/~jvb/papers/cnn_tutorial.pdf Notes on Convolutional Neural Networks]</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-06-03T19:16:56Z<p>Jngiam: /* Step 4: Use pooled features for classification */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this exercise, you will use the features you learned on 8x8 patches sampled from STL-10 images in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] to classify images from a reduced STL-10 dataset, applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL-10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that were previously saved. To verify that the features are good, the visualized features should look like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL-10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; this is as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, one naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can be very slow. Hence, we usually use MATLAB's built-in convolution functions, which are well optimized.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resulting values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.<br />
<br />
However, there are two important points to note in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and (color) channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>. Inside the three nested for-loops, you will perform a <tt>conv2</tt> 2-D convolution, using the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel, and the image matrix for the <tt>imageNum</tt>-th image. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as shown below:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2<br />
W = flipud(fliplr(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
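As a concrete check of the flipping identity (a Python/NumPy sketch for illustration; <tt>conv2_valid</tt> is a hypothetical stand-in for MATLAB's <tt>conv2(image, W, 'valid')</tt>), pre-flipping the weight matrix makes the convolution compute exactly the un-flipped dot products <math>W \cdot x_{(r,c)}</math> that we want:<br />

```python
import numpy as np

def conv2_valid(image, kernel):
    """'Valid' 2-D convolution as MATLAB's conv2(image, kernel, 'valid') computes it:
    the kernel is flipped in both dimensions before the sliding dot product."""
    k = kernel[::-1, ::-1]          # the flip in the mathematical definition of convolution
    hi, wi = image.shape
    hk, wk = k.shape
    out = np.zeros((hi - hk + 1, wi - wk + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+hk, c:c+wk] * k)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))

# What we actually want at each valid (r, c): the un-flipped dot product W . x_(r,c)
want = np.array([[np.sum(image[r:r+3, c:c+3] * W) for c in range(6)]
                 for r in range(6)])

# Pre-flipping W (flipud(fliplr(W)) in MATLAB) cancels the flip inside the convolution
got = conv2_valid(image, W[::-1, ::-1])
max_diff = np.max(np.abs(want - got))
```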
<br />
Next, to each of the <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. <br />
<br />
However, there is one additional complication. If we had not done any preprocessing of the input patches, you could just follow the procedure described above, apply the sigmoid function to obtain the convolved features, and be done. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations. <br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches<br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same two steps must also be applied to the input image patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute is <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math> to <tt>convolvedFeatures</tt>, before finally applying the sigmoid function.<br />
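This algebraic rearrangement can be verified directly. In the Python/NumPy sketch below, <tt>T</tt>, <tt>mean_patch</tt>, <tt>W</tt> and <tt>b</tt> are random stand-ins for the actual <tt>ZCAWhite</tt> matrix, <tt>meanPatch</tt>, and learned weights and biases; the folded form <math>\sigma(WTx + (b - WT\bar{x}))</math> matches preprocessing each patch explicitly:<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, h = 192, 25                          # 8x8x3 patch dimension, number of hidden features
x = rng.standard_normal(d)              # one raw (unpreprocessed) patch
mean_patch = rng.standard_normal(d)     # stand-in for meanPatch
T = 0.05 * rng.standard_normal((d, d))  # stand-in for the ZCA whitening matrix
W = 0.05 * rng.standard_normal((h, d))  # stand-in learned weights
b = rng.standard_normal(h)              # stand-in learned biases

# Explicit preprocessing: subtract the mean, whiten, then feed forward
explicit = sigmoid(W @ (T @ (x - mean_patch)) + b)

# Folded form: use WT in place of W, and the adjusted bias b - WT * mean
WT = W @ T
folded = sigmoid(WT @ x + (b - WT @ mean_patch))
max_diff = np.max(np.abs(explicit - folded))
```

Since the two forms agree, you can precompute <tt>WT</tt> and the adjusted bias once and convolve with those, rather than preprocessing every patch of every image.<br />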
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples and compares your convolved values against feature activations computed directly on the corresponding patches using <tt>feedForwardAutoencoder</tt>. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>. You should implement ''mean'' pooling (i.e., averaging over feature responses) for this part.<br />
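For reference, mean pooling over non-overlapping regions can be sketched as follows (Python/NumPy for illustration; the exercise itself is in MATLAB, and the 4x4 feature map here is hypothetical). Each pooling region of the convolved feature map is replaced by the average of its responses:<br />

```python
import numpy as np

def mean_pool(features, pool_dim):
    """Mean-pool a 2-D feature map over non-overlapping pool_dim x pool_dim regions."""
    h, w = features.shape
    assert h % pool_dim == 0 and w % pool_dim == 0
    # Reshape so each pooling region gets its own pair of axes, then average them
    return features.reshape(h // pool_dim, pool_dim,
                            w // pool_dim, pool_dim).mean(axis=(1, 3))

fmap = np.array([[ 1.,  2.,  3.,  4.],
                 [ 5.,  6.,  7.,  8.],
                 [ 9., 10., 11., 12.],
                 [13., 14., 15., 16.]])
pooled = mean_pool(fmap, 2)   # -> [[3.5, 5.5], [11.5, 13.5]]
```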
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL-10 dataset to obtain the convolved features for both the training and test sets. You will then pool the convolved features to obtain the pooled features for both training and test sets. The pooled features for the training set will be used to train your classifier, which you can then test on the test set.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will use the pooled features to train a softmax classifier that maps them to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take a few minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. These pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 80%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Learning_color_features_with_Sparse_AutoencodersExercise:Learning color features with Sparse Autoencoders2011-06-03T19:15:14Z<p>Jngiam: /* Step 1: Modify your sparse autoencoder to use a linear decoder */</p>
<hr />
<div>== Learning color features with Sparse Autoencoders ==<br />
<br />
In this exercise, you will implement a [[Linear Decoders | linear decoder]] (a sparse autoencoder whose output layer uses a linear activation function). You will then apply it to learn features on color images from the STL-10 dataset. These features will be used in a later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]] for classifying STL-10 images.<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip linear_decoder_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from the [[Exercise:Sparse Autoencoder | sparse autoencoder exercise]].<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stl10_patches_100k.zip Sampled 8x8 patches from the STL-10 dataset (stl10_patches_100k.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip Starter Code (linear_decoder_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]<br />
<br />
''If you have not completed the exercise listed above, we strongly suggest you complete it first.''<br />
<br />
=== Learning from color image patches ===<br />
<br />
In all the exercises so far, you have been working only with grayscale images. In this exercise, you will get to work with RGB color images for the first time. <br />
<br />
Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can simply concatenate the intensities from all the color channels into one long vector, as if you were working with a grayscale image with 3x the number of pixels of the original image. <br />
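Concretely (a Python/NumPy sketch; the patch here is random, just to show the shapes), an 8x8 color patch simply becomes a 192-dimensional input vector:<br />

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((8, 8, 3))   # an 8x8 patch with 3 color channels (RGB)

# Stack all the channel intensities into one long vector; the autoencoder
# treats it exactly like a grayscale image with 3x the number of pixels
x = patch.reshape(-1)
assert x.shape == (192,)        # 8 * 8 * 3 = 192 inputs to the autoencoder
```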
<br />
=== Step 0: Initialization ===<br />
<br />
In this step, we initialize some parameters used in the exercise (see starter code for details).<br />
<br />
=== Step 1: Modify your sparse autoencoder to use a linear decoder ===<br />
<br />
Copy <tt>sparseAutoencoderCost.m</tt> to the directory for this exercise and rename it to <tt>sparseAutoencoderLinear.m</tt>. Rename the function <tt>sparseAutoencoderCost</tt> in the file to <tt>sparseAutoencoderLinearCost</tt>, and modify it to use a [[Linear Decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.<br />
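<br />
The key changes are in the output layer's forward pass and its error term. As a sketch (assuming your implementation uses the common names <tt>z3</tt>, <tt>a3</tt>, and <tt>delta3</tt> for the output layer's pre-activation, activation, and error term, and <tt>data</tt> for the input):<br />
<br />
```matlab
% Sigmoid decoder (original):
%   a3 = sigmoid(z3);
%   delta3 = -(data - a3) .* a3 .* (1 - a3);   % since f'(z3) = a3 .* (1 - a3)

% Linear decoder: f(z) = z, so f'(z) = 1
a3 = z3;
delta3 = -(data - a3);

% The hidden layer's activation, error term, and the sparsity penalty
% are unchanged; only the output layer is affected.
```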
<br />
=== Step 2: Learn features on small patches ===<br />
<br />
You will now use your sparse autoencoder to learn features on a set of 100,000 small 8x8 patches sampled from the larger 96x96 STL-10 images. (The [http://www.stanford.edu/~acoates//stl10/ STL-10 dataset] comprises 5000 training and 8000 test examples, each a 96x96 labelled color image belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.) <br />
<br />
The code provided in this step trains your sparse autoencoder for 400 iterations with the default parameters initialized in step 0. This should take around 45 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and "opponent colors," as in the figure below. <br />
<br />
[[File:CNN_Features_Good.png|480px]]<br />
<br />
If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might instead get images that look like one of the following:<br />
<br />
<table cellpadding=5px><br />
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr><br />
</table><br />
<br />
The learned features will be saved to <tt>STL10Features.mat</tt>, which will be used in the later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]].</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Implementing_PCA/WhiteningImplementing PCA/Whitening2011-06-01T09:15:32Z<p>Jngiam: Undo revision 994 by 79.142.67.147 (talk)</p>
<hr />
<div>In this section, we summarize the PCA, PCA whitening and ZCA whitening algorithms,<br />
and also describe how you can implement them using efficient linear algebra libraries.<br />
<br />
First, we need to ensure that the data has (approximately) zero mean. For natural images, we achieve this by computing the mean value of each image patch and subtracting it from that patch. In Matlab, we can do this using<br />
<br />
avg = mean(x, 1); % Compute the mean pixel intensity value separately for each patch. <br />
x = x - repmat(avg, size(x, 1), 1);<br />
<br />
Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute this in one fell swoop as <br />
<br />
sigma = x * x' / size(x, 2);<br />
<br />
(Check the math yourself for correctness.) <br />
Here, we assume that <math>x</math> is a data structure that contains one training example per column (so, <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix). <br />
<br />
Next, PCA computes the eigenvectors of <math>\Sigma</math>. One could do this using the Matlab <tt>eig</tt> function. However, because <math>\Sigma</math> is a symmetric positive semi-definite matrix, it is more numerically reliable to do this using the <tt>svd</tt> function. Concretely, if you implement <br />
<br />
[U,S,V] = svd(sigma);<br />
<br />
then the matrix <math>U</math> will contain the eigenvectors of <math>\Sigma</math> (one eigenvector per column, sorted in order of decreasing eigenvalue), and the diagonal entries of the matrix <math>S</math> will contain the corresponding eigenvalues (also sorted in decreasing order). The matrix <math>V</math> will be equal to the transpose of <math>U</math>, and can be safely ignored.<br />
<br />
(Note: The <tt>svd</tt> function actually computes the singular vectors and singular values of a matrix, which for the special case of a symmetric positive semi-definite matrix---which is all that we're concerned with here---is equal to its eigenvectors and eigenvalues. A full discussion of singular vectors vs. eigenvectors is beyond the scope of these notes.)<br />
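<br />
You can sanity-check this equivalence yourself on a small symmetric positive semi-definite matrix (an illustrative check, not part of the exercise): for such a matrix, <tt>svd</tt> returns <math>U</math> and <math>S</math> satisfying <math>\Sigma U = U S</math>.<br />
<br />
```matlab
% Illustrative check: for a symmetric PSD matrix, the singular
% vectors/values returned by svd are its eigenvectors/eigenvalues.
sigma = [2 1; 1 2];            % small symmetric PSD example
[U, S, V] = svd(sigma);
disp(norm(sigma * U - U * S)); % should be close to 0
disp(norm(U' * U - eye(2)));   % columns of U are orthonormal
```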
<br />
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows:<br />
<br />
xRot = U' * x; % rotated version of the data. <br />
xTilde = U(:,1:k)' * x; % reduced dimension representation of the data, <br />
% where k is the number of eigenvectors to keep<br />
<br />
This gives your PCA representation of the data in terms of <math>\textstyle \tilde{x} \in \Re^k</math>. <br />
Incidentally, if <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, the expressions above are fully vectorized: they compute <math>x_{\rm rot}</math> and <math>\tilde{x}</math> for your entire training set in one go. The resulting <math>x_{\rm rot}</math> and <math>\tilde{x}</math> will have one column corresponding to each training example. <br />
<br />
To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use <br />
<br />
xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;<br />
<br />
Since <math>S</math>'s diagonal contains the eigenvalues <math>\textstyle \lambda_i</math>, <br />
this turns out to be a compact way <br />
of computing <math>\textstyle x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i}}</math><br />
simultaneously for all <math>\textstyle i</math>. <br />
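<br />
As a quick check (continuing the snippets above, and assuming <tt>epsilon</tt> is zero or very small), the covariance of the PCA-whitened data should be close to the identity matrix:<br />
<br />
```matlab
% Illustrative check: covariance of whitened data is (near) identity.
% With a nonzero epsilon the diagonal entries will be slightly below 1.
covWhite = xPCAwhite * xPCAwhite' / size(x, 2);
% covWhite should be close to eye(n) when epsilon is small
```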
<br />
Finally, you can also compute the ZCA whitened data <math>\textstyle x_{\rm ZCAwhite}</math> as:<br />
<br />
xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;<br />
<br />
<br />
{{PCA}}</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/MediaWiki:SidebarMediaWiki:Sidebar2011-05-27T07:09:19Z<p>Jngiam: </p>
<hr />
<div>* ufldl resources<br />
** UFLDL Tutorial|UFLDL Tutorial<br />
** UFLDL_Recommended_Readings|Recommended Readings<br />
<br />
* wiki<br />
** mainpage|mainpage-description <br />
** recentchanges-url|recentchanges<br />
** randompage-url|randompage<br />
** helppage|help<br />
* SEARCH<br />
* TOOLBOX<br />
* LANGUAGES</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Template:Sparse_AutoencoderTemplate:Sparse Autoencoder2011-05-26T20:48:40Z<p>Jngiam: </p>
<hr />
<div><div style="text-align: center;font-size:small;background-color: #eeeeee; border-style: solid; border-width: 1px; padding: 5px"><br />
[[Neural Networks]] | [[Backpropagation Algorithm]] | [[Gradient checking and advanced optimization]] | [[Autoencoders and Sparsity]] | [[Visualizing a Trained Autoencoder]] | [[Sparse Autoencoder Notation Summary]] | [[Exercise:Sparse Autoencoder]]<br />
</div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Template:Vectorized_ImplementationTemplate:Vectorized Implementation2011-05-26T20:48:19Z<p>Jngiam: </p>
<hr />
<div><div style="text-align: center;font-size:small;background-color: #eeeeee; border-style: solid; border-width: 1px; padding: 5px"><br />
[[Vectorization]] | [[Logistic Regression Vectorization Example]] | [[Neural Network Vectorization]] | [[Exercise:Vectorization]]<br />
</div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Template:PCATemplate:PCA2011-05-26T20:48:05Z<p>Jngiam: </p>
<hr />
<div><div style="text-align: center;font-size:small;background-color: #eeeeee; border-style: solid; border-width: 1px; padding: 5px"><br />
[[PCA]] | [[Whitening]] | [[Implementing PCA/Whitening]] | [[Exercise:PCA in 2D]] | [[Exercise:PCA and Whitening]]<br />
</div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Template:STLTemplate:STL2011-05-26T20:47:53Z<p>Jngiam: </p>
<hr />
<div><div style="text-align: center;font-size:small;background-color: #eeeeee; border-style: solid; border-width: 1px; padding: 5px"><br />
[[Self-Taught Learning]] | [[Exercise:Self-Taught Learning]]<br />
</div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Template:CNNTemplate:CNN2011-05-26T20:47:32Z<p>Jngiam: </p>
<hr />
<div><div style="text-align: center;font-size:small;background-color: #eeeeee; border-style: solid; border-width: 1px; padding: 5px"><br />
[[Self-Taught Learning to Deep Networks | From Self-Taught Learning to Deep Networks]] | [[Deep Networks: Overview]] | [[Stacked Autoencoders]] | [[Fine-tuning Stacked AEs]] | [[Exercise: Implement deep networks for digit classification]]<br />
</div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-26T01:15:00Z<p>Jngiam: /* Step 2c: Pooling */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will use the features you learned on 8x8 patches sampled from the STL10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]], together with [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]], to classify images from a reduced STL10 dataset. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that were previously saved. To verify that the features are good, visualize them; they should look like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. One naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can be very slow. Hence, we usually use Matlab's built-in convolution functions, which are well optimized.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.<br />
<br />
However, there are two important points to note in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
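<br />
The loop structure can be sketched as follows (illustrative only; the exact variable names and the storage layout of the weights depend on the starter code — here each <tt>W(:, :, channel, featureNum)</tt> slice is assumed to hold the 8x8 weights for one feature and channel):<br />
<br />
```matlab
for imageNum = 1:numImages
  for featureNum = 1:numFeatures
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:3
      feature = W(:, :, channel, featureNum);  % 8x8 weights (assumed layout)
      feature = flipud(fliplr(feature));       % flip for conv2 (see tip below)
      im = images(:, :, channel, imageNum);
      % 'valid' keeps only positions where the whole patch fits in the image
      convolvedImage = convolvedImage + conv2(im, feature, 'valid');
    end
    % ... add the bias and apply the sigmoid here ...
  end
end
```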
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as shown below:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2<br />
W = flipud(fliplr(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had not done any preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches <br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same two steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
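<br />
Both quantities can be precomputed once, before the convolution loops (a sketch; <tt>WT</tt> and <tt>bAdj</tt> are hypothetical names, and <tt>W</tt>, <tt>b</tt>, <tt>ZCAWhite</tt>, <tt>meanPatch</tt> come from your saved features):<br />
<br />
```matlab
% Precompute once, outside the convolution loops (illustrative names):
WT = W * ZCAWhite;           % convolve images with these weights instead of W
bAdj = b - WT * meanPatch;   % use this in place of b before the sigmoid
```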
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly checks the convolved values for a number of (feature, row, column) tuples by computing the feature activations using <tt>feedForwardAutoencoder</tt> for the selected features and patches directly using the sparse autoencoder. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>. You should implement ''mean'' pooling (i.e., averaging over feature responses) for this part.<br />
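<br />
Mean pooling simply averages the feature responses over disjoint regions. A sketch, with illustrative variable names (the starter code's names and array shapes may differ):<br />
<br />
```matlab
% Mean-pooling sketch: average each poolDim x poolDim block of a single
% convolved feature map 'convolvedFeature' of size convolvedDim x convolvedDim.
resultDim = floor(convolvedDim / poolDim);
pooledFeature = zeros(resultDim, resultDim);
for poolRow = 1:resultDim
  rows = (poolRow-1)*poolDim + 1 : poolRow*poolDim;
  for poolCol = 1:resultDim
    cols = (poolCol-1)*poolDim + 1 : poolCol*poolDim;
    block = convolvedFeature(rows, cols);
    pooledFeature(poolRow, poolCol) = mean(block(:));
  end
end
```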
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. These pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 80%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-23T20:32:01Z<p>Jngiam: /* Step 2a: Implement convolution */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will use the features you learned on 8x8 patches sampled from images from the STL10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] for classifying images from a reduced STL10 dataset applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that was previously saved. To verify that the features are good, the visualized features should look like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; as opposed to a ''full'' convolution which allows the patch to extend outside the image, with the area outside the image assumed to be 0) , where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, one naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can very slow. Hence, we usually use Matlab's built in convolution functions which are well optimized.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.<br />
<br />
However, there are two important points to note in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as shown below:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2<br />
W = flipud(fliplr(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had not done any preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> subtract the mean patch, <tt>meanPatch</tt> to zero the mean of the patches <br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same three steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute is <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math> to <tt>convolvedFeatures</tt>, before finally applying the sigmoid function.<br />
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly checks the convolved values for a number of (feature, row, column) tuples by computing the feature activations using <tt>feedForwardAutoencoder</tt> for the selected features and patches directly using the sparse autoencoder. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
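If you implement mean pooling over disjoint regions, the core of <tt>cnnPool</tt> can be sketched as below. This is only one possible implementation, and it assumes the <tt>numFeatures x numImages x convolvedDim x convolvedDim</tt> layout of <tt>convolvedFeatures</tt> used in the starter code:<br />
<br />
<syntaxhighlight lang="matlab"><br />
resDim = floor(convolvedDim / poolDim);  % pooled output is resDim x resDim<br />
pooledFeatures = zeros(numFeatures, numImages, resDim, resDim);<br />
for pr = 1:resDim<br />
  for pc = 1:resDim<br />
    rows = (pr-1)*poolDim + (1:poolDim);<br />
    cols = (pc-1)*poolDim + (1:poolDim);<br />
    region = convolvedFeatures(:, :, rows, cols);<br />
    % average over the pooling region (the last two dimensions)<br />
    pooledFeatures(:, :, pr, pc) = mean(mean(region, 4), 3);<br />
  end<br />
end<br />
</syntaxhighlight><br />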
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
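Schematically, the batching in the provided code looks like the following. The variable names and function signatures are those of the starter code, but treat them as illustrative rather than exact:<br />
<br />
<syntaxhighlight lang="matlab"><br />
stepSize = 50;  % number of features to convolve and pool per batch<br />
for convPart = 1:(hiddenSize / stepSize)<br />
    featureStart = (convPart - 1) * stepSize + 1;<br />
    featureEnd   = convPart * stepSize;<br />
    Wt = W(featureStart:featureEnd, :);   % weights and biases for this batch only<br />
    bt = b(featureStart:featureEnd);<br />
    convolvedThis = cnnConvolve(patchDim, stepSize, trainImages, Wt, bt, ZCAWhite, meanPatch);<br />
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = cnnPool(poolDim, convolvedThis);<br />
    clear convolvedThis;  % free the large intermediate before the next batch<br />
end<br />
</syntaxhighlight><br />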
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will train a softmax classifier that maps the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train the classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. The pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 80%.</div>
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will use the features you learned on 8x8 patches sampled from images from the STL10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] for classifying images from a reduced STL10 dataset applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, we will load the color features you learned in [[Exercise:Learning color features with Sparse Autoencoders]]. To verify that the features are correct, the loaded features will be visualized, and you should get something like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; as opposed to a ''full'' convolution which allows the patch to extend outside the image, with the area outside the image assumed to be 0) , where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, what we could do is loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. However, in practice, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process slightly.<br />
<br />
However, there are two complications in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. This is also true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <tt>W</tt> using the following code snippet, which works for <tt>W</tt> of any dimension<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2 / convn<br />
temp = W(:);<br />
temp = flipud(temp);<br />
temp = reshape(temp, size(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches;<br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same two steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math>, and add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
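<br />
A minimal sketch of this precomputation (assuming <tt>W</tt>, <tt>b</tt>, <tt>ZCAWhite</tt> and <tt>meanPatch</tt> have been loaded as in Step 1):<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Fold the preprocessing into the weights and bias (computed once, up front)<br />
WT = W * ZCAWhite;         % convolve the images with WT instead of W<br />
bT = b - WT * meanPatch;   % add bT instead of b before applying the sigmoid<br />
</syntaxhighlight><br />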
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly checks the convolved values for a number of (feature, row, column) tuples by using <tt>feedForwardAutoencoder</tt> to compute the feature activations for the corresponding patches directly with the sparse autoencoder, and comparing these against your convolved values. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
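<br />
One possible structure for <tt>cnnPool</tt> is sketched below. This is illustrative only: it assumes mean pooling over disjoint <tt>poolDim</tt> x <tt>poolDim</tt> regions, with variable names matching the output of <tt>cnnConvolve</tt>.<br />
<br />
<syntaxhighlight lang="matlab"><br />
numFeatures  = size(convolvedFeatures, 1);<br />
numImages    = size(convolvedFeatures, 2);<br />
convolvedDim = size(convolvedFeatures, 3);<br />
resultDim    = floor(convolvedDim / poolDim);<br />
pooledFeatures = zeros(numFeatures, numImages, resultDim, resultDim);<br />
for r = 1:resultDim<br />
  for c = 1:resultDim<br />
    rows  = (r-1)*poolDim + 1 : r*poolDim;<br />
    cols  = (c-1)*poolDim + 1 : c*poolDim;<br />
    patch = convolvedFeatures(:, :, rows, cols);<br />
    % Average over the two spatial dimensions of the region<br />
    pooledFeatures(:, :, r, c) = mean(mean(patch, 4), 3);<br />
  end<br />
end<br />
</syntaxhighlight><br />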
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the reduced STL10 dataset to obtain the convolved features for both the train and test sets. You will then pool the convolved features to obtain the pooled features for both sets. The pooled features for the train set will be used for classification, and those for the test set will be used to evaluate the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
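<br />
The batching can be pictured as follows (a sketch only; the exact function signatures are assumptions based on the steps above, and <tt>hiddenSize</tt> is assumed to be a multiple of 50):<br />
<br />
<syntaxhighlight lang="matlab"><br />
stepSize = 50;  % number of features to convolve and pool per batch<br />
for convPart = 1:(hiddenSize / stepSize)<br />
    featureStart = (convPart - 1) * stepSize + 1;<br />
    featureEnd   = convPart * stepSize;<br />
    Wt = W(featureStart:featureEnd, :);<br />
    bt = b(featureStart:featureEnd);<br />
    convolvedThis = cnnConvolve(patchDim, stepSize, trainImages, Wt, bt, ZCAWhite, meanPatch);<br />
    pooledFeaturesTrain(featureStart:featureEnd, :, :, :) = cnnPool(poolDim, convolvedThis);<br />
end<br />
</syntaxhighlight><br />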
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will use the pooled features to train a softmax classifier that maps them to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. The pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 80%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Learning_color_features_with_Sparse_AutoencodersExercise:Learning color features with Sparse Autoencoders2011-05-23T05:39:17Z<p>Jngiam: /* Step 2: Learn features on small patches */</p>
<hr />
<div>== Learning color features with Sparse Autoencoders ==<br />
<br />
In this exercise, you will implement a [[Linear Decoders | linear decoder]] (with the sparse autoencoder) to learn features on color images from the STL10 dataset. These features will be used in a later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]] for classifying STL10 images.<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip linear_decoder_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from the [[Exercise:Sparse Autoencoder | sparse autoencoder exercise]].<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stl10_patches_100k.zip Sampled 8x8 patches from the STL10 dataset (stl10_patches_100k.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip Starter Code (linear_decoder_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]<br />
<br />
''If you have not completed the exercise listed above, we strongly suggest you complete it first.''<br />
<br />
=== Learning from color image patches ===<br />
<br />
In all the exercises so far, you have been working only with grayscale images. In this exercise, you will get the opportunity to work with RGB color images for the first time. <br />
<br />
Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels as the original image. <br />
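<br />
For example (illustrative only), an 8x8 RGB patch simply becomes a single 192-dimensional input vector:<br />
<br />
<syntaxhighlight lang="matlab"><br />
patch = rand(8, 8, 3);   % rows x columns x (R, G, B) channels<br />
x = patch(:);            % stack all channels into one 8*8*3 = 192 element vector<br />
% x can now be fed to the sparse autoencoder like any grayscale input<br />
</syntaxhighlight><br />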
<br />
=== Step 0: Initialization ===<br />
<br />
In this step, we initialize some parameters used in the exercise.<br />
<br />
=== Step 1: Modify your sparse autoencoder to use a linear decoder ===<br />
<br />
Copy <tt>sparseAutoencoderCost.m</tt> to the directory for this exercise and rename it to <tt>sparseAutoencoderLinear.m</tt>. Rename the function <tt>sparseAutoencoderCost</tt> in the file to <tt>sparseAutoencoderLinearCost</tt>, and modify it to use a [[Linear Decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.<br />
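<br />
Concretely, only the output layer changes. Below is a sketch of the two affected lines, using notation in the style of the sparse autoencoder exercise (the variable names are assumptions; the surrounding code is unchanged):<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Forward pass: the output layer is now linear rather than sigmoid<br />
a3 = W2 * a2 + repmat(b2, 1, m);       % previously: sigmoid(W2 * a2 + ...)<br />
% Backpropagation: the output error term drops the sigmoid derivative,<br />
% since f'(z3) = 1 for a linear decoder<br />
delta3 = -(data - a3);                 % previously: -(data - a3) .* a3 .* (1 - a3)<br />
</syntaxhighlight><br />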
<br />
=== Step 2: Learn features on small patches ===<br />
<br />
You will now use your sparse autoencoder to learn features on a set of 100,000 small 8x8 patches sampled from the larger 96x96 STL10 images. (The [http://www.stanford.edu/~acoates//stl10/ STL10 dataset] comprises 5,000 training and 8,000 test 96x96 labelled color images belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, and truck.) <br />
<br />
The code provided in this step trains your sparse autoencoder for 400 iterations with the default parameters initialized in step 0. This should take around 45 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below. <br />
<br />
[[File:CNN_Features_Good.png|480px]]<br />
<br />
If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:<br />
<br />
<table cellpadding=5px><br />
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr><br />
</table><br />
<br />
The learned features will be saved to <tt>STL10Features.mat</tt>, which will be used in the later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]].</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-23T01:59:22Z<p>Jngiam: /* Step 2a: Implement convolution */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will use the features you learned on 8x8 patches sampled from images from the STL10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] to classify images from a reduced STL10 dataset by applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, we will load the color features you learned in [[Exercise:Learning color features with Sparse Autoencoders]]. To verify that the features are correct, the loaded features will be visualized, and you should get something like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. To accomplish this, we could loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. However, in practice, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.<br />
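The equivalence between the patch-by-patch loop and a single ''valid'' convolution can be sketched as follows (Python with NumPy/SciPy is used here purely for illustration; <tt>scipy.signal.convolve2d</tt> plays the role of MATLAB's <tt>conv2</tt>, and all variable names are illustrative):<br />
<br />
```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))   # one channel of a small "image"
W = rng.standard_normal((3, 3))         # one 3x3 feature patch

# Step-by-step approach: dot every valid 3x3 patch with W
out_dim = image.shape[0] - W.shape[0] + 1
naive = np.zeros((out_dim, out_dim))
for r in range(out_dim):
    for c in range(out_dim):
        naive[r, c] = np.sum(W * image[r:r + 3, c:c + 3])

# One call: pre-flip W (the convolution re-flips it), then a 'valid' convolution
fast = convolve2d(image, np.flipud(np.fliplr(W)), mode='valid')

print(np.allclose(naive, fast))  # True
```
<br />
The pre-flip of <tt>W</tt> is needed because of the mathematical definition of convolution, as discussed in the implementation tip further down.<br />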
<br />
However, there are two complications in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. This is explained in greater detail in the implementation tip section following the code.<br />
<br />
Concretely, the code to do the convolution using <tt>conv2</tt> will look something like the following:<br />
<br />
<syntaxhighlight lang="matlab"><br />
convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);<br />
for imageNum = 1:numImages<br />
for featureNum = 1:hiddenSize<br />
% Obtain the feature matrix for this feature<br />
Wfeat = W(featureNum, :);<br />
Wfeat = reshape(Wfeat, patchDim, patchDim, 3);<br />
<br />
% Get convolution of image with feature matrix for each channel<br />
convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);<br />
for channel = 1:3<br />
<br />
% Flip the feature matrix because of the definition of convolution, as explained later<br />
feature = flipud(fliplr(squeeze(Wfeat(:, :, channel))));<br />
im = squeeze(images(:, :, channel, imageNum));<br />
<br />
% Convolve "filter" with "im", adding the result<br />
convolvedImage = convolvedImage + conv2(im, feature), 'valid');<br />
<br />
end<br />
<br />
% The convolved feature is the sum of the convolved values for all channels<br />
convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;<br />
end<br />
end<br />
</syntaxhighlight><br />
<br />
The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. This is also true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <tt>W</tt> using the following code snippet, which works for <tt>W</tt> of any dimension:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2 / convn<br />
temp = W(:);<br />
temp = flipud(temp);<br />
temp = reshape(temp, size(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
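As a quick sanity check of the snippet above, a NumPy sketch (Python purely for illustration) confirms that reversing the flattened array and reshaping flips every dimension, matching an explicit row-and-column flip for the 3x3 example:<br />
<br />
```python
import numpy as np

W = np.arange(1, 10).reshape(3, 3)  # the 3x3 example matrix above

# Flip rows and columns individually (MATLAB: flipud(fliplr(W)))
flipped_2d = W[::-1, ::-1]

# Flip via the flatten-reverse-reshape trick (works for any number of dimensions)
flipped_any = W.flatten()[::-1].reshape(W.shape)

print(flipped_2d)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
print(np.array_equal(flipped_2d, flipped_any))  # True
```
<br />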
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches;<br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same two steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activation that you should compute is <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
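This identity is easy to verify numerically. In the following sketch (Python/NumPy purely for illustration; <tt>T</tt>, <tt>mean_patch</tt>, and the sizes are random stand-ins for the actual whitening matrix and mean patch), preprocessing the patch explicitly agrees with using the combined matrix <math>WT</math> and the adjusted bias:<br />
<br />
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, hidden = 192, 5                      # 8*8*3 = 192 entries per flattened patch
W = 0.1 * rng.standard_normal((hidden, n))
b = 0.1 * rng.standard_normal(hidden)
T = 0.1 * rng.standard_normal((n, n))   # stand-in for the ZCA whitening matrix
mean_patch = rng.standard_normal(n)     # stand-in for meanPatch
x = rng.standard_normal(n)              # one flattened 8x8x3 patch

# Direct route: preprocess the patch, then apply the first-layer weights
direct = sigmoid(W @ (T @ (x - mean_patch)) + b)

# Folded route: combined matrix WT and adjusted bias b - WT*mean_patch
WT = W @ T
folded = sigmoid(WT @ x + (b - WT @ mean_patch))

print(np.allclose(direct, folded))  # True
```
<br />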
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples and compares your convolved values against feature activations computed directly on the corresponding patches using <tt>feedForwardAutoencoder</tt> from the sparse autoencoder. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
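For reference, pooling over non-overlapping regions can be sketched as follows (Python/NumPy purely for illustration; this sketch assumes mean pooling and that the pooling dimension divides the convolved feature dimension evenly):<br />
<br />
```python
import numpy as np

def mean_pool(features, pool_dim):
    """Mean-pool each non-overlapping pool_dim x pool_dim region.

    features has shape (numFeatures, numImages, dim, dim), matching the
    layout of convolvedFeatures in the exercise; pool_dim must divide dim.
    """
    num_feat, num_img, dim, _ = features.shape
    out = dim // pool_dim
    # Split each spatial axis into (blocks, pool_dim), then average each pool
    reshaped = features.reshape(num_feat, num_img, out, pool_dim, out, pool_dim)
    return reshaped.mean(axis=(3, 5))

convolved = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pooled = mean_pool(convolved, 2)
print(pooled[0, 0])
# [[ 2.5  4.5]
#  [10.5 12.5]]
```
<br />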
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
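The batching can be sketched as follows (illustrative Python; <tt>conv_and_pool</tt> is a hypothetical stand-in for running <tt>cnnConvolve</tt> followed by <tt>cnnPool</tt> on one batch of features):<br />
<br />
```python
import numpy as np

def conv_and_pool(W_batch, images):
    # Hypothetical stand-in for cnnConvolve + cnnPool:
    # returns one pooled row per feature in the batch
    return np.zeros((W_batch.shape[0], images.shape[0]))

hidden_size, num_images, batch_size = 400, 8, 50
W = np.ones((hidden_size, 192))              # all 400 learned features
images = np.zeros((num_images, 64, 64, 3))   # a (tiny, fake) image set

pooled_parts = []
for start in range(0, hidden_size, batch_size):
    # Only batch_size features are convolved at once, so the intermediate
    # convolved-features array never holds all 400 feature maps in memory
    pooled_parts.append(conv_and_pool(W[start:start + batch_size], images))
pooled = np.concatenate(pooled_parts, axis=0)

print(pooled.shape)  # (400, 8)
```
<br />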
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train the classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. These pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 77-78%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-23T01:57:23Z<p>Jngiam: /* Step 2a: Implement convolution */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will use the features you learned on 8x8 patches sampled from images from the STL10 dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] to classify images from a reduced STL10 dataset, applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, we will load the color features you learned in [[Exercise:Learning color features with Sparse Autoencoders]]. To verify that the features are correct, the loaded features will be visualized, and you should get something like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. To accomplish this, we could loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. However, in practice, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.<br />
<br />
However, there are two complications in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. This is explained in greater detail in the implementation tip section following the code.<br />
<br />
Concretely, the code to do the convolution using <tt>conv2</tt> will look something like the following:<br />
<br />
<syntaxhighlight lang="matlab"><br />
convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);<br />
for imageNum = 1:numImages<br />
for featureNum = 1:hiddenSize<br />
% Obtain the feature matrix for this feature<br />
Wt = W(featureNum, :);<br />
Wt = reshape(Wt, patchDim, patchDim, 3);<br />
<br />
% Get convolution of image with feature matrix for each channel<br />
convolvedTemp = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1, 3);<br />
for channel = 1:3<br />
% Flip the feature matrix because of the definition of convolution, as explained later<br />
Wt(:, :, channel) = flipud(fliplr(squeeze(Wt(:, :, channel)))); <br />
convolvedTemp(:, :, channel) = conv2(squeeze(images(:, :, channel, imageNum)), squeeze(Wt(:, :, channel)), 'valid');<br />
end<br />
<br />
% The convolved feature is the sum of the convolved values for all channels<br />
convolvedFeatures(featureNum, imageNum, :, :) = sum(convolvedTemp, 3);<br />
end<br />
end<br />
</syntaxhighlight><br />
<br />
The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and W is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. This is also true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <tt>W</tt> using the following code snippet, which works for <tt>W</tt> of any dimension:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2 / convn<br />
temp = W(:);<br />
temp = flipud(temp);<br />
temp = reshape(temp, size(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> divide by 255 to normalize them into the range <math>[0, 1]</math><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches<br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same three steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
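Putting this together, the adjusted weights and biases can be precomputed once, before the convolution loop. This is only a sketch; it assumes <tt>W</tt>, <tt>b</tt>, <tt>ZCAWhite</tt> and <tt>meanPatch</tt> are the variables described above, with <tt>W</tt> of size hiddenSize x visibleSize:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Fold the preprocessing into the weights and biases (done once, up front)<br />
WT = W * ZCAWhite;          % convolve the images with WT instead of W<br />
bT = b - WT * meanPatch;    % add (b - W*T*meanPatch) instead of b<br />
</syntaxhighlight><br />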
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples and compares the convolved values against the feature activations computed directly from the corresponding patches using <tt>feedForwardAutoencoder</tt>. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
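As a rough sketch of what <tt>cnnPool</tt> might compute, assuming mean pooling over non-overlapping regions (the variable names here are illustrative, not necessarily the starter code's; <tt>poolDim</tt> is the width of each pooling region):<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Mean pooling over non-overlapping poolDim x poolDim regions (a sketch).<br />
% convolvedFeatures is numFeatures x numImages x convolvedDim x convolvedDim.<br />
numPools = floor(convolvedDim / poolDim);<br />
pooledFeatures = zeros(numFeatures, numImages, numPools, numPools);<br />
for poolRow = 1:numPools<br />
  for poolCol = 1:numPools<br />
    rows = (poolRow - 1) * poolDim + 1 : poolRow * poolDim;<br />
    cols = (poolCol - 1) * poolDim + 1 : poolCol * poolDim;<br />
    region = convolvedFeatures(:, :, rows, cols);<br />
    pooledFeatures(:, :, poolRow, poolCol) = mean(mean(region, 4), 3);<br />
  end<br />
end<br />
</syntaxhighlight><br />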
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. These pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 77-78%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Learning_color_features_with_Sparse_AutoencodersExercise:Learning color features with Sparse Autoencoders2011-05-23T01:46:09Z<p>Jngiam: /* Dependencies */</p>
<hr />
<div>== Learning color features with Sparse Autoencoders ==<br />
<br />
In this exercise, you will implement a [[Linear Decoders | linear decoder]] (with the sparse autoencoder) to learn features on color images from the STL10 dataset. These features will be used in a later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]] for classifying STL10 images.<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip linear_decoder_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from the [[Exercise:Sparse Autoencoder | sparse autoencoder exercise]].<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stl10_patches_100k.zip Sampled 8x8 patches from the STL10 dataset (stl10_patches_100k.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/linear_decoder_exercise.zip Starter Code (linear_decoder_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]<br />
<br />
''If you have not completed the exercise listed above, we strongly suggest you complete it first.''<br />
<br />
=== Learning from color image patches ===<br />
<br />
In all the exercises so far, you have been working only with grayscale images. In this exercise, you will get the opportunity to work with RGB color images for the first time. <br />
<br />
Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with three times as many pixels as the original image. <br />
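For example, an 8x8 color patch simply becomes a 192-dimensional input vector (the variable names below are illustrative):<br />
<br />
<syntaxhighlight lang="matlab"><br />
% An 8x8x3 color patch flattens into a single 192-element column vector,<br />
% exactly as if it were a grayscale patch with three times as many pixels.<br />
patch = rand(8, 8, 3);   % R, G and B channels<br />
x = patch(:);            % 192 x 1 input to the autoencoder<br />
</syntaxhighlight><br />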
<br />
=== Step 0: Initialization ===<br />
<br />
In this step, we initialize some parameters used in the exercise.<br />
<br />
=== Step 1: Modify your sparse autoencoder to use a linear decoder ===<br />
<br />
Copy <tt>sparseAutoencoder.m</tt> to the directory for this exercise and rename it to <tt>sparseAutoencoderLinear.m</tt>. Rename the function <tt>sparseAutoencoderCost</tt> in the file to <tt>sparseAutoencoderLinearCost</tt>, and modify it to use a [[Linear Decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.<br />
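Concretely, only the output layer changes: its activation becomes the identity, so the output-layer error term loses the <math>f'(z^{(3)})</math> factor. A sketch of the affected lines, assuming the notation of the sparse autoencoder exercise:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Output layer with a linear decoder (instead of a3 = sigmoid(z3)):<br />
a3 = z3;                  % identity activation<br />
delta3 = -(data - a3);    % no f'(z3) factor, since f'(z) = 1 for f(z) = z<br />
% The hidden-layer delta and the gradient expressions are unchanged.<br />
</syntaxhighlight><br />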
<br />
=== Step 2: Learn features on small patches ===<br />
<br />
You will now use your sparse autoencoder to learn features on a set of 100,000 small 8x8 patches sampled from the larger 96x96 STL10 images. (The [http://www.stanford.edu/~acoates//stl10/ STL10 dataset] comprises 5000 training and 8000 test 96x96 labelled color images, each belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.) <br />
<br />
The code provided in this step trains your sparse autoencoder for 400 iterations with the default parameters initialized in step 0. This should take around 45 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below. <br />
<br />
[[File:CNN_Features_Good.png|480px]]<br />
<br />
If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:<br />
<br />
<table><br />
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr><br />
</table><br />
<br />
The learned features will be saved to <tt>STL10Features.mat</tt>, which will be used in the later [[Exercise:Convolution and Pooling | exercise on convolution and pooling]].</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-23T01:45:20Z<p>Jngiam: /* Dependencies */</p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
In this problem set, you will apply [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]] to classify images from a reduced STL10 dataset, using the features you learned on 8x8 patches sampled from STL10 images in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]]. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>'''.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* [http://ufldl.stanford.edu/wiki/resources/stlSubset.zip A subset of the STL10 Dataset (stlSubset.zip)]<br />
* [http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip Starter Code (cnn_exercise.zip)]<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinear.m</tt> or your saved features from [[Exercise:Learning color features with Sparse Autoencoders]]<br />
* <tt>feedForwardAutoencoder.m</tt> (and related functions) from [[Exercise:Self-Taught Learning]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Load learned features ===<br />
<br />
In this step, we will load the color features you learned in [[Exercise:Learning color features with Sparse Autoencoders]]. To verify that the features are correct, the loaded features will be visualized, and you should get something like the following:<br />
<br />
[[File:CNN_Features_Good.png|300px]]<br />
<br />
=== Step 2: Implement and test convolution and pooling ===<br />
<br />
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.<br />
<br />
==== Step 2a: Implement convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. To accomplish this, one approach is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. However, in practice, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process.<br />
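For a single feature and a single image channel, the first step collapses to one call (a sketch, with <tt>image</tt> and <tt>W</tt> as above; the filter is flipped first, for the reason explained in the implementation tip further below):<br />
<br />
<syntaxhighlight lang="matlab"><br />
% conv2 with the flipped filter computes sum(sum(W .* patch)) for every<br />
% valid patch location in one call, replacing the loop over (r, c).<br />
% rot90(W, 2) is equivalent to flipud(fliplr(W)) for a 2-D matrix.<br />
responses = conv2(image, rot90(W, 2), 'valid');<br />
</syntaxhighlight><br />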
<br />
However, there are two complications in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops, over the image number <tt>imageNum</tt>, the feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>; inside these loops, you perform the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. This is explained in greater detail in the implementation tip section following the code.<br />
<br />
Concretely, the code to do the convolution using <tt>conv2</tt> will look something like the following:<br />
<br />
<syntaxhighlight lang="matlab"><br />
convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);<br />
for imageNum = 1:numImages<br />
for featureNum = 1:hiddenSize<br />
% Obtain the feature matrix for this feature<br />
Wt = W(featureNum, :);<br />
Wt = reshape(Wt, patchDim, patchDim, 3);<br />
<br />
% Get convolution of image with feature matrix for each channel<br />
convolvedTemp = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1, 3);<br />
for channel = 1:3<br />
% Flip the feature matrix because of the definition of convolution, as explained later<br />
Wt(:, :, channel) = flipud(fliplr(squeeze(Wt(:, :, channel)))); <br />
convolvedTemp(:, :, channel) = conv2(squeeze(images(:, :, channel, imageNum)), squeeze(Wt(:, :, channel)), 'valid');<br />
end<br />
<br />
% The convolved feature is the sum of the convolved values for all channels<br />
convolvedFeatures(featureNum, imageNum, :, :) = sum(convolvedTemp, 3);<br />
end<br />
end<br />
</syntaxhighlight><br />
<br />
The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition, the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, where <tt>W</tt> is a 3x3 matrix as below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. The same is true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <tt>W</tt> using the following code snippet, which works for <tt>W</tt> of any dimension:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2 / convn<br />
temp = W(:);<br />
temp = flipud(temp);<br />
temp = reshape(temp, size(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> divide by 255 to normalize them into the range <math>[0, 1]</math><br />
<li> subtract the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches<br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same three steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
<br />
==== Step 2b: Check your convolution ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples and compares the convolved values against the feature activations computed directly from the corresponding patches using <tt>feedForwardAutoencoder</tt>. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
<br />
==== Step 2d: Check your pooling ====<br />
<br />
We have provided some code for you to check that you have done the pooling correctly. The code runs <tt>cnnPool</tt> against a test matrix to see if it produces the expected result.<br />
<br />
=== Step 3: Convolve and pool with the dataset ===<br />
<br />
In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.<br />
<br />
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.<br />
<br />
=== Step 4: Use pooled features for classification ===<br />
<br />
In this step, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 5: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. These pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 77-78%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Convolution_and_PoolingExercise:Convolution and Pooling2011-05-22T02:20:55Z<p>Jngiam: </p>
<hr />
<div>== Convolution and Pooling ==<br />
<br />
This problem set is divided into two parts. In the first part, you will implement a [[Linear Decoders | linear decoder]] to learn features on color images from the STL10 dataset. In the second part, you will use these learned features in convolution and pooling for classifying STL10 images.<br />
<br />
In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.<br />
<br />
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise. You will also need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* STL10 dataset<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderLinearCost.m</tt> (and related functions) from [[Exercise:Learning_color_features_with_Sparse_Autoencoders]]<br />
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
=== Step 1: Learn color features ===<br />
<br />
Learn a set of color features by working through [[Exercise:Learning_color_features_with_Sparse_Autoencoders]]; we will be using these features in the next steps. You should learn 400 features, and they should look like this:<br />
<br />
[[File:cnn_Features_Good.png|480px]]<br />
<br />
<br />
=== Step 2: Convolution and pooling ===<br />
<br />
Now that you have learned features for small patches, you will convolve these learned features with the large images, and pool the convolved features for use in a classifier later.<br />
<br />
==== Step 2a: Convolution ====<br />
<br />
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.<br />
<br />
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0), where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. To accomplish this, one approach is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. However, in practice, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.<br />
<br />
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process slightly.<br />
<br />
However, there are two complications in using <tt>conv2</tt>. <br />
<br />
First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the image matrix for the <tt>imageNum</tt>-th image going inside. <br />
<br />
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. This is explained in greater detail in the implementation tip section following the code.<br />
<br />
Concretely, the code to do the convolution using <tt>conv2</tt> will look something like the following:<br />
<br />
<syntaxhighlight lang="matlab"><br />
convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);<br />
for imageNum = 1:numImages<br />
  for featureNum = 1:hiddenSize<br />
    % Obtain the feature matrix for this feature<br />
    Wt = W(featureNum, :);<br />
    Wt = reshape(Wt, patchDim, patchDim, 3);<br />
<br />
    % Get convolution of image with feature matrix for each channel<br />
    convolvedTemp = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1, 3);<br />
    for channel = 1:3<br />
      % Flip the feature matrix because of the definition of convolution, as explained later<br />
      Wt(:, :, channel) = flipud(fliplr(squeeze(Wt(:, :, channel))));<br />
      convolvedTemp(:, :, channel) = conv2(squeeze(images(:, :, channel, imageNum)), squeeze(Wt(:, :, channel)), 'valid');<br />
    end<br />
<br />
    % The convolved feature is the sum of the convolved values for all channels<br />
    convolvedFeatures(featureNum, imageNum, :, :) = sum(convolvedTemp, 3);<br />
  end<br />
end<br />
</syntaxhighlight><br />
<br />
The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:<br />
<br />
<div style="border:1px solid black; padding: 5px"><br />
<br />
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt><br />
<br />
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition, the entries will be in the correct places. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, where <tt>W</tt> is the 3x3 matrix below:<br />
<br />
<math><br />
W = <br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <tt>W</tt>, reversing its rows and columns, before convolving <tt>W</tt> with <tt>image</tt>, as below:<br />
<br />
<math><br />
\begin{pmatrix}<br />
1 & 2 & 3 \\<br />
4 & 5 & 6 \\<br />
7 & 8 & 9 \\<br />
\end{pmatrix}<br />
<br />
\xrightarrow{flip}<br />
<br />
\begin{pmatrix}<br />
9 & 8 & 7 \\<br />
6 & 5 & 4 \\<br />
3 & 2 & 1 \\<br />
\end{pmatrix}<br />
</math><br />
<br />
If the original layout of <tt>W</tt> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <tt>W</tt> before passing it into <tt>conv2</tt>, so that after MATLAB flips <tt>W</tt> in <tt>conv2</tt>, the layout will be correct. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. This is also true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <tt>W</tt> using the following code snippet, which works for <tt>W</tt> of any dimension:<br />
<br />
<syntaxhighlight lang="matlab"><br />
% Flip W for use in conv2 / convn<br />
temp = W(:);<br />
temp = flipud(temp);<br />
temp = reshape(temp, size(W));<br />
</syntaxhighlight><br />
<br />
</div><br />
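The flipping identity described in the tip can also be checked numerically. The following Python/SciPy sketch (where <tt>scipy.signal.convolve2d</tt> plays the role of MATLAB's <tt>conv2</tt>; it is an illustration, not part of the exercise code) verifies that convolving with a pre-flipped kernel recovers the sliding-window sum of products we actually want:

```python
import numpy as np
from scipy.signal import convolve2d  # plays the role of MATLAB's conv2 here

rng = np.random.default_rng(0)
image = rng.random((6, 6))
W = rng.random((3, 3))

# What we actually want: the sliding-window sum of products of W
# with each valid 3x3 patch of the image (cross-correlation).
want = np.zeros((4, 4))
for r in range(4):
    for c in range(4):
        want[r, c] = np.sum(W * image[r:r + 3, c:c + 3])

# Pre-flipping W cancels the flip performed inside the convolution,
# so conv2-style 'valid' convolution with the flipped kernel matches.
got = convolve2d(image, np.flipud(np.fliplr(W)), mode='valid')
assert np.allclose(want, got)
```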
<br />
To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.<br />
<br />
In particular, you did the following to the patches:<br />
<ol><br />
<li> divide by 255 to normalize them into the range <math>[0, 1]</math><br />
<li> subtract the mean patch, <tt>meanPatch</tt> to zero the mean of the patches <br />
<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.<br />
</ol><br />
These same three steps must also be applied to the convolved patches. <br />
<br />
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.<br />
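This algebra is easy to sanity-check numerically. A Python/NumPy sketch (the variables mirror <tt>W</tt>, <tt>b</tt>, <tt>ZCAWhite</tt> and <tt>meanPatch</tt>, but the sizes are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes only: 4-dimensional patches, 3 hidden features
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 4))       # learned weights
b = rng.standard_normal(3)            # learned biases
T = rng.standard_normal((4, 4))       # whitening matrix (ZCAWhite)
mean_patch = rng.standard_normal(4)   # mean patch (meanPatch)
x = rng.standard_normal(4)            # a raw patch

# Preprocess-then-activate ...
direct = sigmoid(W @ (T @ (x - mean_patch)) + b)
# ... equals using the combined weights W T and the shifted bias b - W T mean_patch
folded = sigmoid((W @ T) @ x + (b - W @ T @ mean_patch))
assert np.allclose(direct, folded)
```

This is why convolving once with <math>WT</math> and shifting the bias gives the same activations as preprocessing every patch explicitly.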
<br />
==== Step 2b: Checking ====<br />
<br />
We have provided some code for you to check that you have done the convolution correctly. The code randomly checks the convolved values for a number of (feature, row, column) tuples by computing the feature activations for the selected features and patches directly using the sparse autoencoder. <br />
<br />
==== Step 2c: Pooling ====<br />
<br />
Implement [[pooling]] in the function <tt>cnnPool</tt> in <tt>cnnPool.m</tt>.<br />
<br />
=== Step 3: Use pooled features for classification ===<br />
<br />
Once you have implemented pooling, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses <tt>softmaxTrain</tt> from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.<br />
<br />
=== Step 4: Test classifier ===<br />
<br />
Now that you have a trained softmax classifier, you can see how well it performs on the test set. This section contains code that will load the test set (which is a smaller part of the STL10 dataset, specifically, 3200 rescaled 64x64 images from 4 different classes) and obtain the pooled, convolved features for the images using the functions <tt>cnnConvolve</tt> and <tt>cnnPool</tt> which you wrote earlier, as well as the preprocessing matrices <tt>ZCAWhite</tt> and <tt>meanImage</tt> which were computed earlier in preprocessing the training images. These pooled features will then be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 77-78%.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/File:Cnn_Features_Good.pngFile:Cnn Features Good.png2011-05-22T02:20:37Z<p>Jngiam: uploaded a new version of &quot;File:Cnn Features Good.png&quot;: Reverted to version as of 03:56, 20 May 2011</p>
<hr />
<div></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Exercise:Learning_color_features_with_Sparse_AutoencodersExercise:Learning color features with Sparse Autoencoders2011-05-22T02:14:05Z<p>Jngiam: Created page with "== Learning color features with Sparse Autoencoders == In this exercise, you will implement a linear decoder (with the sparse autoencoder) to learn feature..."</p>
<hr />
<div>== Learning color features with Sparse Autoencoders ==<br />
<br />
In this exercise, you will implement a [[Linear Decoders | linear decoder]] (with the sparse autoencoder) to learn features on color images from the STL10 dataset. These features will be used later in convolution and pooling for classifying STL10 images.<br />
<br />
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise. You will also need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.<br />
<br />
=== Dependencies ===<br />
<br />
The following additional files are required for this exercise:<br />
* STL10 dataset<br />
<br />
You will also need:<br />
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]<br />
<br />
''If you have not completed the exercises listed above, we strongly suggest you complete them first.''<br />
<br />
<br />
=== Learning from color image patches ===<br />
<br />
In all the exercises so far, you have been working only with grayscale images. In this exercise, you will get the opportunity to work with RGB color images for the first time.<br />
<br />
Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels as the original image. <br />
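In a Python/NumPy sketch (shapes matching this exercise's 8x8 color patches; the code itself is illustrative, not part of the starter code), the flattening looks like:

```python
import numpy as np

# An 8x8 RGB patch is treated as one long vector of 8 * 8 * 3 = 192
# intensities, as if it were a grayscale patch with 3x as many pixels.
patch = np.random.default_rng(2).random((8, 8, 3))  # hypothetical sampled patch
x = patch.ravel()
assert x.shape == (8 * 8 * 3,)  # 192-dimensional input to the autoencoder
```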
<br />
=== Step 0: Initialization ===<br />
<br />
In this step, we initialize some parameters used in the exercise.<br />
<br />
=== Step 1: Modify sparseAutoencoderCost.m to use a linear decoder ===<br />
<br />
Copy <tt>sparseAutoencoderCost.m</tt> to the directory for this exercise and rename it to <tt>sparseAutoencoderLinearCost.m</tt>. Rename the function <tt>sparseAutoencoderCost</tt> in the file to <tt>sparseAutoencoderLinearCost</tt>, and modify it to use a [[Linear Decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.<br />
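To illustrate the required change to the gradients, here is a hedged Python sketch (not the exercise's MATLAB starter code; <tt>output_delta</tt> is a hypothetical helper) of the output-layer error term for the two decoders:

```python
import numpy as np

# For the linear decoder the output activation is the identity, so the
# output-layer error term loses its f'(z) factor (sketch, not starter code):
def output_delta(y, z, linear=True):
    if linear:
        a = z                      # a3 = z3, so f'(z) = 1
        return -(y - a)
    a = 1.0 / (1.0 + np.exp(-z))   # sigmoid decoder
    return -(y - a) * a * (1.0 - a)

# Quick check against a numerical derivative of J = 0.5 * (y - z)^2
y, z, eps = 1.3, 0.4, 1e-6
J = lambda z_: 0.5 * (y - z_) ** 2
numeric = (J(z + eps) - J(z - eps)) / (2 * eps)
assert np.isclose(output_delta(y, z), numeric)
```

This is the one place the backpropagation code changes; the hidden-layer terms are untouched because the hidden units remain sigmoid units.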
<br />
=== Step 2: Learn features on small patches ===<br />
<br />
You will now use your sparse autoencoder to learn features on a set of 100,000 small 8x8 patches sampled from the larger 96x96 STL10 images. (The STL10 dataset comprises 5000 training and 8000 test 96x96 labelled color images, each belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.)<br />
<br />
Code to load patches sampled from the images has been provided, so no additional changes are required on your part. Note that you will need to apply the exact same preprocessing steps to the convolved images as you do to the patches used for training the autoencoder (you have to subtract the same mean image and use the exact same whitening matrix); using a fixed set of patches means that these matrices can be recomputed if necessary.<br />
<br />
In this step, you will train a sparse autoencoder (with a linear decoder) on the sampled patches. The code provided trains your sparse autoencoder for 400 iterations with the default parameters initialized in step 0. This should take around 30 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below.<br />
<br />
[[File:cnn_Features_Good.png|480px]]<br />
<br />
If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:<br />
<br />
<table><br />
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr><br />
</table></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/UFLDL_TutorialUFLDL Tutorial2011-05-22T02:11:03Z<p>Jngiam: </p>
<hr />
<div>'''Description:''' This tutorial will teach you the main ideas of Unsupervised Feature Learning and Deep Learning. By working through it, you will also get to implement several feature learning/deep learning algorithms, get to see them work for yourself, and learn how to apply/adapt these ideas to new problems.<br />
<br />
This tutorial assumes a basic knowledge of machine learning (specifically, familiarity with the ideas of supervised learning, logistic regression, gradient descent). If you are not familiar with these ideas, we suggest you go to this [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning Machine Learning course] and complete<br />
sections II, III, IV (up to Logistic Regression) first. <br />
<br />
<br />
'''Sparse Autoencoder'''<br />
* [[Neural Networks]]<br />
* [[Backpropagation Algorithm]]<br />
* [[Gradient checking and advanced optimization]]<br />
* [[Autoencoders and Sparsity]]<br />
* [[Visualizing a Trained Autoencoder]]<br />
* [[Sparse Autoencoder Notation Summary]] <br />
* [[Exercise:Sparse Autoencoder]]<br />
<br />
<br />
'''Vectorized implementation'''<br />
* [[Vectorization]]<br />
* [[Logistic Regression Vectorization Example]]<br />
* [[Neural Network Vectorization]]<br />
* [[Exercise:Vectorization]]<br />
<br />
<br />
'''Preprocessing: PCA and Whitening'''<br />
* [[PCA]]<br />
* [[Whitening]]<br />
* [[Implementing PCA/Whitening]]<br />
* [[Exercise:PCA in 2D]]<br />
* [[Exercise:PCA and Whitening]]<br />
<br />
<br />
'''Softmax Regression'''<br />
* [[Softmax Regression]]<br />
* [[Exercise:Softmax Regression]]<br />
<br />
<br />
'''Self-Taught Learning and Unsupervised Feature Learning''' <br />
* [[Self-Taught Learning]]<br />
* [[Exercise:Self-Taught Learning]]<br />
<br />
<br />
'''Building Deep Networks for Classification'''<br />
* [[Self-Taught Learning to Deep Networks | From Self-Taught Learning to Deep Networks]]<br />
* [[Deep Networks: Overview]]<br />
* [[Stacked Autoencoders]]<br />
* [[Fine-tuning Stacked AEs]]<br />
* [[Exercise: Implement deep networks for digit classification]]<br />
<br />
<br />
----<br />
'''Note''': The sections above this line are stable. The sections below are still under construction, and may change without notice. Feel free to browse around however, and feedback/suggestions are welcome. <br />
<br />
'''Linear Decoders with Autoencoders'''<br />
* [[Linear Decoders]]<br />
* [[Exercise:Learning color features with Sparse Autoencoders]]<br />
<br />
'''Working with Large Images'''<br />
* [[Feature extraction using convolution]]<br />
* [[Pooling]]<br />
* [[Exercise:Convolution and Pooling]]<br />
<br />
<br />
----<br />
<br />
'''Miscellaneous''':<br />
<br />
[[MATLAB Modules]]<br />
<br />
[[Data Preprocessing]]<br />
<br />
[[Style Guide]]<br />
<br />
[[Useful Links]]<br />
<br />
<br />
'''Advanced Topics''':<br />
<br />
[[Convolutional training]] <br />
<br />
[[Restricted Boltzmann Machines]]<br />
<br />
[[Deep Belief Networks]]<br />
<br />
[[Denoising Autoencoders]]<br />
<br />
[[Sparse Coding]]<br />
<br />
[[K-means]]<br />
<br />
[[Spatial pyramids / Multiscale]]<br />
<br />
[[Slow Feature Analysis]]<br />
<br />
ICA Style Models:<br />
* [[Independent Component Analysis]]<br />
* [[Topographic Independent Component Analysis]]<br />
<br />
[[Tiled Convolution Networks]]<br />
<br />
----<br />
<br />
Material contributed by: Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, Caroline Suen</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Linear_DecodersLinear Decoders2011-05-22T02:06:28Z<p>Jngiam: /* Linear Decoder */</p>
<hr />
<div>== Sparse Autoencoder Recap ==<br />
<br />
In the sparse autoencoder implementation, we had 3 layers of neurons: the input layer, a hidden layer and an output layer. Recall that each neuron (in the output layer) computes the following:<br />
<br />
<math><br />
\begin{align}<br />
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\<br />
a^{(3)} &= f(z^{(3)})<br />
\end{align}<br />
</math><br />
<br />
where <math>a^{(3)}</math> is the reconstruction of the input (layer <math>a^{(1)}</math>).<br />
<br />
Notice that due to the choice of the sigmoid function for <math>f(z^{(3)})</math>, we need to constrain the inputs to be in the range <tt>[0,1]</tt>.<br />
<br />
While some datasets, like MNIST, fit this constraint well, the hard constraint can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <tt>[0,1]</tt>, and it is not clear what kind of scaling is appropriate to fit the data into the constrained range.<br />
<br />
== Linear Decoder ==<br />
<br />
One easy fix for the aforementioned problem is to use a ''linear decoder'', that is, we set <math>a^{(3)} = z^{(3)}</math>.<br />
<br />
For a linear decoder, the activation function of the output unit is effectively the identity function. Formally, to reconstruct the input from the features using a linear decoder, we simply set <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a + b^{(2)}</math> instead, without applying the sigmoid function. Now the reconstructed output <math>\hat{x}</math> is a linear function of the activations of the hidden units, which means that by varying <math>W</math>, each output unit <math>\hat{x}</math> can be made to produce any activation without the previous constraints. This allows us to train the sparse autoencoder on any input that takes on real values without any additional pre-processing. (Note that the hidden units are '''still sigmoid units''', that is, <math>a = \sigma(W^{(1)}*x + b^{(1)})</math>, where <math>x</math> is the input, and <math>W^{(1)}</math> and <math>b^{(1)}</math> are the weight and bias terms for the hidden units)<br />
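A minimal Python/NumPy sketch of this forward pass (the shapes and parameter names are illustrative assumptions, not the tutorial's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass with sigmoid hidden units and a linear decoder
# (hypothetical small sizes; W1, b1, W2, b2 stand in for the learned parameters)
rng = np.random.default_rng(3)
x = rng.standard_normal(6)                 # real-valued input, no [0,1] constraint
W1, b1 = rng.standard_normal((4, 6)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((6, 4)), rng.standard_normal(6)

a2 = sigmoid(W1 @ x + b1)   # hidden units are still sigmoid
x_hat = W2 @ a2 + b2        # linear decoder: no sigmoid on the output

assert np.all((a2 > 0) & (a2 < 1))   # hidden activations stay in (0, 1)
# x_hat is unconstrained, so it can reconstruct arbitrary real values
```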
<br />
Of course, now that we have changed the activation function of the output units, the gradients of the output units must be changed accordingly. Recall that for each output unit, we set the error as follows:<br />
:<math><br />
\begin{align}<br />
\delta_i<br />
= \frac{\partial}{\partial z_i} \;\;<br />
\frac{1}{2} \left\|y - \hat{x}\right\|^2 = - (y_i - \hat{x}_i) \cdot f'(z_i)<br />
\end{align}<br />
</math><br />
(where <math>y = x</math> is the desired output, <math>\hat{x}</math> is the reconstructed output of our autoencoder, <math>z</math> is the input to the output units, and <math>f</math> is our activation function)<br />
<br />
Since the activation function for the output units of a linear decoder is just the identity function, whose derivative is 1, the above reduces to:<br />
:<math><br />
\begin{align}<br />
\delta_i = - (y_i - \hat{x}_i)<br />
\end{align}<br />
</math></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Linear_DecodersLinear Decoders2011-05-22T02:05:52Z<p>Jngiam: /* Linear Decoder */</p>
<hr />
<div>== Sparse Autoencoder Recap ==<br />
<br />
In the sparse autoencoder implementation, we had 3 layers of neurons: the input layer, a hidden layer and an output layer. Recall that each neuron (in the output layer) computes the following:<br />
<br />
<math><br />
\begin{align}<br />
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\<br />
a^{(3)} &= f(z^{(3)})<br />
\end{align}<br />
</math><br />
<br />
where <math>a^{(3)}</math> is the reconstruction of the input (layer <math>a^{(1)}</math>).<br />
<br />
Notice that due to the choice of the sigmoid function for <math>f(z^{(3)})</math>, we need to constrain the inputs to be in the range <tt>[0,1]</tt>.<br />
<br />
While some datasets, like MNIST, fit this constraint well, the hard constraint can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <tt>[0,1]</tt>, and it is not clear what kind of scaling is appropriate to fit the data into the constrained range.<br />
<br />
== Linear Decoder ==<br />
<br />
One easy fix for the aforementioned problem is to use a ''linear decoder'', that is, we set <math>a^{(3)} = z^{(3)}</math>.<br />
<br />
For a linear decoder, the activation function of the output unit is effectively the identity function. Formally, to reconstruct the input from the features using a linear decoder, we simply set <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a + b^{(2)}</math> instead, without applying the sigmoid function. Now the reconstructed output <math>\hat{x}</math> is a linear function of the activations of the hidden units, which means that by varying <math>W</math>, each output unit <math>\hat{x}</math> can be made to produce any activation without the previous constraints. This allows us to train the sparse autoencoder on any input that takes on real values without any additional pre-processing. (Note that the hidden units are '''still sigmoid units''', that is, <math>a = \sigma(W^{(1)}*x + b^{(1)})</math>, where <math>x</math> is the input, and <math>W^{(1)}</math> and <math>b^{(1)}</math> are the weight and bias terms for the hidden units)<br />
<br />
Of course, now that we have changed the activation function of the output units, the gradients of the output units must be changed accordingly. Recall that for each output unit, we set the error as follows:<br />
:<math><br />
\begin{align}<br />
\delta_i<br />
= \frac{\partial}{\partial z_i} \;\;<br />
\frac{1}{2} \left\|y - \hat{x}\right\|^2 = - (y_i - \hat{x}_i) \cdot f'(z_i)<br />
\end{align}<br />
</math><br />
(where <math>y = x</math> is the desired output, <math>\hat{x}</math> is the reconstructed output of our autoencoder, <math>z</math> is the input to the output units, and <math>f</math> is our activation function)<br />
<br />
Since the activation function for the output units of a linear decoder is just the identity function, whose derivative is 1, the above reduces to:<br />
:<math><br />
\begin{align}<br />
\delta_i = - (y_i - \hat{x}_i)<br />
\end{align}<br />
</math></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Linear_DecodersLinear Decoders2011-05-22T02:05:28Z<p>Jngiam: /* Sparse Autoencoder Recap */</p>
<hr />
<div>== Sparse Autoencoder Recap ==<br />
<br />
In the sparse autoencoder implementation, we had 3 layers of neurons: the input layer, a hidden layer and an output layer. Recall that each neuron (in the output layer) computes the following:<br />
<br />
<math><br />
\begin{align}<br />
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\<br />
a^{(3)} &= f(z^{(3)})<br />
\end{align}<br />
</math><br />
<br />
where <math>a^{(3)}</math> is the reconstruction of the input (layer <math>a^{(1)}</math>).<br />
<br />
Notice that due to the choice of the sigmoid function for <math>f(z^{(3)})</math>, we need to constrain the inputs to be in the range <tt>[0,1]</tt>.<br />
<br />
While some datasets, like MNIST, fit this constraint well, the hard constraint can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <tt>[0,1]</tt>, and it is not clear what kind of scaling is appropriate to fit the data into the constrained range.<br />
<br />
== Linear Decoder ==<br />
<br />
One easy fix for the aforementioned problem is to use a ''linear decoder'', that is, we set <math>a^{(3)} = z^{(3)}</math>.<br />
<br />
For a linear decoder, the activation function of the output unit is effectively the identity function. Formally, to reconstruct the input from the features using a linear decoder, we simply set <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a + b^{(2)}</math> instead, without applying the sigmoid function. Now the reconstructed output <math>\hat{x}</math> is a linear function of the activations of the hidden units, which means that by varying <math>W</math>, each output unit <math>\hat{x}</math> can be made to produce any activation without the previous constraints. This allows us to train the sparse autoencoder on any input that takes on real values without any additional pre-processing. (Note that the hidden units are '''still sigmoid units''', that is, <math>a = \sigma(W^{(1)}*x + b^{(1)})</math>, where <math>x</math> is the input, and <math>W^{(1)}</math> and <math>b^{(1)}</math> are the weight and bias terms for the hidden units)<br />
<br />
Of course, now that we have changed the activation function of the output units, the gradients of the output units must be changed accordingly. Recall that for each output unit, we set the error as follows:<br />
:<math><br />
\begin{align}<br />
\delta_i<br />
= \frac{\partial}{\partial z_i} \;\;<br />
\frac{1}{2} \left\|y - \hat{x}\right\|^2 = - (y_i - \hat{x}_i) \cdot f'(z_i)<br />
\end{align}<br />
</math><br />
(where <math>y = x</math> is the desired output, <math>\hat{x}</math> is the reconstructed output of our autoencoder, <math>z</math> is the input to the output units, and <math>f</math> is our activation function)<br />
<br />
Since the activation function for the output units of a linear decoder is just the identity function, whose derivative is 1, the above reduces to:<br />
:<math><br />
\begin{align}<br />
\delta_i = - (y_i - \hat{x}_i)<br />
\end{align}<br />
</math></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/Linear_DecodersLinear Decoders2011-05-22T02:03:55Z<p>Jngiam: </p>
<hr />
<div>== Sparse Autoencoder Recap ==<br />
<br />
In the sparse autoencoder implementation, we had 3 layers of neurons: the input layer, a hidden layer and an output layer. Recall that each neuron (in the output layer) computes the following:<br />
<br />
<math><br />
\begin{align}<br />
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\<br />
a^{(3)} &= f(z^{(3)})<br />
\end{align}<br />
</math><br />
<br />
where <math>a^{(3)}</math> is the reconstruction of the input (layer <math>a^{(1)}</math>).<br />
<br />
Notice that due to the choice of the sigmoid function for <math>f(z^{(3)})</math>, we need to constrain the inputs to be in the range <tt>[0,1]</tt>.<br />
<br />
While some datasets, like MNIST, fit this constraint well, the hard constraint can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <tt>[0,1]</tt>, and it is not clear what kind of scaling is appropriate to fit the data into the constrained range.<br />
<br />
== Linear Decoder ==<br />
<br />
One easy fix for the aforementioned problem is to use a ''linear decoder'', that is, we set <math>a^{(3)} = z^{(3)}</math>.<br />
<br />
For a linear decoder, the activation function of the output unit is effectively the identity function. Formally, to reconstruct the input from the features using a linear decoder, we simply set <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a + b^{(2)}</math> instead, without applying the sigmoid function. Now the reconstructed output <math>\hat{x}</math> is a linear function of the activations of the hidden units, which means that by varying <math>W</math>, each output unit <math>\hat{x}</math> can be made to produce any activation without the previous constraints. This allows us to train the sparse autoencoder on any input that takes on real values without any additional pre-processing. (Note that the hidden units are '''still sigmoid units''', that is, <math>a = \sigma(W^{(1)}*x + b^{(1)})</math>, where <math>x</math> is the input, and <math>W^{(1)}</math> and <math>b^{(1)}</math> are the weight and bias terms for the hidden units)<br />
<br />
Of course, now that we have changed the activation function of the output units, the gradients of the output units must be changed accordingly. Recall that for each output unit, we set the error as follows:<br />
:<math><br />
\begin{align}<br />
\delta_i<br />
= \frac{\partial}{\partial z_i} \;\;<br />
\frac{1}{2} \left\|y - \hat{x}\right\|^2 = - (y_i - \hat{x}_i) \cdot f'(z_i)<br />
\end{align}<br />
</math><br />
(where <math>y = x</math> is the desired output, <math>\hat{x}</math> is the reconstructed output of our autoencoder, <math>z</math> is the input to the output units, and <math>f</math> is our activation function)<br />
<br />
Since the activation function for the output units of a linear decoder is just the identity function, whose derivative is 1, the above reduces to:<br />
:<math><br />
\begin{align}<br />
\delta_i = - (y_i - \hat{x}_i)<br />
\end{align}<br />
</math></div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/PoolingPooling2011-05-22T01:07:48Z<p>Jngiam: /* Pooling: Overview */</p>
<hr />
<div>== Pooling: Overview ==<br />
<br />
After obtaining features using convolution, the next step is to use them for classification. In theory, one could use all the extracted features with a classifier (e.g., softmax regression), but this can be computationally challenging. Consider, for instance, images of size 96x96 pixels and 400 features that are 8x8 each, convolved over the entire image; each feature after (valid) convolution yields <math>(96-8+1)*(96-8+1)=7921</math> values, and since we have 400 features, this results in a feature vector of <math>89^2 * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy and also prone to over-fitting.<br />
<br />
However, thinking about why we decided to obtain convolved features suggests a further step that could improve our feature extraction pipeline. Recall that we decided to obtain convolved features because images have the property that features that are useful in one region will be useful for other regions (stationary). <br />
<br />
Then, to describe a large image, one natural approach is to aggregate statistics of these features at various locations: ''pooling'' over regions of the image. For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all extracted features) and can also improve results (less over-fitting). <br />
<br />
The following image shows how pooling is done over 4 non-overlapping regions of the image.<br />
<br />
[[File:Pooling_schematic.gif]]<br />
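Mean pooling over disjoint regions can be sketched as follows (a Python/NumPy illustration for a single feature map; <tt>mean_pool</tt> and its shapes are assumptions, not the exercise's <tt>cnnPool</tt> interface):

```python
import numpy as np

def mean_pool(convolved, pool_dim):
    """Mean-pool one convolved feature map over disjoint
    pool_dim x pool_dim regions (illustrative sketch)."""
    dim = convolved.shape[0]
    out = dim // pool_dim
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            region = convolved[i * pool_dim:(i + 1) * pool_dim,
                               j * pool_dim:(j + 1) * pool_dim]
            pooled[i, j] = region.mean()
    return pooled
```

Replacing <tt>region.mean()</tt> with <tt>region.max()</tt> gives max pooling instead.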
<br />
== Pooling for Invariance ==<br />
<br />
If one chooses the pooling regions to be contiguous areas in the image and only pools features generated from the same (replicated) hidden units, then these pooling units will be '''translation invariant'''. This means that the same (pooled) feature will be active even when the image undergoes (small) translations. Translation-invariant features are often desirable; in many tasks (e.g., object detection, audio recognition), the label of the example (image) is the same even when the image is translated. For example, if you were to take an MNIST digit and translate it left or right, you would want your classifier to still accurately classify it as the same digit regardless of its final position.<br />
<br />
== Notes ==<br />
<br />
Formally, after obtaining our convolved features as earlier, we decide on the size of the region, say <math>m \times n</math>, to pool our convolved features over. Then, we divide our convolved features into disjoint <math>m \times n</math> regions, and take the maximum (or mean) feature activation over these regions to obtain the pooled convolved features. These pooled features can then be used for classification.</div>Jngiamhttp://deeplearning.stanford.edu/wiki/index.php/PoolingPooling2011-05-22T01:07:31Z<p>Jngiam: /* Pooling: Overview */</p>
<hr />
<div>== Pooling: Overview ==<br />
<br />
After obtaining features using convolution, the next step is to use them for classification. In theory, one could use all the extracted features with a classifier (e.g., softmax regression), but this can be computationally challenging. Consider, for instance, images of size 96x96 pixels and 400 features that are 8x8 each, convolved over the entire image; each feature after (valid) convolution yields a <math>(96-8+1)*(96-8+1)=7921</math> dimensional output, and since we have 400 features, this results in a feature vector of <math>(89^2) * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy and also prone to over-fitting.<br />
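As a quick sanity check, the dimensionality arithmetic above can be verified in a few lines of Python (using the sizes assumed in the example: 96x96 images and 400 features of size 8x8):

```python
# Verify the feature-count arithmetic from the example above:
# 96x96 image, 8x8 features, valid convolution, 400 feature maps.
image_size, patch_size, num_features = 96, 8, 400

conv_dim = image_size - patch_size + 1  # side length after valid convolution
per_map = conv_dim ** 2                 # features per convolved map
total = per_map * num_features          # total features per example

print(conv_dim, per_map, total)         # 89 7921 3168400
```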
<br />
However, thinking about why we decided to obtain convolved features suggests a further step that could improve our feature extraction pipeline. Recall that we decided to obtain convolved features because images have the property that features that are useful in one region are also likely to be useful in other regions (the stationarity property). <br />
<br />
Then, to describe a large image, one natural approach is to aggregate statistics of these features at various locations: ''pooling'' over regions of the image. For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all extracted features) and can also improve results (less over-fitting). <br />
<br />
The following image shows how pooling is done over 4 non-overlapping regions of the image.<br />
<br />
[[File:Pooling_schematic.gif]]<br />
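As an illustration (not part of the original tutorial code), the pooling operation in the schematic can be sketched in NumPy. The function name <code>pool</code> is hypothetical, and the sketch assumes the pooling region size evenly divides the feature map:

```python
import numpy as np

def pool(features, m, n, mode="mean"):
    """Pool a 2-D convolved feature map over disjoint m x n regions.

    (Hypothetical helper for illustration; assumes m and n tile the map.)
    """
    rows, cols = features.shape
    assert rows % m == 0 and cols % n == 0, "region size must tile the map"
    # Reshape so each disjoint m x n region becomes its own block,
    # then reduce over the two block axes.
    blocks = features.reshape(rows // m, m, cols // n, n)
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    return blocks.max(axis=(1, 3))

# Pooling a 4x4 map over 2x2 regions yields one statistic per region,
# matching the 4 non-overlapping regions in the schematic.
fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool(fmap, 2, 2, mode="max"))   # [[ 5.  7.] [13. 15.]]
print(pool(fmap, 2, 2, mode="mean"))  # [[ 2.5  4.5] [10.5 12.5]]
```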
<br />
== Pooling for Invariance ==<br />
<br />
If one chooses the pooling regions to be contiguous areas of the image and pools only features generated from the same (replicated) hidden units, then the pooled units will be '''translation invariant'''. This means that the same (pooled) feature will be active even when the image undergoes (small) translations. Translation-invariant features are often desirable; in many tasks (e.g., object detection, audio recognition), the label of the example (image) is the same even when the image is translated. For example, if you were to take an MNIST digit and translate it left or right, you would want your classifier to still accurately classify it as the same digit regardless of its final position.<br />
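This invariance can be seen concretely in a small sketch (a hypothetical single-feature map; the invariance holds as long as the translation keeps the feature inside the same pooling region):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Max-pool a feature map over disjoint 2x2 regions."""
    r, c = fmap.shape
    return fmap.reshape(r // 2, 2, c // 2, 2).max(axis=(1, 3))

# A single active feature, and the same feature translated one pixel right.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[0, 1] = 1.0

# Both activations fall in the same 2x2 region, so the pooled maps agree.
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```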
<br />
== Notes ==<br />
<br />
Formally, after obtaining our convolved features as earlier, we decide the size of the region, say <math>m \times n</math>, to pool our convolved features over. Then, we divide our convolved features into disjoint <math>m \times n</math> regions, and take the maximum (or mean) feature activation over these regions to obtain the pooled convolved features. These pooled features can then be used for classification.</div>Jngiam