Feature extraction using convolution

This idea of having locally connected networks also draws inspiration from how the early visual system is wired up in biology.  Specifically, neurons in the visual cortex have localized receptive fields (i.e., they respond only to stimuli in a certain location).

== Convolutions ==

Natural images have the property of being '''stationary''', meaning that the statistics of one part of the image are the same as those of any other part.  This suggests that the features we learn at one part of the image can also be applied to other parts of the image, so we can use the same features at all locations.

<!--
To capture this idea of learning the same features "everywhere in the image," one option is to add an additional constraint, known as weight sharing (or tying), between the hidden units at different locations. If one chooses to have the same hidden unit replicated at every possible location, this turns out to be equivalent to a convolution of the feature (as a filter) with the image.
While in principle one can learn features convolutionally over the entire image, the learning procedure becomes more complicated to implement and often takes longer to execute.
-->

More precisely, having learned features over small (say 8x8) patches sampled randomly from the larger image, we can then apply this learned 8x8 feature detector anywhere in the image.  Specifically, we can take the learned 8x8 features and '''convolve''' them with the larger image, thus obtaining a different feature activation value at each location in the image.

To give a concrete example, suppose you have learned features on 8x8 patches sampled from a 96x96 image, using a sparse autoencoder with 100 hidden units.  To get the convolved features, for every 8x8 region of the 96x96 image, that is, the 8x8 regions starting at <math>(1, 1), (1, 2), \ldots, (89, 89)</math>, you would extract the 8x8 patch and run it through your trained sparse autoencoder to get the feature activations.  This would result in 100 sets of 89x89 convolved features.
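
As a rough sketch of this computation (in Python with NumPy), the following snippet extracts every 8x8 patch of a 96x96 image and runs it through the encoder of a trained sparse autoencoder.  The names <code>W1</code> and <code>b1</code> are illustrative stand-ins for the learned hidden-layer weights and biases (assumed here to have shape 100x64 and length 100); they are not part of any particular library.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_features(image, W1, b1, patch_dim=8):
    """Feature activations for every patch_dim x patch_dim region of the image."""
    img_dim = image.shape[0]                    # 96 in the example above
    num_features = W1.shape[0]                  # 100 hidden units
    conv_dim = img_dim - patch_dim + 1          # 96 - 8 + 1 = 89
    convolved = np.zeros((num_features, conv_dim, conv_dim))
    for r in range(conv_dim):
        for c in range(conv_dim):
            # Flatten the 8x8 patch into a 64-vector and apply the encoder.
            patch = image[r:r + patch_dim, c:c + patch_dim].reshape(-1)
            convolved[:, r, c] = sigmoid(W1.dot(patch) + b1)
    return convolved

# Illustrative usage with random stand-ins for the learned parameters:
# image = np.random.rand(96, 96)
# W1, b1 = np.random.randn(100, 64), np.zeros(100)
# convolve_features(image, W1, b1).shape   # -> (100, 89, 89)
</pre>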
 
Formally, given some large <math>r \times c</math> images <math>x_{large}</math>, we first train a sparse autoencoder on small <math>a \times b</math> patches <math>x_{small}</math> sampled from these images, learning <math>k</math> features <math>f = \sigma(W^{(1)}x_{small} + b^{(1)})</math> (where <math>\sigma</math> is the sigmoid function), given by the weights <math>W^{(1)}</math> and biases <math>b^{(1)}</math> from the visible units to the hidden units. For every <math>a \times b</math> patch <math>x_s</math> in the large image, we compute <math>f_s = \sigma(W^{(1)}x_s + b^{(1)})</math>, giving us <math>f_{convolved}</math>, a <math>k \times (r - a + 1) \times (c - b + 1)</math> array of convolved features.
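
For readers who prefer code to notation, the same computation can be written as <math>k</math> sliding-filter responses: row <math>i</math> of <math>W^{(1)}</math>, reshaped to <math>a \times b</math>, acts as a filter, and sliding it over the image (a 2-D cross-correlation) followed by the sigmoid reproduces <math>f_s</math> at every patch location.  The sketch below assumes SciPy is available and that <math>W^{(1)}</math> is stored as a <math>k \times ab</math> array with patches flattened row-major; the function name is hypothetical.

<pre>
import numpy as np
from scipy.signal import correlate2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_features_as_filters(image, W1, b1, a, b):
    """Convolved features computed as k sliding-filter responses."""
    k = W1.shape[0]
    r, c = image.shape
    convolved = np.zeros((k, r - a + 1, c - b + 1))
    for i in range(k):
        # Row i of W1, reshaped to a x b (row-major), acts as a filter.
        filt = W1[i].reshape(a, b)
        # correlate2d slides the filter without flipping it, which matches
        # sigmoid(W1 x_s + b1) applied to every a x b patch; convolve2d
        # would need the filter rotated by 180 degrees first.
        convolved[i] = sigmoid(correlate2d(image, filt, mode='valid') + b1[i])
    return convolved
</pre>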

In the next section, we further describe how to "pool" these features together to get even better features for classification.
