Pooling (池化)

== Pooling: Overview ==

After obtaining features using convolution, we would next like to use them for classification. In theory, one could use all the extracted features with a classifier such as a softmax classifier, but this can be computationally challenging. Consider for instance images of size 96x96 pixels, and suppose we have learned 400 features over 8x8 inputs. Each convolution results in an output of size (96 − 8 + 1) * (96 − 8 + 1) = 7921, and since we have 400 features, this results in a vector of 89² * 400 = 3,168,400 features per example. Learning a classifier with inputs having 3+ million features can be unwieldy, and can also be prone to over-fitting.
[Initial translation (初译)]:

Pooling: Overview

After obtaining features through convolution, the next step is to use these features for classification. In theory, one could feed all of the extracted features into a classification method, such as the softmax classification method, but the computation is still extremely challenging. For example, for a 96x96-pixel image, suppose we have already learned 400 features over 8x8 inputs. Each convolution then yields a result set of size (96 − 8 + 1) * (96 − 8 + 1) = 7921, and since we have 400 features, the result set for each example reaches 89² * 400 = 3,168,400 features. Learning a classification method whose input has more than 3 million features would be rather inconvenient, and it is also highly prone to over-fitting.
[First review (一审)]:

Pooling: Overview

After obtaining features through convolution, the next step is to use these features for classification. In theory, one could feed all of the extracted features into a classifier, such as a softmax classifier, but the amount of computation would be very large. For example, for a 96x96-pixel image, suppose we have already learned 400 features over 8x8 inputs. Each convolution then yields a result set of size (96 − 8 + 1) * (96 − 8 + 1) = 7921, and since we have 400 features, the result set for each example reaches 89² * 400 = 3,168,400 features. Learning a classifier whose input has more than 3 million features is therefore quite unwise, and it is also highly prone to over-fitting.
[Second review (二审)]:

Pooling: Overview

After obtaining features through convolution, the next step is to use these features for classification. In theory, one could train a classifier, such as a softmax classifier, on all of the extracted features, but doing so poses a computational challenge. For example, for a 96x96-pixel image, suppose we have already learned 400 features defined over 8x8 inputs. Convolving each feature with the image yields a (96 − 8 + 1) * (96 − 8 + 1) = 7921-dimensional convolved feature, and since there are 400 features, each example yields a convolved feature vector of dimension 89² * 400 = 3,168,400. Learning a classifier with more than 3 million input features is inconvenient and prone to over-fitting.
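To make the dimension arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The sizes are the hypothetical ones quoted in the example (96x96 image, 400 features on 8x8 patches); the variable names are illustrative and not part of the tutorial's own code.

<pre>
# Sizes quoted in the example above: a 96x96 image and 400 features learned on 8x8 patches.
image_dim    = 96
patch_dim    = 8
num_features = 400

conv_dim        = image_dim - patch_dim + 1       # 96 - 8 + 1 = 89
values_per_map  = conv_dim ** 2                   # 89^2 = 7,921 convolved values per feature
total_per_image = values_per_map * num_features   # 89^2 * 400 = 3,168,400 values per example

print(conv_dim, values_per_map, total_per_image)  # 89 7921 3168400
</pre>

Pooling each 89x89 feature map over non-overlapping regions, as described next, shrinks this vector substantially before it reaches the classifier.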
The following figure shows how pooling is applied to four non-overlapping regions of an image.

[http://deeplearning.stanford.edu/wiki/images/0/08/Pooling_schematic.gif Click to view the original image]
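The schematic linked above pools each convolved feature map over four non-overlapping quadrants. Below is a minimal NumPy sketch of that idea, assuming a 2-D feature map and a choice of mean or max aggregation; the function name and the toy 4x4 input are illustrative only, not from the tutorial.

<pre>
import numpy as np

def pool_quadrants(feature_map, mode="mean"):
    """Pool a 2-D convolved feature map over its four non-overlapping quadrants."""
    h, w = feature_map.shape
    hh, hw = h // 2, w // 2
    quadrants = [feature_map[:hh, :hw], feature_map[:hh, hw:],
                 feature_map[hh:, :hw], feature_map[hh:, hw:]]
    aggregate = np.mean if mode == "mean" else np.max
    return np.array([aggregate(q) for q in quadrants]).reshape(2, 2)

# A toy 4x4 "convolved feature map" pooled down to a 2x2 summary.
fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool_quadrants(fmap, mode="mean"))  # [[ 2.5  4.5] [10.5 12.5]]
print(pool_quadrants(fmap, mode="max"))   # [[ 5.  7.] [13. 15.]]
</pre>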
== Pooling for Invariance ==

If one chooses the pooling regions to be contiguous areas in the image and only pools features generated from the same (replicated) hidden units, then these pooling units will be translation invariant. This means that the same (pooled) feature will be active even when the image undergoes (small) translations. Translation-invariant features are often desirable; in many tasks (e.g., object detection, audio recognition), the label of the example (image) is the same even when the image is translated. For example, if you were to take an MNIST digit and translate it left or right, you would want your classifier to still accurately classify it as the same digit regardless of its final position.
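As a toy illustration of this invariance (assuming NumPy; the numbers are made up rather than taken from the tutorial), max-pooling over a whole region yields the same pooled value when the activation pattern shifts by a pixel, as long as the shift stays inside the pooling region.

<pre>
import numpy as np

# A single strong feature response inside one 4x4 pooling region...
region = np.zeros((4, 4))
region[1, 1] = 1.0

# ...and the same response translated one pixel to the right, still inside the region.
shifted = np.roll(region, shift=1, axis=1)

# Max-pooling over the whole region gives the same pooled feature in both cases.
print(region.max(), shifted.max())  # 1.0 1.0
</pre>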
