# Pooling

Revisions as of 11:40 and 11:41, 7 March 2013 (Kandeng).

## Pooling: Overview

After obtaining features using convolution, we would next like to use them for classification. In theory, one could feed all the extracted features to a classifier such as a softmax classifier, but this can be computationally challenging. Consider, for instance, images of size 96x96 pixels, and suppose we have learned 400 features, each defined over 8x8 inputs. Convolving one feature with an image produces an output of size $(96 - 8 + 1) \times (96 - 8 + 1) = 89^2 = 7921$, and since we have 400 features, each example yields a vector of $89^2 \times 400 = 3{,}168{,}400$ convolved features. Learning a classifier with more than 3 million input features can be unwieldy, and it is also prone to over-fitting.
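The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes a "valid" convolution (no padding, stride 1) with square images and square patches; the function names are hypothetical and chosen only for illustration.

```python
def valid_conv_output_size(image_dim: int, patch_dim: int) -> int:
    """Side length of a 'valid' convolution output (no padding, stride 1)."""
    return image_dim - patch_dim + 1


def convolved_feature_count(image_dim: int, patch_dim: int, num_features: int) -> int:
    """Total convolved features per example, assuming square images and patches."""
    side = valid_conv_output_size(image_dim, patch_dim)
    return side * side * num_features


per_feature = valid_conv_output_size(96, 8) ** 2  # one 89 x 89 map per feature
total = convolved_feature_count(96, 8, 400)       # 400 such maps per example
print(per_feature)  # 7921
print(total)        # 3168400
```

Because the count grows with the square of the image side and linearly with the number of features, even modest increases in either quickly push the classifier input into the millions.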
## Pooling for Invariance

If one chooses the pooling regions to be contiguous areas in the image and only pools features generated by the same (replicated) hidden units, then these pooling units will be translation invariant. This means that the same (pooled) feature will be active even when the image undergoes (small) translations.
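A toy illustration of this invariance, as a minimal sketch: it assumes NumPy, max pooling over disjoint 2x2 regions, and a made-up 4x4 response map for a single feature; the helper name `max_pool` is hypothetical. A one-pixel shift keeps the activation inside the same pooling region, so the pooled output does not change.

```python
import numpy as np


def max_pool(feature_map: np.ndarray, pool_dim: int) -> np.ndarray:
    """Max over disjoint pool_dim x pool_dim regions (sides must be divisible)."""
    rows, cols = feature_map.shape
    assert rows % pool_dim == 0 and cols % pool_dim == 0
    # Block [i, a, j, b] holds feature_map[i * pool_dim + a, j * pool_dim + b].
    blocks = feature_map.reshape(rows // pool_dim, pool_dim, cols // pool_dim, pool_dim)
    return blocks.max(axis=(1, 3))


# Convolved responses of ONE feature (one replicated hidden unit) on a toy 4x4 map.
original = np.zeros((4, 4))
original[0, 0] = 1.0  # the feature fires at the top-left corner

# Simulate translating the image by one pixel down and one pixel right.
shifted = np.roll(original, shift=(1, 1), axis=(0, 1))

print(max_pool(original, 2))
print(max_pool(shifted, 2))  # same pooled output as the line above
```

The invariance holds only for small translations: shifting by two pixels here moves the activation into a different 2x2 region, and the pooled output changes accordingly.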
Translation-invariant features are often desirable; in many tasks (e.g., object detection, audio recognition), the label of the example (image) is the same even when the image is translated. For example, if you were to take an MNIST digit and translate it left or right, you would want your classifier to still classify it accurately as the same digit regardless of its final position.

## Formal description

Formally, after obtaining our convolved features as described earlier, we decide the size of the region, say $m \times n$, to pool our convolved features over. Then, we divide our convolved features into disjoint $m \times n$ regions and take the mean (or maximum) feature activation over each region to obtain the pooled convolved features. These pooled features can then be used for classification.
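The procedure above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the convolved features are stored as an array of shape `(num_features, height, width)` and that `height` and `width` are divisible by `m` and `n`; the function name and the toy input are hypothetical.

```python
import numpy as np


def pool_convolved_features(convolved: np.ndarray, m: int, n: int,
                            mode: str = "mean") -> np.ndarray:
    """Pool each convolved feature map over disjoint m x n regions.

    convolved: array of shape (num_features, height, width).
    Assumes height % m == 0 and width % n == 0 (no partial regions).
    Returns an array of shape (num_features, height // m, width // n).
    """
    num_features, height, width = convolved.shape
    if height % m or width % n:
        raise ValueError("feature map size must be divisible by the pooling region size")
    # Split every map into a (height // m) x (width // n) grid of m x n blocks.
    blocks = convolved.reshape(num_features, height // m, m, width // n, n)
    if mode == "mean":
        return blocks.mean(axis=(2, 4))
    if mode == "max":
        return blocks.max(axis=(2, 4))
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Toy example: 2 features, each a 4x6 convolved map, pooled over 2x3 regions.
convolved = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)
pooled_mean = pool_convolved_features(convolved, 2, 3, mode="mean")
pooled_max = pool_convolved_features(convolved, 2, 3, mode="max")
print(pooled_mean.shape)  # (2, 2, 2)

# Flatten into one vector per example before passing it to a classifier.
classifier_input = pooled_mean.reshape(-1)
```

The divisibility check is a simplifying assumption. For instance, the 89 x 89 maps from the overview example would need a convention for leftover rows and columns (such as dropping or padding them), and this sketch deliberately does not pick one.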