Data Preprocessing

From Ufldl

First review: @咖灰茶
== Overview ==
【Original】
Data preprocessing plays a very important role in many deep learning algorithms. In practice, many methods work best after the data has been normalized and whitened. However, the exact parameters for data preprocessing are usually not immediately apparent unless one has much experience working with the algorithms. On this page, we hope to demystify some of the preprocessing methods and also provide tips (and a "standard pipeline") for preprocessing data.
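Tip: The first thing to do after obtaining a dataset is to look at the data and learn its properties, then choose the processing that suits it. For example, one standard preprocessing method is to subtract from each data point its own mean (also known as removing DC, local mean subtraction, or subtractive normalization). This works well for some data, such as natural images, but not for data that is non-stationary.
}}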
【First review】
}}
== Data Normalization ==
【Original】
A standard first step to data preprocessing is data normalization. While there are a few possible approaches, this step is usually clear depending on the data. The common methods for feature normalization are:
* Feature standardization (giving every feature in the dataset zero mean and unit variance)
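As a minimal sketch of the feature standardization item above, the snippet below centers each feature (column) and divides by its standard deviation. The function name <code>standardize_features</code>, the <code>eps</code> guard, and the toy array are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

def standardize_features(data, eps=1e-8):
    """Give every feature (column) zero mean and roughly unit variance.

    `eps` is an assumed small constant that avoids division by zero
    for features that never vary.
    """
    data = np.asarray(data, dtype=np.float64)
    mean = data.mean(axis=0)   # per-feature mean over all examples
    std = data.std(axis=0)     # per-feature (population) standard deviation
    standardized = (data - mean) / (std + eps)
    return standardized, mean, std

# Toy data: 3 examples, 2 features on very different scales.
train = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])
train_std, mean, std = standardize_features(train)
```

Returning <code>mean</code> and <code>std</code> lets the same statistics be reused on held-out data, which is a common design choice rather than something this page prescribes.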
=== Simple Rescaling ===
【Original】
In simple rescaling, our goal is to rescale the data along each data dimension (possibly independently) so that the final data vectors lie in the range <math>[0, 1]</math> or  <math>[-1, 1]</math>  (depending on your dataset). This is useful for later processing as many ''default'' parameters (e.g., epsilon in PCA-whitening) treat the data as if it has been scaled to a reasonable range.  
'''Example: ''' When processing natural images, we often obtain pixel values in the range <math>[0, 255]</math>. It is a common operation to rescale these values to  <math>[0, 1]</math> by dividing the data by 255.
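【Initial translation】
The goal of simple rescaling is to rescale the data along each dimension (possibly independently of the others) so that the final data vectors fall within the interval <math>[0, 1]</math> or <math>[-1, 1]</math> (depending on the data). This matters for later processing, because many default parameters (such as epsilon in PCA whitening) assume that the data has already been scaled to a reasonable range.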
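The two forms of simple rescaling described in this section can be sketched as follows. This is an illustration under stated assumptions: the helper names <code>rescale_pixels</code> and <code>rescale_min_max</code>, the handling of constant dimensions, and the toy arrays are hypothetical and do not come from the tutorial.

```python
import numpy as np

def rescale_pixels(images):
    """Map 8-bit pixel values from [0, 255] to [0, 1] by dividing by 255."""
    return np.asarray(images, dtype=np.float64) / 255.0

def rescale_min_max(data, low=0.0, high=1.0):
    """Rescale each data dimension (column) independently to [low, high]."""
    data = np.asarray(data, dtype=np.float64)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Assumption: a constant column is mapped to `low` rather than
    # triggering a division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return low + (data - col_min) / col_range * (high - low)

# Natural-image style data with values in [0, 255].
pixels = np.array([[0, 128, 255],
                   [64, 32, 16]], dtype=np.uint8)
scaled_pixels = rescale_pixels(pixels)

# Generic features rescaled per dimension to [-1, 1].
features = np.array([[1.0, 10.0],
                     [2.0, 30.0],
                     [3.0, 20.0]])
scaled_features = rescale_min_max(features, low=-1.0, high=1.0)
```

Dividing by a fixed constant such as 255 suits data with a known range, while the per-dimension min-max version is one possible choice when the range has to be estimated from the data itself.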
=== Per-example mean subtraction ===

Revision as of 08:02, 8 March 2013
