Data Preprocessing

From Ufldl

First translation: @gausschen

First review: @咖灰茶

[Original]
== Overview ==
Data preprocessing plays a very important role in many deep learning algorithms. In practice, many methods work best after the data has been normalized and whitened. However, the exact parameters for data preprocessing are usually not immediately apparent unless one has much experience working with the algorithms. On this page, we hope to demystify some of the preprocessing methods and also provide tips (and a "standard pipeline") for preprocessing data.

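[First translation]
== Overview ==

Data preprocessing plays an important role in many deep learning algorithms. In practice, many algorithms perform best after the data has been normalized and whitened. However, unless one has extensive experience with these algorithms, the exact preprocessing parameters are not obvious. On this page, we hope to demystify preprocessing methods and also provide tips (and a standard pipeline) for preprocessing data.
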
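[First review]
== Overview ==

Data preprocessing plays an important role in many deep learning algorithms. In practice, many algorithms perform best after the data has been normalized and whitened. However, unless one has extensive experience with these algorithms, the exact preprocessing parameters are not obvious. On this page, we hope to demystify preprocessing methods and also provide tips (and a standard pipeline) for preprocessing data.
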
[Original]
{{quote |
Tip: When approaching a dataset, the first thing to do is to look at the data itself and observe its properties. While the techniques here apply generally, you might want to opt to do certain things differently given your dataset. For example, one standard preprocessing trick is to subtract the mean of each data point from itself (also known as remove DC, local mean subtraction, subtractive normalization). While this makes sense for data such as natural images, it is less obvious for data where stationarity does not hold.
}}

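[First translation]

Tip: After obtaining the data, the first thing to do is to inspect it and learn its properties, and then choose the processing appropriate to the data. For example, one standard preprocessing method is to subtract the mean of all data points (also known as removing DC, local mean subtraction, subtractive normalization). This method is effective for some data, such as natural images, but not for non-stationary data.
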
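[First review]

Tip: After obtaining the data, the first thing to do is to observe it and learn its properties. This section introduces some general-purpose techniques; in practice, the preprocessing technique should be chosen to suit the specific data. For example, one standard preprocessing method is to subtract from each data point its own mean (also known as removing the DC component, local mean subtraction, subtractive normalization). This method is effective for data such as natural images, but not for non-stationary data.
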
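The per-example mean subtraction ("remove DC") described in the tip can be sketched in a few lines. This is only an illustrative sketch in plain Python: the function name `remove_dc` and the sample patches are hypothetical and do not come from the original page.

```python
def remove_dc(example):
    """Subtract an example's own mean from each of its components."""
    mean = sum(example) / len(example)
    return [value - mean for value in example]


# Hypothetical data: two flattened image patches with different brightness.
patches = [
    [0.2, 0.4, 0.6, 0.8],
    [10.0, 12.0, 14.0, 16.0],
]

# Each patch is centered independently, using its own mean.
centered = [remove_dc(patch) for patch in patches]
```

After this step every patch has zero mean, so overall brightness differences between patches are removed while the variation within each patch is kept. Note that the mean is taken over the components of a single example, not over the dataset.
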
[Original]
== Data Normalization ==
* Feature Standardization (zero-mean and unit variance for each feature across the dataset)
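[First translation]
== Data Normalization ==

The standard first step of data preprocessing is data normalization. Because several applicable methods already exist, this step is usually clear given the nature of the data. Common methods for feature normalization include the following:

* Simple rescaling
* Mean subtraction as in the example above (also known as removing DC)
* Feature standardization (so that all features in the dataset have zero mean and unit variance)
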
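[First review]
== Data Normalization ==

The standard first step of data preprocessing is data normalization. Several common methods exist, and the method to use in this step can be determined clearly from the specifics of the data. Common methods for feature normalization include the following:

* Feature scaling
* Per-component mean normalization (also known as removing the DC component)
* Feature standardization (so that all features in the dataset have zero mean and unit variance)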
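
Two of the listed methods, simple rescaling and feature standardization, can be illustrated as follows. This is a minimal sketch in plain Python under stated assumptions: the helper names `rescale` and `standardize_features` are hypothetical, the target range [0, 1] for rescaling is an assumption made for the example, and the standardization uses the population standard deviation and assumes no feature is constant.

```python
import math


def rescale(values, lo, hi):
    """Map values known to lie in [lo, hi] linearly onto [0, 1] (assumed range)."""
    return [(v - lo) / (hi - lo) for v in values]


def standardize_features(dataset):
    """Give every feature (column) zero mean and unit variance across the dataset."""
    n = len(dataset)
    columns = list(zip(*dataset))
    means = [sum(col) / n for col in columns]
    # Population standard deviation per feature; assumes no constant feature.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in dataset
    ]


# Hypothetical inputs: 8-bit pixel intensities, and a tiny 3-example dataset
# with two features on very different scales.
pixels = rescale([0, 51, 255], 0, 255)
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
standardized = standardize_features(data)
```

The key difference from per-example mean subtraction is the axis: standardization computes its statistics for each feature across all examples, whereas removing DC computes a mean within each example.
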
=== Simple Rescaling ===

Revision as of 07:55, 8 March 2013
