Data Preprocessing
From Ufldl
Line 8: | Line 8: | ||
- | == | + | == Data Normalization == |
+ | A standard first step in data preprocessing is data normalization. While there are a few possible approaches, the appropriate one is usually clear from the data. The common methods for feature normalization are: | ||
+ | |||
+ | * Simple Rescaling | ||
+ | * Per-example mean subtraction (a.k.a. removing the DC component) | ||
+ | * Feature Standardization (zero-mean and unit variance for each feature across the dataset) | ||
+ | |||
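As a rough illustration of the last two methods in the list above, the following is a minimal sketch in plain Python. It assumes the dataset is a list of examples, each a list of feature values. The function names `remove_dc` and `standardize_features` and the small constant `eps` (added to avoid dividing by zero for constant features) are hypothetical choices for this sketch, not part of the original tutorial.

```python
from statistics import fmean, pstdev

def remove_dc(example):
    """Per-example mean subtraction: subtract the example's own mean."""
    m = fmean(example)
    return [x - m for x in example]

def standardize_features(dataset, eps=1e-8):
    """Give each feature (column) zero mean and unit variance across the dataset.

    eps is an assumed safeguard against division by zero for constant features.
    """
    columns = list(zip(*dataset))                # one tuple per feature
    means = [fmean(col) for col in columns]      # per-feature mean
    stds = [pstdev(col) for col in columns]      # per-feature population std
    return [
        [(x - m) / (s + eps) for x, m, s in zip(row, means, stds)]
        for row in dataset
    ]

# Toy dataset: two examples with three features each.
data = [[1.0, 2.0, 3.0],
        [4.0, 6.0, 11.0]]
data_dc = [remove_dc(row) for row in data]   # each row now has zero mean
data_std = standardize_features(data)        # each column now has zero mean
```

Note the difference in direction: mean subtraction here operates within a single example, while standardization computes its statistics for each feature across all examples.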
+ | === Simple Rescaling === | ||
+ | |||
+ | In simple rescaling, our goal is to rescale the data along each data dimension (possibly independently) so that the final data vectors lie in the range <math>[0, 1]</math> or <math>[-1, 1]</math> (depending on your dataset). This is useful for later processing because many ''default'' parameters (e.g., epsilon in PCA-whitening) assume that the data has been scaled to a reasonable range. | ||
+ | |||
+ | '''Example: ''' When processing natural images, we often obtain pixel values in the range <math>[0, 255]</math>. It is a common operation to rescale these values to <math>[0, 1]</math> by dividing the data by 255. | ||
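
The rescaling in the example above can be sketched in a few lines of plain Python. The helper `rescale_pixels` simply divides by 255, as described. The second helper, `rescale_to_range`, uses min-max scaling to map values into a chosen interval; that is one possible way to reach <math>[-1, 1]</math> and is an assumption of this sketch, since the text does not prescribe a specific formula. Both function names are hypothetical.

```python
def rescale_pixels(pixels, max_value=255.0):
    """Map pixel values from [0, max_value] to [0, 1] by division."""
    return [p / max_value for p in pixels]

def rescale_to_range(values, lo=0.0, hi=1.0):
    """Min-max rescale values so the smallest maps to lo and the largest to hi.

    Min-max scaling is an assumed choice here, not one stated in the text.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values identical: there is no spread to rescale.
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

pixels = [0, 51, 255]
unit_pixels = rescale_pixels(pixels)                    # values in [0, 1]
signed = rescale_to_range([2.0, 4.0, 6.0], -1.0, 1.0)   # values in [-1, 1]
```

Dividing by a fixed constant such as 255 keeps the scaling identical for every image, whereas min-max scaling depends on the values actually present in the data being rescaled.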
+ | |||
+ | |||
+ | === Per-example mean subtraction === | ||
+ | |||
+ | If the data has the property that the | ||
== PCA/ZCA Whitening == | == PCA/ZCA Whitening == |