PCA

== PCA on Images ==

For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math>
to have a similar range of values to the others (and to have a mean close to
zero).  If you've used PCA on other applications before, you may therefore have
separately pre-processed each feature to have zero mean and unit variance, by
separately estimating the mean and variance of each feature <math>\textstyle x_j</math>.  However,
this isn't the pre-processing that we will apply to most types of images.  Specifically,
suppose we are training our algorithm on '''natural images''', so that <math>\textstyle x_j</math> is
the value of pixel <math>\textstyle j</math>.  By "natural images," we informally mean the type of image that
a typical animal or person might see over their lifetime.  (Usually we use
images of outdoor scenes with grass, trees, etc., and cut out small, say 16x16,
image patches randomly from these to train the algorithm.  But in practice most
feature learning algorithms are extremely robust to the exact type of image
they are trained on, so most images taken with a normal camera, so long as they
aren't excessively blurry and don't have strange artifacts, should work.)
In this case, it makes little sense to estimate a separate mean and
variance for each pixel, because the statistics in one part
of the image should (theoretically) be the same as in any other.
This property of images is called '''stationarity'''.

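As an informal check of stationarity, one can cut out many random patches and
compare per-pixel statistics.  Below is a minimal sketch in Python/NumPy; the
<code>images</code> array and the <code>sample_patches</code> helper are hypothetical stand-ins,
not part of this tutorial.  Under stationarity, the mean and variance computed
separately for each of the 256 pixel positions should come out nearly identical.

<pre>
import numpy as np

def sample_patches(images, num_patches=10000, patch_size=16, seed=0):
    """Cut random patch_size x patch_size patches; one flattened patch per row."""
    rng = np.random.default_rng(seed)
    num_images, height, width = images.shape
    patches = np.empty((num_patches, patch_size * patch_size))
    for k in range(num_patches):
        i = rng.integers(num_images)
        r = rng.integers(height - patch_size + 1)
        c = rng.integers(width - patch_size + 1)
        patches[k] = images[i, r:r + patch_size, c:c + patch_size].ravel()
    return patches

# `images`: hypothetical (num_images, height, width) array of grayscale
# natural images.  If the patch statistics are stationary, the per-pixel
# means and variances below are all roughly equal:
#   patches = sample_patches(images)
#   print(patches.mean(axis=0))  # ~ same value for every pixel position
#   print(patches.var(axis=0))   # ~ same value for every pixel position
</pre>
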
In detail, in order for PCA to work well, informally we require that (i) the
features have approximately zero mean, and (ii) the different features have
similar variances to each other.  With natural images, (ii) is already
satisfied even without variance normalization, and so we won't perform any
variance normalization.  (If you are training on audio data, say
spectrograms, or on text data, say bag-of-words vectors, we will usually not
perform variance normalization either.)  In fact, PCA is invariant to the
scaling of the data, and will return the same eigenvectors regardless of the
scaling of the input.  More formally, if you multiply each feature vector <math>\textstyle x</math> by some
positive number (thus scaling every feature in every training example by the
same number), PCA's output eigenvectors will not change.

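This invariance is easy to verify numerically.  Here is a minimal sketch in
Python/NumPy on made-up random data (not from this tutorial), using the usual
covariance-matrix route to the eigenvectors (here via SVD): rescaling the
whole dataset by a positive constant rescales the eigenvalues but leaves the
eigenvectors unchanged, up to sign.

<pre>
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1000))   # 5 features, 1000 examples, ~zero mean

sigma = x @ x.T / x.shape[1]         # covariance matrix of the data
u, s, _ = np.linalg.svd(sigma)       # columns of u: eigenvectors; s: eigenvalues

x2 = 3.7 * x                         # scale every feature of every example
sigma2 = x2 @ x2.T / x2.shape[1]
u2, s2, _ = np.linalg.svd(sigma2)

print(np.allclose(np.abs(u), np.abs(u2)))  # True: same eigenvectors up to sign
print(np.allclose(s2, 3.7 ** 2 * s))       # True: eigenvalues scale by 3.7^2
</pre>
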
So, we won't use variance normalization.  The only normalization we need to
perform then is mean normalization, to ensure that the features have a mean
around zero.  Depending on the application, very often we are not interested
in how bright the overall input image is.  For example, in object recognition
tasks, the overall brightness of the image doesn't affect what objects are
present in the image.  More formally, we are not interested in the mean
intensity value of an image patch; thus, we can subtract out this value, as a
form of mean normalization.

Concretely, if <math>\textstyle x^{(i)} \in \Re^{n}</math> are the (grayscale) intensity values of
a 16x16 image patch (<math>\textstyle n=256</math>), we might normalize the intensity of each image
<math>\textstyle x^{(i)}</math> as follows:

<math>\begin{align}
\mu^{(i)} &:= \frac{1}{n} \sum_{j=1}^n x^{(i)}_j  \\
x^{(i)}_j &:= x^{(i)}_j - \mu^{(i)} \quad \text{for all } j
\end{align}</math>

Note that the two steps above are done separately for each image <math>\textstyle x^{(i)}</math>,
and that <math>\textstyle \mu^{(i)}</math> here is the mean intensity of the image <math>\textstyle x^{(i)}</math>.  In particular,
this is not the same thing as estimating a mean value separately for each pixel <math>\textstyle x_j</math>.

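As a minimal sketch, here is this per-image mean normalization in
Python/NumPy, with <code>patches</code> a hypothetical (m, 256) array holding one
flattened 16x16 patch per row; the contrast with per-feature normalization is
noted in the comments.

<pre>
import numpy as np

def remove_patch_means(patches):
    """Per-image mean normalization: subtract each patch's own mean intensity."""
    # mu[i] = (1/n) * sum_j patches[i, j], one scalar per patch
    mu = patches.mean(axis=1, keepdims=True)
    # x_j := x_j - mu, applied separately to every patch
    return patches - mu

# Note the contrast with per-feature normalization, which we are NOT doing:
#   patches - patches.mean(axis=0)   # a separate mean for each pixel x_j
</pre>
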
If you are training your algorithm on images other than natural images (for
example, images of handwritten characters, or images of single isolated objects
centered against a white background), other types of normalization might be
worth considering, and the best choice may be application dependent.  But
when training on natural images, the per-image mean normalization given by
the two equations above would be a reasonable default.