PCA
that you retained 120 (or whatever other number of) components.

== PCA on Images ==

For PCA to work, usually we want each of the features <math>\textstyle x_1, x_2, \ldots, x_n</math>
to have a similar range of values to the others (and to have a mean close to
zero). If you've used PCA on other applications before, you may therefore have
separately pre-processed each feature to have zero mean and unit variance, by
separately estimating the mean and variance of each feature <math>\textstyle x_j</math>
(see the sketch after this paragraph). However, this isn't the pre-processing
that we will apply to most types of images. Specifically, suppose we are
training our algorithm on '''natural images''', so that <math>\textstyle x_j</math> is
the value of pixel <math>\textstyle j</math>. By "natural images," we informally mean the type of image that
a typical animal or person might see over their lifetime. (Usually we use
images of outdoor scenes with grass, trees, etc., and cut out small (say 16x16) image
patches randomly from these to train the algorithm. In practice, though, most
feature learning algorithms are extremely robust to the exact type of image
they are trained on, so most images taken with a normal camera, so long as they
aren't excessively blurry and don't have strange artifacts, should work.)
In this case, it makes little sense to estimate a separate mean and
variance for each pixel, because the statistics in one part
of the image should (theoretically) be the same as in any other.
This property of images is called '''stationarity'''.

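For contrast, that familiar per-feature standardization would look something
like the following numpy sketch (the data here is synthetic, and this is the
preprocessing we will ''not'' be using for natural images):

<pre>
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))   # synthetic data: 1000 examples, 50 features

# Estimate a separate mean and variance for each feature x_j ...
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# ... and rescale so every feature has zero mean and unit variance.
X = (X - mu) / sigma
</pre>
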
In detail, in order for PCA to work well, informally we require that (i) the
features have approximately zero mean, and (ii) the different features have
similar variances to each other. With natural images, (ii) is already
satisfied even without variance normalization, and so we won't perform any
variance normalization.
(If you are training on audio data, say spectrograms, or on text data, say
bag-of-words vectors, we will usually not perform variance normalization
either.)
In fact, PCA is invariant to the scaling of
the data, and will return the same eigenvectors regardless of the scaling of
the input. More formally, if you multiply each feature vector <math>\textstyle x</math> by some
positive number (thus scaling every feature in every training example by the
same number), PCA's output eigenvectors will not change.

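As a quick sanity check of this invariance claim, here is a minimal numpy
sketch; the data is synthetic and the helper <code>pca_eigenvectors</code> is
our own illustration, not a library routine:

<pre>
import numpy as np

def pca_eigenvectors(X):
    # Covariance matrix of mean-normalized data, one example per row.
    sigma = X.T @ X / X.shape[0]
    # eigh is appropriate for the symmetric matrix sigma; it returns
    # eigenvalues in ascending order, so reverse the columns to put the
    # largest-variance directions first.
    _, U = np.linalg.eigh(sigma)
    return U[:, ::-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))   # 500 synthetic examples, 16 features
X = X - X.mean(axis=0)           # mean-normalize the features

U1 = pca_eigenvectors(X)
U2 = pca_eigenvectors(7.3 * X)   # scale every feature of every example

# The eigenvectors match up to sign, as claimed.
print(np.allclose(np.abs(U1), np.abs(U2)))   # True
</pre>
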
So, we won't use variance normalization. The only normalization we need to
perform then is mean normalization, to ensure that the features have a mean
around zero. In many applications, we are not interested
in how bright the overall input image is. For example, in object recognition
tasks, the overall brightness of the image doesn't affect what objects
are in the image. More formally, we are not interested in the
mean intensity value of an image patch; thus, we can subtract out this value,
as a form of mean normalization.

Concretely, if <math>\textstyle x^{(i)} \in \Re^{n}</math> are the (grayscale) intensity values of
a 16x16 image patch (<math>\textstyle n=256</math>), we might normalize the intensity of each image
<math>\textstyle x^{(i)}</math> as follows:
\begin{align}
\mu^{(i)} &:= \frac{1}{n} \sum_{j=1}^n x^{(i)}_j \\
x^{(i)}_j &:= x^{(i)}_j - \mu^{(i)} \quad \text{for all } j
\end{align}
Note that the two steps above are done separately for each image <math>\textstyle x^{(i)}</math>,
and that <math>\textstyle \mu^{(i)}</math> here is the mean intensity of the image <math>\textstyle x^{(i)}</math>. In particular,
this is not the same thing as estimating a mean value separately for each pixel <math>\textstyle x_j</math>.

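To make the distinction concrete, here is a minimal numpy sketch of this
per-image mean normalization (the patch matrix is synthetic; with real data
each row would be one flattened 16x16 patch):

<pre>
import numpy as np

# Synthetic stand-in for real data: 1000 grayscale patches, each
# flattened into a row of n = 256 pixel intensities.
rng = np.random.default_rng(0)
patches = rng.uniform(0.0, 1.0, size=(1000, 256))

# Per-image mean normalization: compute each patch's own mean
# intensity mu^(i), then subtract it from all of that patch's pixels.
mu = patches.mean(axis=1, keepdims=True)   # shape (1000, 1)
patches = patches - mu

# Contrast with per-feature normalization (NOT what we do for natural
# images), which would estimate a separate mean per pixel position:
#   patches - patches.mean(axis=0)
</pre>
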
If you are training your algorithm on images other than natural images (for
example, images of handwritten characters, or images of single isolated objects
centered against a white background), other types of normalization might be
worth considering, and the best choice may be application-dependent. But
when training on natural images, the per-image mean normalization
given in the two steps above would be a reasonable default.