PCA
From Ufldl
because <math>\textstyle U x_{\rm rot} = UU^T x = x</math>.
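This identity can be checked numerically. Below is a small NumPy sketch (not part of the tutorial's code): because the columns of <math>\textstyle U</math> are orthonormal eigenvectors of the covariance matrix, <math>\textstyle UU^T = I</math>, so rotating <math>\textstyle x</math> and rotating back recovers it exactly.

```python
import numpy as np

# A small numerical sketch (not the tutorial's code): since the columns
# of U are orthonormal eigenvectors of the covariance matrix, U U^T = I,
# so rotating x and rotating back recovers x exactly.
rng = np.random.default_rng(0)
data = rng.standard_normal((2, 100))      # each column is one example x
sigma = data @ data.T / data.shape[1]     # empirical covariance matrix
_, U = np.linalg.eigh(sigma)              # columns of U are eigenvectors

x = data[:, 0]
x_rot = U.T @ x                           # rotate into the eigenbasis
x_back = U @ x_rot                        # U x_rot = U U^T x = x
assert np.allclose(x_back, x)
```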
== Reducing the Data Dimension ==
We see that the principal direction of variation of the data is the first
Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation
of the original <math>\textstyle x \in \Re^n</math>. Given <math>\textstyle \tilde{x}</math>, how can we recover an approximation <math>\textstyle \hat{x}</math> to
the original value of <math>\textstyle x</math>? From an [[#Rotating the Data|earlier section]], we know that <math>\textstyle x = U x_{\rm rot}</math>. Further,
we can think of <math>\textstyle \tilde{x}</math> as an approximation to <math>\textstyle x_{\rm rot}</math>, where we have
set the last <math>\textstyle n-k</math> components to zeros. Thus, given <math>\textstyle \tilde{x} \in \Re^k</math>, we can
approximation to the data.
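The reduce-then-recover procedure described above can be sketched in NumPy as follows. This is a hypothetical helper (the function names <code>pca_reduce</code> and <code>pca_reconstruct</code> are ours, not the tutorial's); it assumes the columns of <math>\textstyle U</math> are eigenvectors sorted by decreasing eigenvalue.

```python
import numpy as np

# A hypothetical sketch of the reduce/recover step (function names are
# ours, not the tutorial's). The columns of U are assumed to be
# eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
def pca_reduce(x, U, k):
    """x_tilde in R^k: the first k components of x_rot = U^T x."""
    return U[:, :k].T @ x

def pca_reconstruct(x_tilde, U):
    """x_hat = U [x_tilde; 0] -- equivalently U[:, :k] @ x_tilde."""
    k = x_tilde.shape[0]
    return U[:, :k] @ x_tilde
```

With <math>\textstyle k = n</math>, the reconstruction is exact; for <math>\textstyle k < n</math>, <math>\textstyle \hat{x}</math> is the approximation described above.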
To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance retained'''
for different values of <math>\textstyle k</math>. Concretely, if <math>\textstyle k=n</math>, then we have
an exact approximation to the data, and we say that 100% of the variance is
retained. That is, all of the variation of the original data is retained.
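As a hedged sketch of how this quantity can be computed: with the eigenvalues <math>\textstyle \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n</math> of the covariance matrix, the fraction of variance retained for a given <math>\textstyle k</math> is the sum of the top <math>\textstyle k</math> eigenvalues over the sum of all of them (the helper name below is ours, not the tutorial's).

```python
import numpy as np

# Hypothetical helper: fraction of variance retained for a given k is
# (lambda_1 + ... + lambda_k) / (lambda_1 + ... + lambda_n), where the
# lambda_i are the eigenvalues of the covariance matrix in decreasing order.
def variance_retained(eigvals, k):
    lam = np.sort(np.asarray(eigvals))[::-1]   # sort in decreasing order
    return lam[:k].sum() / lam.sum()
```

With <math>\textstyle k = n</math> this is exactly 1.0, matching the "100% of the variance is retained" case above.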
Note: Usually we use images of outdoor scenes with grass, trees, etc., and cut out small (say 16x16) image patches randomly from these to train the algorithm. But in practice most feature learning algorithms are extremely robust to the exact type of image they are trained on, so most images taken with a normal camera, so long as they aren't excessively blurry and don't have strange artifacts, should work.
When training on natural images, it makes little sense to estimate a separate mean and
variance for each pixel, because the statistics in one part
of the image should (theoretically) be the same as any other.
This property of images is called '''stationarity.'''
In detail, in order for PCA to work well, informally we require that (i) The
this is not the same thing as estimating a mean value separately for each pixel <math>\textstyle x_j</math>.
If you are training your algorithm on images other than natural images (for example, images of handwritten characters, or images of single isolated objects centered against a white background), other types of normalization might be worth considering, and the best choice may be application dependent. But when training on natural images, using the per-image mean normalization method as given in the equations above would be a reasonable default.
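The per-image mean normalization can be sketched in NumPy as follows: each patch has its own mean intensity subtracted, rather than a separate per-pixel mean being estimated across the dataset. The array layout and function name here are our own illustration, not the tutorial's code.

```python
import numpy as np

# A sketch of per-image (per-patch) mean normalization: subtract each
# patch's own mean intensity instead of a per-pixel mean across examples.
# `patches` is a hypothetical (n_pixels, n_examples) array, one patch
# per column.
def remove_dc(patches):
    return patches - patches.mean(axis=0, keepdims=True)
```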
== References ==
http://cs229.stanford.edu
+ | |||
+ | |||
+ | {{PCA}} | ||
+ | |||
+ | |||
+ | {{Languages|主成分分析|中文}} |