Self-Taught Learning

Revision as of 23:26, 10 May 2011 (view source)

Ang (Talk | contribs)

(→Learning features)

← Older edit

Latest revision as of 13:26, 7 April 2013 (view source)

Kandeng (Talk | contribs)

Line 44:

(perhaps with appropriate whitening or other pre-processing):

-

[[File:STL_SparseAE.png]]

+

[[File:STL_SparseAE.png|350px]]

Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model,

Line 53:

neural network:

-

[[File:STL_SparseAE_Features.png]]

+

[[File:STL_SparseAE_Features.png|300px]]

This is just the sparse autoencoder that we previously had, with with the final

Line 73:

\}</math> (if we use the replacement representation, and use <math>\textstyle a_l^{(i)}</math> to represent the

<math>\textstyle i</math>-th training example), or <math>\textstyle \{

-

((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(1)}), y^{(2)}), \ldots

+

((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(1)}), y^{(2)}), \ldots,

((x_l^{(m_l)}, a_l^{(1)}), y^{(m_l)}) \}</math> (if we use the concatenated

representation). In practice, the concatenated representation often works

Line 91:

various pre-processing parameters. For example, one may have computed

a mean value of the data and subtracted off this mean to perform mean normalization,

-

or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or PCA

+

or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or used

+

PCA

whitening or ZCA whitening). If this is the case, then it is important to

save away these preprocessing parameters, and to use the ''same'' parameters

Line 102:

Line 103:

labeled training set, since that might result in a dramatically different

pre-processing transformation, which would make the input distribution to

-

the autoencoder very different from what it was actually trained on.

+

the autoencoder very different from what it was actually trained on.

== On the terminology of unsupervised feature learning ==

There are two common unsupervised feature learning settings, depending on what type of

-

unlabeled data you have. The more powerful setting is the '''self-taught learning'''

+

unlabeled data you have. The more general and powerful setting is the '''self-taught learning'''

setting, which does not assume that your unlabeled data <math>x_u</math> has to

be drawn from the same distribution as your labeled data <math>x_l</math>. The

Line 130:

Line 131:

ones are motorcycles), then we could use this form of unlabeled data to

learn the features. This setting---where each unlabeled example is drawn from the same

-

distribution as your labeled examples---is sometimes called the '''semi-supervised'''

+

distribution as your labeled examples---is sometimes called the semi-supervised

-

setting. In practice, we rarely have this sort of unlabeled data (where would you

+

setting. In practice, we often do not have this sort of unlabeled data (where would you

get a database of images where every image is either a car or a motorcycle, but

just missing its label?), and so in the context of learning features from unlabeled

-

data, the self-taught learning setting is much more broadly applicable.

+

data, the self-taught learning setting is more broadly applicable.

+

From Ufldl

@@ Line 44: / Line 44: @@
 (perhaps with appropriate whitening or other pre-processing):
-[[File:STL_SparseAE.png]]
+[[File:STL_SparseAE.png|350px]]
 Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model,
@@ Line 53: / Line 53: @@
 neural network:
-[[File:STL_SparseAE_Features.png]]
+[[File:STL_SparseAE_Features.png|300px]]
 This is just the sparse autoencoder that we previously had, with with the final
@@ Line 73: / Line 73: @@
 \}</math> (if we use the replacement representation, and use <math>\textstyle a_l^{(i)}</math> to represent the
 <math>\textstyle i</math>-th training example), or <math>\textstyle \{
-((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(1)}), y^{(2)}), \ldots
+((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(1)}), y^{(2)}), \ldots,
 ((x_l^{(m_l)}, a_l^{(1)}), y^{(m_l)}) \}</math> (if we use the concatenated
 representation).  In practice, the concatenated representation often works
@@ Line 91: / Line 91: @@
 various pre-processing parameters.  For example, one may have computed
 a mean value of the data and subtracted off this mean to perform mean normalization,
-or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or PCA
+or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or used
+PCA
 whitening or ZCA whitening).  If this is the case, then it is important to
 save away these preprocessing parameters, and to use the ''same'' parameters
@@ Line 102: / Line 103: @@
 labeled training set, since that might result in a dramatically different
 pre-processing transformation, which would make the input distribution to
 the autoencoder very different from what it was actually trained on.
 == On the terminology of unsupervised feature learning ==
 There are two common unsupervised feature learning settings, depending on what type of
-unlabeled data you have.  The more powerful setting is the '''self-taught learning'''
+unlabeled data you have.  The more general and powerful setting is the '''self-taught learning'''
 setting, which does not assume that your unlabeled data <math>x_u</math> has to
 be drawn from the same distribution as your labeled data <math>x_l</math>.  The
@@ Line 130: / Line 131: @@
 ones are motorcycles), then we could use this form of unlabeled data to
 learn the features.  This setting---where each unlabeled example is drawn from the same
-distribution as your labeled examples---is sometimes called the '''semi-supervised'''
+distribution as your labeled examples---is sometimes called the semi-supervised
-setting.  In practice, we rarely have this sort of unlabeled data (where would you
+setting.  In practice, we often do not have this sort of unlabeled data (where would you
 get a database of images where every image is either a car or a motorcycle, but
 just missing its label?), and so in the context of learning features from unlabeled
-data, the self-taught learning setting is much more broadly applicable.
+data, the self-taught learning setting is more broadly applicable.
+{{STL}}
+{{Languages|自我学习|中文}}
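
As a rough illustration of the point made in the hunk at Line 91 (save the preprocessing parameters and reuse the ''same'' ones on the labeled data), together with the replacement and concatenated representations from the hunk at Line 73, here is a minimal sketch in plain Python. The function names, the toy data, and the one-unit weight matrix are hypothetical and not taken from the wiki page. Mean normalization stands in for whichever preprocessing was actually used, and the concatenated representation is assumed here to use the preprocessed input.

```python
import math


def fit_mean(examples):
    """Estimate the per-dimension mean from the unlabeled data (done once)."""
    n = len(examples)
    dim = len(examples[0])
    return [sum(x[j] for x in examples) / n for j in range(dim)]


def apply_mean_normalization(x, mean):
    """Subtract a previously saved mean; the mean is never re-estimated here."""
    return [xj - mj for xj, mj in zip(x, mean)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(x, W1, b1):
    """Compute a = sigmoid(W1 x + b1), the autoencoder's hidden-layer output."""
    return [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]


def featurize(x_raw, mean, W1, b1, concatenate=True):
    """Map a labeled example to the replacement or concatenated representation."""
    x = apply_mean_normalization(x_raw, mean)  # same parameters as in training
    a = hidden_activations(x, W1, b1)
    return x + a if concatenate else a


# Toy stand-ins for the unlabeled set and for trained autoencoder weights.
unlabeled = [[0.0, 2.0], [2.0, 4.0]]
mean = fit_mean(unlabeled)   # saved alongside W1 and b1
W1 = [[1.0, 0.0]]            # hypothetical: one hidden unit
b1 = [0.0]

x_labeled = [1.0, 3.0]
concatenated = featurize(x_labeled, mean, W1, b1)                 # (x_l, a_l)
replacement = featurize(x_labeled, mean, W1, b1, concatenate=False)  # a_l only
```

The design choice the sketch tries to make visible is that `fit_mean` is called only on the unlabeled data, and `featurize` takes the saved `mean` as an argument instead of computing a new one from the labeled set.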