Self-Taught Learning to Deep Networks
In the previous section, you used an autoencoder to learn features that were then fed as input to a softmax or logistic regression classifier. In that method, the features were learned using only unlabeled data. In this section, we describe how you can '''fine-tune''' and further improve the learned features using labeled data. When you have a large amount of labeled training data, this can significantly improve your classifier's performance.
In self-taught learning, we first trained a sparse autoencoder on the unlabeled data. Then, given a new example <math>\textstyle x</math>, we used the hidden layer to extract features <math>\textstyle a</math>. This is illustrated in the following diagram:

[[File:STL_SparseAE_Features.png|300px]]
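To make this extraction step concrete, here is a minimal Python/numpy sketch. The names <code>extract_features</code>, <code>W1</code>, and <code>b1</code> are hypothetical stand-ins for the autoencoder's learned first-layer parameters; this illustrates the idea and is not the tutorial's own code.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(W1, b1, x):
    """Hidden-unit activations a = f(W1 x + b1) for a new input x.
    W1: (hidden_size, input_size) weights, b1: (hidden_size,) bias,
    both assumed to come from the trained sparse autoencoder."""
    return sigmoid(W1.dot(x) + b1)
</pre>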
We are interested in solving a classification task, where our goal is to predict labels <math>\textstyle y</math>. Having replaced each labeled example's original input <math>\textstyle x^{(i)}</math> with its learned features <math>\textstyle a^{(i)}</math>, we train a logistic classifier to map from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y^{(i)}</math>. To illustrate this step, similar to [[Neural Networks|our earlier notes]], we can draw our logistic regression unit (shown in orange) as follows:

::::[[File:STL_Logistic_Classifier.png|380px]]
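As a sketch of this training step (again with hypothetical names, and using a binary <math>\textstyle y \in \{0, 1\}</math> setting for simplicity; softmax regression generalizes this to more classes), the logistic classifier can be fit by batch gradient descent on the replacement examples <math>(a^{(i)}, y^{(i)})</math>:

<pre>
import numpy as np
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_logistic(A, y, alpha=0.1, iters=1000):
    """Fit W2, b2 on the "replacement" training set.
    A: (m, hidden_size) matrix whose rows are the features a^(i);
    y: (m,) vector of labels in {0, 1}; alpha: step size."""
    m, n = A.shape
    W2, b2 = np.zeros(n), 0.0
    for _ in range(iters):
        p = sigmoid(A.dot(W2) + b2)       # predicted P(y=1 | a)
        W2 -= alpha * A.T.dot(p - y) / m  # log-loss gradient w.r.t. W2
        b2 -= alpha * np.mean(p - y)      # log-loss gradient w.r.t. b2
    return W2, b2
</pre>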
Now, consider the overall classifier (i.e., the input-output mapping) that we have learned using this method. Its parameters were trained in two stages: the first layer of weights <math>\textstyle W^{(1)}</math> mapping from the input <math>\textstyle x</math> to the hidden unit activations <math>\textstyle a</math> were trained as part of the sparse autoencoder training process, and the second layer of weights <math>\textstyle W^{(2)}</math> mapping from the activations <math>\textstyle a</math> to the output <math>\textstyle y</math> was trained using logistic regression (or softmax regression).
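Putting the two stages together, and writing <math>\textstyle f</math> for the hidden-layer activation function and <math>\textstyle b^{(1)}, b^{(2)}</math> for the bias terms (notation carried over from the earlier autoencoder notes), the overall input-output mapping of the final classifier is:

:<math>a = f\left(W^{(1)} x + b^{(1)}\right), \qquad P(y = 1 \mid x) = \sigma\left(W^{(2)} a + b^{(2)}\right),</math>

where <math>\textstyle \sigma</math> denotes the logistic sigmoid of the output unit.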
But the form of our overall/final classifier is clearly just one big neural network. So, having trained an initial set of parameters for our model (the first layer via the autoencoder, the second via logistic/softmax regression), we can further fine-tune all of the parameters by performing gradient descent from this initial setting, so as to further reduce the training error on our labeled training set.
When fine-tuning is used, sometimes the original unsupervised feature learning steps (i.e., training the autoencoder and the logistic classifier) are called '''pre-training'''. The effect of fine-tuning is that the labeled data can be used to modify the weights <math>W^{(1)}</math> as well, so that adjustments can be made to the features <math>a</math> extracted by the layer of hidden units.
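A minimal sketch of one such fine-tuning update, on a single labeled example and with the same hypothetical two-layer sigmoid network as in the sketches above (backpropagation is written out by hand for this small case):

<pre>
import numpy as np
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def finetune_step(W1, b1, W2, b2, x, y, alpha=0.01):
    """One gradient-descent step on example (x, y), with y in {0, 1}:
    the log loss is backpropagated through both layers, so the labeled
    data now adjusts W1 and b1 as well as W2 and b2."""
    a = sigmoid(W1.dot(x) + b1)             # forward: hidden features
    p = sigmoid(W2.dot(a) + b2)             # forward: P(y=1 | x)
    delta2 = p - y                          # output error (log loss)
    delta1 = (W2 * delta2) * a * (1.0 - a)  # error backpropagated to layer 1
    W2 = W2 - alpha * delta2 * a            # second-layer update
    b2 = b2 - alpha * delta2
    W1 = W1 - alpha * np.outer(delta1, x)   # first-layer weights move too
    b1 = b1 - alpha * delta1
    return W1, b1, W2, b2
</pre>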
So far, we have described this process assuming that you used the "replacement" representation, where the training examples seen by the logistic classifier are of the form <math>(a^{(i)}, y^{(i)})</math>, rather than the "concatenation" representation, where the examples are of the form <math>((x^{(i)}, a^{(i)}), y^{(i)})</math>.
It is also possible to perform fine-tuning using the "concatenation" representation. (This corresponds to a neural network where the input units <math>x_i</math> also feed directly to the logistic classifier in the output layer. You can draw this using a slightly different type of neural network diagram than the ones we have seen so far; in particular, you would have edges that go directly from the first layer input nodes to the third layer output node, "skipping over" the hidden layer.) However, so long as we are using fine-tuning, the "concatenation" representation usually has little advantage over the "replacement" representation. Thus, if we are using fine-tuning, we will usually do so with a network built using the replacement representation. (If you are not using fine-tuning, however, then the concatenation representation can sometimes give much better performance.)
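For contrast, under the same assumptions as the earlier sketches, building a "concatenation" example is just a matter of stacking the raw input and the learned features before handing them to the classifier:

<pre>
import numpy as np

def concat_example(x, a):
    # 'Concatenation' representation: the output-layer classifier
    # sees the raw input x alongside the learned features a.
    return np.concatenate([x, a])
</pre>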
+ | |||
+ | When should we use fine-tuning? It is typically used only if you have a large labeled training | ||
+ | set; in this setting, fine-tuning can significantly improve the performance of your classifier. | ||
+ | However, if you | ||
+ | have a large ''unlabeled'' dataset (for unsupervised feature learning/pre-training) and | ||
+ | only a relatively small labeled training set, then fine-tuning is significantly less likely to | ||
+ | help. | ||
+ | |||
+ | |||
+ | {{CNN}} | ||
+ | |||
- | + | {{Languages|从自我学习到深层网络|中文}} | |
- | + | ||
- | + | ||
- | + |