Self-Taught Learning to Deep Networks
In the previous section, you used an autoencoder to learn features that were then fed as input to a softmax or logistic regression classifier. In that method, the features were learned using only unlabeled data. In this section, we describe how you can '''fine-tune''' and further improve the learned features using labeled data. When you have a large amount of labeled training data, this can significantly improve your classifier's performance.
In self-taught learning, we first trained a sparse autoencoder on the unlabeled data. Then, given a new example <math>\textstyle x</math>, we used the hidden layer to extract features <math>\textstyle a</math>. This is illustrated in the following diagram:

[[File:STL_SparseAE_Features.png|300px]]
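To make this extraction step concrete, here is a minimal Python/numpy sketch. The names <code>extract_features</code>, <code>W1</code>, and <code>b1</code> are hypothetical stand-ins for the autoencoder's learned first-layer parameters; this illustrates the idea and is not the tutorial's own code.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(W1, b1, x):
    """Hidden-unit activations a = f(W1 x + b1) for a new input x.
    W1: (hidden_size, input_size) weights, b1: (hidden_size,) bias,
    both assumed to come from the trained sparse autoencoder."""
    return sigmoid(W1.dot(x) + b1)
</pre>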
We are interested in solving a classification task, where our goal is to predict labels <math>\textstyle y</math>. Having replaced each labeled example's original input <math>\textstyle x^{(i)}</math> with its learned features <math>\textstyle a^{(i)}</math>, we train a logistic classifier to map from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y^{(i)}</math>. To illustrate this step, similar to [[Neural Networks|our earlier notes]], we can draw our logistic regression unit (shown in orange) as follows:

::::[[File:STL_Logistic_Classifier.png|380px]]
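As a sketch of this training step (again with hypothetical names, and using a binary <math>\textstyle y \in \{0, 1\}</math> setting for simplicity; softmax regression generalizes this to more classes), the logistic classifier can be fit by batch gradient descent on the replacement examples <math>(a^{(i)}, y^{(i)})</math>:

<pre>
import numpy as np
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_logistic(A, y, alpha=0.1, iters=1000):
    """Fit W2, b2 on the "replacement" training set.
    A: (m, hidden_size) matrix whose rows are the features a^(i);
    y: (m,) vector of labels in {0, 1}; alpha: step size."""
    m, n = A.shape
    W2, b2 = np.zeros(n), 0.0
    for _ in range(iters):
        p = sigmoid(A.dot(W2) + b2)       # predicted P(y=1 | a)
        W2 -= alpha * A.T.dot(p - y) / m  # log-loss gradient w.r.t. W2
        b2 -= alpha * np.mean(p - y)      # log-loss gradient w.r.t. b2
    return W2, b2
</pre>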
Now, consider the overall classifier (i.e., the input-output mapping) that we have learned using this method. Its parameters were trained in two stages: the first layer of weights <math>\textstyle W^{(1)}</math> mapping from the input <math>\textstyle x</math> to the hidden unit activations <math>\textstyle a</math> were trained as part of the sparse autoencoder training process, and the second layer of weights <math>\textstyle W^{(2)}</math> mapping from the activations <math>\textstyle a</math> to the output <math>\textstyle y</math> was trained using logistic regression (or softmax regression).
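Putting the two stages together, and writing <math>\textstyle f</math> for the hidden-layer activation function and <math>\textstyle b^{(1)}, b^{(2)}</math> for the bias terms (notation carried over from the earlier autoencoder notes), the overall input-output mapping of the final classifier is:

:<math>a = f\left(W^{(1)} x + b^{(1)}\right), \qquad P(y = 1 \mid x) = \sigma\left(W^{(2)} a + b^{(2)}\right),</math>

where <math>\textstyle \sigma</math> denotes the logistic sigmoid of the output unit.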
But the form of our overall/final classifier is clearly just one big neural network. So, having trained an initial set of parameters for our model (the first layer via the autoencoder, the second via logistic/softmax regression), we can further fine-tune all of the parameters by performing gradient descent from this initial setting, so as to further reduce the training error on our labeled training set.
When fine-tuning is used, sometimes the original unsupervised feature learning steps (i.e., training the autoencoder and the logistic classifier) are called '''pre-training'''. The effect of fine-tuning is that the labeled data can be used to modify the weights <math>W^{(1)}</math> as well, so that adjustments can be made to the features <math>a</math> extracted by the layer of hidden units.
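A minimal sketch of one such fine-tuning update, on a single labeled example and with the same hypothetical two-layer sigmoid network as in the sketches above (backpropagation is written out by hand for this small case):

<pre>
import numpy as np
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def finetune_step(W1, b1, W2, b2, x, y, alpha=0.01):
    """One gradient-descent step on example (x, y), with y in {0, 1}:
    the log loss is backpropagated through both layers, so the labeled
    data now adjusts W1 and b1 as well as W2 and b2."""
    a = sigmoid(W1.dot(x) + b1)             # forward: hidden features
    p = sigmoid(W2.dot(a) + b2)             # forward: P(y=1 | x)
    delta2 = p - y                          # output error (log loss)
    delta1 = (W2 * delta2) * a * (1.0 - a)  # error backpropagated to layer 1
    W2 = W2 - alpha * delta2 * a            # second-layer update
    b2 = b2 - alpha * delta2
    W1 = W1 - alpha * np.outer(delta1, x)   # first-layer weights move too
    b1 = b1 - alpha * delta1
    return W1, b1, W2, b2
</pre>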
So far, we have described this process assuming that you used the "replacement" representation, where the training examples seen by the logistic classifier are of the form <math>(a^{(i)}, y^{(i)})</math>, rather than the "concatenation" representation, where the examples are of the form <math>((x^{(i)}, a^{(i)}), y^{(i)})</math>.
It is also possible to perform fine-tuning using the "concatenation" representation. (This corresponds to a neural network where the input units <math>x_i</math> also feed directly to the logistic classifier in the output layer. You can draw this using a slightly different type of neural network diagram than the ones we have seen so far; in particular, you would have edges that go directly from the first layer input nodes to the third layer output node, "skipping over" the hidden layer.) However, so long as we are using fine-tuning, the "concatenation" representation usually has little advantage over the "replacement" representation. Thus, if we are using fine-tuning, we will usually do so with a network built using the replacement representation. (If you are not using fine-tuning, however, then the concatenation representation can sometimes give much better performance.)
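For contrast, under the same assumptions as the earlier sketches, building a "concatenation" example is just a matter of stacking the raw input and the learned features before handing them to the classifier:

<pre>
import numpy as np

def concat_example(x, a):
    # 'Concatenation' representation: the output-layer classifier
    # sees the raw input x alongside the learned features a.
    return np.concatenate([x, a])
</pre>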
+ | |||
+ | When should we use fine-tuning? It is typically used only if you have a large labeled training | ||
+ | set; in this setting, fine-tuning can significantly improve the performance of your classifier. | ||
+ | However, if you | ||
+ | have a large ''unlabeled'' dataset (for unsupervised feature learning/pre-training) and | ||
+ | only a relatively small labeled training set, then fine-tuning is significantly less likely to | ||
+ | help. | ||
+ | |||
+ | |||
+ | {{CNN}} | ||
+ | |||
- | + | {{Languages|从自我学习到深层网络|中文}} | |
- | + | ||
- | + | ||
- | + |