Self-Taught Learning to Deep Networks
From Ufldl
mapping from the input <math>\textstyle x</math> to the hidden unit activations <math>\textstyle a</math> were trained
as part of the sparse autoencoder training process. The second layer
of weights <math>\textstyle W^{(2)}</math> mapping from the activations <math>\textstyle a</math> to the output <math>\textstyle y</math> was
trained using logistic regression (or softmax regression).
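As a concrete illustration, here is a minimal numpy sketch of this two-layer setup. The weights and dimensions below are hypothetical placeholders: in practice <math>\textstyle W^{(1)}, b^{(1)}</math> would come from sparse autoencoder training and <math>\textstyle W^{(2)}, b^{(2)}</math> from softmax regression on the activations.

```python
import numpy as np

# Hypothetical dimensions and randomly initialized weights, standing in for
# W1, b1 learned by the sparse autoencoder and W2, b2 learned by softmax regression.
rng = np.random.default_rng(0)
n_input, n_hidden, n_classes = 64, 25, 10
W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_input)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_classes, n_hidden)), np.zeros(n_classes)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Map input x to class probabilities through the learned features."""
    a = sigmoid(W1 @ x + b1)    # first layer: hidden activations (autoencoder features)
    z = W2 @ a + b2             # second layer: softmax (multinomial logistic) scores
    e = np.exp(z - z.max())     # numerically stable softmax
    return e / e.sum()

p = forward(rng.normal(size=n_input))
```

The returned vector `p` is a valid probability distribution over the `n_classes` labels.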
When fine-tuning is used, sometimes the original unsupervised feature learning steps
(i.e., training the autoencoder and the logistic classifier) are called '''pre-training.'''
The effect of fine-tuning is that the labeled data can be used to modify the weights <math>W^{(1)}</math> as
well, so that adjustments can be made to the features <math>a</math> extracted by the layer
the training examples seen by the logistic classifier are of the form <math>(a^{(i)}, y^{(i)})</math>,
rather than the "concatenation" representation, where the examples are of the form <math>((x^{(i)}, a^{(i)}), y^{(i)})</math>.
It is also possible to perform fine-tuning using the "concatenation" representation. (This corresponds
to a neural network where the input units <math>x_i</math> also feed directly to the logistic
classifier in the output layer. You can draw this using a slightly different type of neural network
diagram than the ones we have seen so far; in particular, you would have edges that go directly
from the first layer input nodes to the third layer output node, "skipping over" the hidden layer.)
However, so long as we are using fine-tuning, usually the "concatenation" representation
has little advantage over the "replacement" representation. Thus, if we are using fine-tuning, usually we will do so
with a network built using the replacement representation. (If you are not using fine-tuning, however,
then sometimes the concatenation representation can give much better performance.)
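The two representations are easy to see side by side in code. This is a hedged sketch: the autoencoder weights below are random placeholders, and only the construction of the two feature vectors is the point.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical autoencoder weights for illustration only.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.1, size=(25, 64)), np.zeros(25)

x = rng.normal(size=64)          # raw input
a = sigmoid(W1 @ x + b1)         # learned features

# "Replacement": the classifier is trained on (a, y).
replacement = a
# "Concatenation": the classifier is trained on ((x, a), y),
# i.e., the raw input feeds the output layer directly as well.
concatenation = np.concatenate([x, a])
```

With concatenation, the classifier's input dimension grows from the number of hidden units to the number of hidden units plus the number of input units.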
When should we use fine-tuning? It is typically used only if you have a large labeled training
set; in this setting, fine-tuning can significantly improve the performance of your classifier.
However, if you
have a large ''unlabeled'' dataset (for unsupervised feature learning/pre-training) and
only a relatively small labeled training set, then fine-tuning is significantly less likely to
help.
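To make the fine-tuning step itself concrete, here is a minimal backpropagation sketch under the replacement representation, with hypothetical dimensions and randomly initialized weights standing in for the pre-trained ones. The key point from the text is visible in the last update: the labeled example modifies <math>W^{(1)}</math> too, not just the classifier weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_in, n_hid, n_cls = 64, 25, 10
# Placeholders: W1, b1 would come from pre-training (the sparse autoencoder),
# W2, b2 from softmax regression on the activations.
W1, b1 = rng.normal(scale=0.1, size=(n_hid, n_in)), np.zeros(n_hid)
W2, b2 = rng.normal(scale=0.1, size=(n_cls, n_hid)), np.zeros(n_cls)

def finetune_step(x, y, lr=0.1):
    """One backprop step on a labeled example (x, y); returns the cross-entropy loss."""
    global W1, b1, W2, b2
    a = sigmoid(W1 @ x + b1)                 # forward: features
    z = W2 @ a + b2                          # forward: softmax scores
    p = np.exp(z - z.max()); p /= p.sum()
    t = np.zeros(n_cls); t[y] = 1.0
    delta2 = p - t                           # softmax cross-entropy gradient
    delta1 = (W2.T @ delta2) * a * (1 - a)   # backprop through the sigmoid layer
    W2 -= lr * np.outer(delta2, a); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1   # the features adjust too
    return -np.log(p[y])
```

Repeated calls on labeled data drive the loss down while reshaping the pre-trained features to suit the supervised task.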