Self-Taught Learning to Deep Networks
From Ufldl
== From Self-Taught Learning to Deep Networks ==

Recall that in self-taught learning, we first train a sparse autoencoder on our unlabeled data. Then, given a new example <math>\textstyle x</math>, we can use the hidden layer to extract features <math>\textstyle a</math>. This is shown as follows:

[[File:STL_SparseAE_Features.png|300px]]

Now, we are interested in solving a classification task, where our goal is to predict labels <math>\textstyle y</math>. We have a labeled training set <math>\textstyle \{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)}, y^{(m_l)}) \}</math> of <math>\textstyle m_l</math> examples. Suppose we replace the original features <math>\textstyle x_l^{(i)}</math> with the features <math>\textstyle a^{(i)}</math> computed by the sparse autoencoder. This gives us a training set <math>\textstyle \{ (a^{(1)}, y^{(1)}), (a^{(2)}, y^{(2)}), \ldots, (a^{(m_l)}, y^{(m_l)}) \}</math>. Finally, we train a logistic classifier to map from the features <math>\textstyle a^{(i)}</math> to the classification label <math>\textstyle y</math>. As before, we can draw our logistic unit (shown in orange) as follows:

[[File:STL_Logistic_Classifier.png|400px]]

If we now look at the final classifier that we've learned, in terms of the function it computes given a new test example <math>\textstyle x</math>, we see that it can be drawn by putting the two pictures above together. In particular, the final classifier looks like this:

[[File:STL_CombinedAE.png|500px]]

This model was trained in two stages. The first layer of weights <math>\textstyle W^{(1)}</math>, mapping from the input <math>\textstyle x</math> to the hidden unit activations <math>\textstyle a</math>, was trained as part of the sparse autoencoder training process. The second layer of weights <math>\textstyle W^{(2)}</math>, mapping from the activations to the output <math>\textstyle y</math>, was trained using logistic regression. But the final algorithm is clearly just one large neural network.
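The two-stage model described above can be sketched in code as a forward pass through the stacked network. This is a minimal sketch: the layer sizes, parameter values, and function names here are illustrative, not from the tutorial; in practice <code>W1</code>, <code>b1</code> would come from sparse autoencoder training and <code>W2</code>, <code>b2</code> from logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-trained parameters (random placeholders for illustration):
# W1, b1 would come from the sparse autoencoder; W2, b2 from logistic regression.
rng = np.random.default_rng(0)
n_input, n_hidden = 64, 25
W1 = rng.standard_normal((n_hidden, n_input)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal(n_hidden) * 0.01
b2 = 0.0

def extract_features(x):
    """Hidden-layer activations a of the sparse autoencoder."""
    return sigmoid(W1 @ x + b1)

def classify(x):
    """Stacked model: autoencoder features fed into a logistic unit."""
    a = extract_features(x)
    return sigmoid(W2 @ a + b2)  # interpreted as P(y = 1 | x)

x = rng.standard_normal(n_input)
p = classify(x)
```

The key point the sketch illustrates is that the composed function is just an ordinary one-hidden-layer neural network, even though its two weight layers were trained separately.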
So, we can also carry out further '''fine-tuning''' of the weights to improve the overall classifier's performance. In particular, having trained the first layer using an autoencoder and the second layer via logistic regression (this process is sometimes called '''pre-training''', and sometimes more generally unsupervised feature learning), we can now perform gradient descent from the current value of the weights to drive the training error down further.

=== Discussion ===

Given that the whole algorithm is just a big neural network, why don't we carry out only the fine-tuning step, without doing any pre-training/unsupervised feature learning? There are several reasons:

<ul>
<li> First and most important, labeled data is often scarce, while unlabeled data is cheap and plentiful. The promise of self-taught learning is that by exploiting the massive amount of unlabeled data, we can learn much better models. The fine-tuning step can be done only using labeled data. In contrast, by using unlabeled data to learn a good initial value for the first layer of weights <math>\textstyle W^{(1)}</math>, we usually obtain much better classifiers after fine-tuning. </li>
<li> Second, training a neural network using supervised learning involves solving a highly non-convex optimization problem (say, minimizing the training error <math>\textstyle \sum_i ||h_W(x^{(i)}) - y^{(i)}||^2</math> as a function of the network parameters <math>\textstyle W</math>). The optimization problem can therefore be rife with local optima, and training with gradient descent (or methods like conjugate gradient and L-BFGS) may not work well. In contrast, by first initializing the parameters using an unsupervised feature learning/pre-training step, we often end up at much better solutions. (Actually, pre-training has benefits beyond helping to escape local optima; in particular, it has also been shown to have a useful "regularization" effect (Erhan et al., 2010). A full discussion, however, is beyond the scope of these notes.) </li>
</ul>
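Fine-tuning itself is ordinary backpropagation through the stacked network, starting from the pre-trained weights rather than a random initialization. Below is a minimal sketch assuming a single logistic output unit and a cross-entropy error (the tutorial's discussion uses squared error; cross-entropy is a common substitution for logistic outputs). The function name, learning rate, and epoch count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(W1, b1, W2, b2, X, y, lr=0.1, epochs=200):
    """Gradient descent on all weights of the stacked network.

    W1, b1: pre-trained autoencoder layer; W2, b2: logistic layer.
    X: (m, n_input) labeled inputs; y: (m,) binary labels.
    Illustrative batch gradient descent, not an optimized implementation.
    """
    W1, b1, W2 = W1.copy(), b1.copy(), W2.copy()
    m = len(y)
    for _ in range(epochs):
        A = sigmoid(X @ W1.T + b1)   # hidden features, (m, n_hidden)
        p = sigmoid(A @ W2 + b2)     # logistic output, (m,)
        # Output-layer error for sigmoid + cross-entropy: delta2 = p - y
        d2 = p - y
        gW2 = A.T @ d2 / m
        gb2 = d2.mean()
        # Backpropagate through the hidden layer: delta1 = delta2 * W2 * a(1-a)
        d1 = np.outer(d2, W2) * A * (1 - A)
        gW1 = d1.T @ X / m
        gb1 = d1.mean(axis=0)
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * gW1
        b1 -= lr * gb1
    return W1, b1, W2, b2
```

Because every weight, including the pre-trained first layer, receives gradient updates, the features themselves adapt to the labeled task; this is what distinguishes fine-tuning from simply training a classifier on top of frozen autoencoder features.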