== Overview ==

In this section, we will improve upon the features learned from self-taught learning by ''fine-tuning'' them for our classification objective.

Recall that in self-taught learning, we first train a sparse autoencoder on our unlabeled data. Then, given a new example <math>\textstyle x</math>, we can use the
hidden layer to extract features <math>\textstyle a</math>. This is shown as follows:

[[File:STL_SparseAE_Features.png|200px]]
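As a rough illustration of this feature-extraction step, here is a minimal Python/NumPy sketch (not part of the original notes); it assumes a sigmoid hidden layer, and the names <code>W1</code> and <code>b1</code> stand in for the sparse autoencoder's learned first-layer weights and biases:

<pre>
import numpy as np

def sigmoid(z):
    # Logistic activation assumed for the autoencoder's hidden layer.
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(X, W1, b1):
    """Hidden-layer activations of a trained sparse autoencoder.

    X  : (n_examples, n_inputs) array of raw inputs x
    W1 : (n_hidden, n_inputs)   learned first-layer weights
    b1 : (n_hidden,)            learned first-layer biases
    Returns A : (n_examples, n_hidden) array of features a.
    """
    return sigmoid(X @ W1.T + b1)
</pre>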

Now, we are interested in solving a classification task, where our goal is to predict the labels <math>\textstyle y</math>.
Since the autoencoder's first layer and the classifier trained on top of its features together form a single neural network,
we can now further perform gradient descent from the current value of
the weights to try to further drive down the training error.
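The following is a minimal sketch of one such fine-tuning step, written in Python/NumPy rather than taken from these notes; it assumes a softmax classifier with parameters <code>W2</code>, <code>b2</code> has been trained on top of the features, and that <code>W1</code>, <code>b1</code> are the autoencoder's first-layer parameters:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_step(X, y, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent step on the whole network (feature layer + classifier).

    X : (n, n_inputs) labeled inputs;  y : (n,) integer class labels.
    Starting from the current weights, all parameters are updated in place
    to reduce the training error (here, a softmax cross-entropy loss).
    """
    n = X.shape[0]
    # Forward pass through the full network.
    A1 = sigmoid(X @ W1.T + b1)                # features a (the former autoencoder layer)
    P = softmax(A1 @ W2.T + b2)                # predicted class probabilities

    # Backward pass (backpropagation through both layers).
    Y = np.eye(W2.shape[0])[y]                 # one-hot encoding of the labels
    dZ2 = (P - Y) / n
    dA1 = dZ2 @ W2
    dZ1 = dA1 * A1 * (1.0 - A1)

    # Gradient-descent update of every layer, including the pre-trained one.
    W2 -= lr * (dZ2.T @ A1);  b2 -= lr * dZ2.sum(axis=0)
    W1 -= lr * (dZ1.T @ X);   b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, W2, b2
</pre>

The essential point is that <code>W1</code> and <code>b1</code> start from the values learned by the sparse autoencoder rather than from random values, and are then adjusted jointly with the classifier on the labeled data.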

===Discussion===

Given that the whole algorithm is just a big neural network, why don't we just
carry out the fine-tuning step on its own, without doing any pre-training/unsupervised
feature learning? There are several reasons:

<ul>
<li> First and most important, labeled data is often scarce, whereas unlabeled
data is cheap and plentiful. The promise of self-taught learning is that by
exploiting the massive amount of unlabeled data, we can learn much better
models. The fine-tuning step can use only the labeled data; but by first using
the unlabeled data to learn a good initial value for the first layer of weights
<math>\textstyle W^{(1)}</math>, we usually get much better classifiers after fine-tuning.

<li> Second, training a neural network using supervised learning involves
solving a highly non-convex optimization problem (say, minimizing the training
error <math>\textstyle \sum_i ||h_W(x^{(i)}) - y^{(i)}||^2</math> as a function of the network parameters
<math>\textstyle W</math>).
This optimization problem can be rife with local optima, so training
with gradient descent (or methods like conjugate gradient and L-BFGS) may not
work well. In contrast, by first initializing the parameters using an
unsupervised feature learning/pre-training step, we can end up at much better
solutions. (Pre-training also has benefits beyond helping to escape
local optima; in particular, it has been shown to have a useful
"regularization" effect (Erhan et al., 2010). A full discussion
is beyond the scope of these notes.)
</ul>