Stacked Autoencoders

===Training===

A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. To do this, first train the first layer on the raw input to obtain its parameters W1, W2, b1 and b2. Use the trained first layer to transform the raw input into a vector consisting of the activations of the hidden units, A. Train the second layer on this vector to obtain its own set of parameters W1, W2, b1 and b2. Repeat for subsequent layers, using the output of each layer as the input to the next.
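
This procedure can be sketched in code. The following is a minimal NumPy sketch of greedy layer-wise pretraining, not the tutorial's own implementation: it assumes sigmoid units and plain batch gradient descent on the squared reconstruction error, and the names <code>train_autoencoder_layer</code> and <code>greedy_layerwise_pretrain</code> are illustrative.

<source lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(X, hidden_size, lr=0.1, epochs=200, seed=0):
    """Train one autoencoder layer on inputs X (m examples x n features).

    Returns (W1, b1, W2, b2): encoder and decoder parameters.
    A minimal sketch: batch gradient descent on squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(scale=0.01, size=(n, hidden_size))   # encoder weights
    b1 = np.zeros(hidden_size)
    W2 = rng.normal(scale=0.01, size=(hidden_size, n))   # decoder weights
    b2 = np.zeros(n)

    for _ in range(epochs):
        # Forward pass: encode, then decode.
        A = sigmoid(X @ W1 + b1)          # hidden activations
        X_hat = sigmoid(A @ W2 + b2)      # reconstruction of the input

        # Backward pass for the squared reconstruction error.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)   # error at the output units
        d_hid = (d_out @ W2.T) * A * (1 - A)        # error at the hidden units

        W2 -= lr * (A.T @ d_out) / m
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid) / m
        b1 -= lr * d_hid.mean(axis=0)

    return W1, b1, W2, b2

def greedy_layerwise_pretrain(X, layer_sizes):
    """Train each layer on the previous layer's hidden activations."""
    encoders, features = [], X
    for hidden_size in layer_sizes:
        W1, b1, W2, b2 = train_autoencoder_layer(features, hidden_size)
        # Keep only the encoding half; the decoder is only needed while
        # training this particular layer.
        encoders.append((W1, b1))
        features = sigmoid(features @ W1 + b1)    # input for the next layer
    return encoders

# Example usage on random data with two hidden layers.
X = np.random.default_rng(1).random((500, 64))
encoders = greedy_layerwise_pretrain(X, layer_sizes=[32, 16])
</source>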

This method trains the parameters of each layer individually while freezing the parameters of the rest of the model. After this phase of training is complete, fine-tuning using backpropagation can be used to improve the results by adjusting the parameters of all layers at the same time.

In practice, fine-tuning should be used once the parameters have been brought close to convergence through layer-wise training. Attempting to use fine-tuning with randomly initialized weights will lead to poor results due to local optima.

{{Quote|
If one is only interested in fine-tuning for the purposes of classification, the common practice is to discard the "decoding" layers of the stacked autoencoder and link the last hidden layer <math>a^{(n)}</math> to the softmax classifier. The gradients from the (softmax) classification error will then be backpropagated into the encoding layers.
}}
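
Continuing the sketch above, the following illustrates this fine-tuning step: the decoding halves are discarded, a softmax classifier is attached to the last hidden layer, and the classification error is backpropagated through all of the encoding layers at once. As before, this is only an illustrative sketch; the name <code>finetune_softmax</code>, the cross-entropy loss, and batch gradient descent are assumptions rather than details taken from the tutorial.

<source lang="python">
import numpy as np

def sigmoid(z):                            # repeated so this block is self-contained
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_softmax(X, y, encoders, num_classes, lr=0.1, epochs=200, seed=0):
    """Jointly fine-tune pretrained encoding layers plus a softmax output layer.

    encoders is a list of (W, b) pairs from layer-wise pretraining; the
    decoding halves of the autoencoders have already been discarded.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    Ws = rng.normal(scale=0.01, size=(encoders[-1][0].shape[1], num_classes))
    bs = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]             # one-hot labels

    for _ in range(epochs):
        # Forward pass through every encoding layer, keeping all activations.
        acts = [X]
        for W, b in encoders:
            acts.append(sigmoid(acts[-1] @ W + b))
        probs = softmax(acts[-1] @ Ws + bs)

        # Cross-entropy gradient at the softmax layer.
        delta = (probs - Y) / m
        gWs, gbs = acts[-1].T @ delta, delta.sum(axis=0)

        # Propagate the classification error into the last hidden layer,
        # then walk back through every encoding layer, updating each one.
        delta = (delta @ Ws.T) * acts[-1] * (1 - acts[-1])
        Ws -= lr * gWs
        bs -= lr * gbs
        for i in range(len(encoders) - 1, -1, -1):
            W, b = encoders[i]
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                       # no error signal is propagated below the input
                delta = (delta @ W.T) * acts[i] * (1 - acts[i])
            encoders[i] = (W - lr * gW, b - lr * gb)

    return encoders, Ws, bs

# Example usage, continuing from greedy_layerwise_pretrain() above
# (y would be an integer label array aligned with the rows of X):
# encoders, Ws, bs = finetune_softmax(X, y, encoders, num_classes=10)
</source>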

===Motivation===
