Fine-tuning Stacked AEs
From Ufldl
=== Introduction ===

Fine-tuning is a strategy commonly used in deep learning, and it can greatly improve the performance of a stacked autoencoder. From a high-level perspective, fine-tuning treats all layers of a stacked autoencoder as a single model, so that each iteration improves all of the weights in the stacked autoencoder at once.

=== General Strategy ===

Luckily, we already have all the tools necessary to implement fine-tuning for stacked autoencoders! To compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. Because the backpropagation algorithm extends to an arbitrary number of layers, we can use it on a stacked autoencoder of arbitrary depth.

Note: most stacked autoencoders don't go past five layers.

=== Recap of the Backpropagation Algorithm ===

For your convenience, a summary of the backpropagation algorithm is given below, where <math>\textstyle \bullet</math> denotes the element-wise product:

: 1. Perform a feedforward pass, computing the activations for layers <math>\textstyle L_2</math>, <math>\textstyle L_3</math>, up to the output layer <math>\textstyle L_{n_l}</math>, using the equations defining the forward propagation steps.
: 2. For the output layer (layer <math>\textstyle n_l</math>), set
::<math>\begin{align} \delta^{(n_l)} = (\nabla_{a^{(n_l)}}J) \bullet f'(z^{(n_l)}) \end{align}</math>
::(When using softmax regression, the softmax layer has <math>\nabla_{a^{(n_l)}} J = -\theta^T(I-P)</math>, where <math>I</math> holds the input labels as an indicator vector and <math>P</math> is the vector of conditional probabilities.)
: 3. For <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>
::Set
:::<math>\begin{align} \delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)}) \end{align}</math>
: 4. Compute the desired partial derivatives:
::<math>\begin{align} \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\ \nabla_{b^{(l)}} J(W,b;x,y) &= \delta^{(l+1)}. \end{align}</math>

The overall cost is the average of the per-example costs over the <math>\textstyle m</math> training examples, so the gradients of the overall cost are the corresponding averages of the per-example gradients above:

:<math>\begin{align} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right] \end{align}</math>
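
As an illustration only, the four steps above can be sketched for a single training example in plain Python. This is a minimal sketch under stated assumptions, not the tutorial's reference implementation: the helper names (<code>forward</code>, <code>cost</code>, <code>backprop</code>) are hypothetical, every layer is assumed to use the sigmoid activation (so <math>\textstyle f'(z) = a(1-a)</math>), and the cost is assumed to be the squared error <math>\textstyle \frac{1}{2}\|y - a^{(n_l)}\|^2</math>, for which the output error term is <math>\textstyle -(y - a^{(n_l)}) \bullet f'(z^{(n_l)})</math>. A softmax output layer would need a different output error term. The lists are zero-indexed, so <code>Ws[0]</code> plays the role of <math>\textstyle W^{(1)}</math>.

```python
import math


def sigmoid(z):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward(Ws, bs, x):
    """Step 1: feedforward pass; returns the activations of every layer.

    acts[0] is the input and acts[-1] is the output layer.
    """
    acts = [x]
    for W, b in zip(Ws, bs):
        z = [s + bi for s, bi in zip(matvec(W, acts[-1]), b)]
        acts.append(sigmoid(z))
    return acts


def cost(Ws, bs, x, y):
    """Assumed per-example cost: 0.5 * ||y - output||^2."""
    out = forward(Ws, bs, x)[-1]
    return 0.5 * sum((yi - ai) ** 2 for yi, ai in zip(y, out))


def backprop(Ws, bs, x, y):
    """Steps 2-4: per-example gradients for every weight matrix and bias."""
    acts = forward(Ws, bs, x)
    # Step 2: output-layer error term (squared error, sigmoid units).
    delta = [-(yi - ai) * ai * (1.0 - ai) for yi, ai in zip(y, acts[-1])]
    grads_W = [None] * len(Ws)
    grads_b = [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):
        a_prev = acts[l]
        # Step 4: outer product delta * a^T, and delta itself for the bias.
        grads_W[l] = [[d * a for a in a_prev] for d in delta]
        grads_b[l] = list(delta)
        if l > 0:
            # Step 3: push the error back through W^T, then scale by f'(z).
            back = [sum(Ws[l][i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(a_prev))]
            delta = [bk * a * (1.0 - a) for bk, a in zip(back, a_prev)]
    return grads_W, grads_b
```

Calling <code>backprop(Ws, bs, x, y)</code> on each training example and averaging the results gives the gradient of the overall cost <math>\textstyle J(W,b)</math>. Comparing the output against finite differences of <code>cost</code> is a quick way to check a sketch like this before using it for fine-tuning.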