Fine-tuning Stacked AEs

Revision as of 00:41, 22 April 2011 (view source)

Watsuen (Talk | contribs)

(→Introduction)

Latest revision as of 04:04, 8 April 2013 (view source)

Kandeng (Talk | contribs)

Line 3:

=== General Strategy ===

-

~~Luckily~~, we already have all the tools necessary to implement fine tuning for stacked autoencoders! In order to compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. As the backpropagation algorithm can be extended to apply for an arbitrary number of layers, we can actually use this algorithm on a stacked autoencoder of arbitrary depth.

+

Fortunately, we already have all the tools necessary to implement fine-tuning for stacked autoencoders! To compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. Because the backpropagation algorithm extends to an arbitrary number of layers, we can use it on a stacked autoencoder of arbitrary depth.

-

~~As a note, most stacked autoencoders don't go past five layers.~~

+

=== Fine-tuning with Backpropagation ===

-

+

-

=== ~~Recap of the~~ Backpropagation ~~Algorithm~~ ===

+

For your convenience, a summary of the backpropagation algorithm in element-wise notation is given below:

Line 14:

Line 12:

::<math>\begin{align}

\delta^{(n_l)}

-

= - (~~y -~~ a^{(n_l)}) \bullet f'(z^{(n_l)})

+

= - (\nabla_{a^{(n_l)}}J) \bullet f'(z^{(n_l)})

\end{align}</math>

+

::(When using softmax regression, the softmax layer has <math>\nabla J = \theta^T(I-P)</math> where <math>I</math> is the indicator (one-hot) vector of the input label and <math>P</math> is the vector of conditional class probabilities.)

: 3. For <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>

::Set

Line 31:

Line 30:

&= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right]

\end{align}</math>

+

{{Quote|

+

Note: While one could consider the softmax classifier as an additional layer, the derivation above does not. Specifically, we consider the "last layer" of the network to be the features that go into the softmax classifier. Therefore, the derivatives (in Step 2) are computed using <math>\delta^{(n_l)} = - (\nabla_{a^{(n_l)}}J) \bullet f'(z^{(n_l)})</math>, where <math>\nabla J = \theta^T(I-P)</math>.

+

}}

+
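As a rough illustration of the Step 2 formula in the note, the following minimal sketch computes <math>\delta^{(n_l)}</math> for a single example when the last feature layer feeds a softmax classifier. It is not taken from the wiki. It assumes a sigmoid activation (so <math>f'(z) = a(1-a)</math>), ignores weight decay, and uses hypothetical names (<code>output_delta</code>, <code>theta</code> stored as one row per class).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def output_delta(theta, a_last, label):
    """Delta for the last feature layer feeding a softmax classifier.

    theta  : K rows of length n (softmax weights; layout is an assumption)
    a_last : n sigmoid activations a^(n_l) of the last feature layer
    label  : integer class index in [0, K)
    """
    num_classes = len(theta)
    # Class scores theta * a and conditional probabilities P
    scores = [sum(w * a for w, a in zip(row, a_last)) for row in theta]
    p = softmax(scores)
    # I: one-hot indicator of the label
    indicator = [1.0 if k == label else 0.0 for k in range(num_classes)]
    # "nabla J" in the note's convention: theta^T (I - P)
    grad = [
        sum(theta[k][j] * (indicator[k] - p[k]) for k in range(num_classes))
        for j in range(len(a_last))
    ]
    # delta = -(nabla J) * f'(z), with f'(z) = a (1 - a) for the sigmoid
    return [-g * a * (1.0 - a) for g, a in zip(grad, a_last)]
```

Under these assumptions the returned vector equals the derivative of the per-example softmax cost <math>-\log P_y</math> with respect to <math>z^{(n_l)}</math>, which can be confirmed with a finite-difference check.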

@@ Line 3: / Line 3: @@
 === General Strategy ===
-Luckily, we already have all the tools necessary to implement fine tuning for stacked autoencoders! In order to compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. As the backpropagation algorithm can be extended to apply for an arbitrary number of layers, we can actually use this algorithm on a stacked autoencoder of arbitrary depth.
+Fortunately, we already have all the tools necessary to implement fine-tuning for stacked autoencoders! To compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. Because the backpropagation algorithm extends to an arbitrary number of layers, we can use it on a stacked autoencoder of arbitrary depth.
-As a note, most stacked autoencoders don't go past five layers.
+=== Fine-tuning with Backpropagation ===
-=== Recap of the Backpropagation Algorithm ===
 For your convenience, a summary of the backpropagation algorithm in element-wise notation is given below:
@@ Line 14: / Line 12: @@
 ::<math>\begin{align}
 \delta^{(n_l)}
-= - (y - a^{(n_l)}) \bullet f'(z^{(n_l)})
+= - (\nabla_{a^{(n_l)}}J) \bullet f'(z^{(n_l)})
 \end{align}</math>
+::(When using softmax regression, the softmax layer has <math>\nabla J = \theta^T(I-P)</math> where <math>I</math> is the indicator (one-hot) vector of the input label and <math>P</math> is the vector of conditional class probabilities.)
 : 3. For <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>
 ::Set
@@ Line 31: / Line 30: @@
 &= \left[ \frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)},y^{(i)}) \right]
 \end{align}</math>
+{{Quote|
+Note: While one could consider the softmax classifier as an additional layer, the derivation above does not. Specifically, we consider the "last layer" of the network to be the features that go into the softmax classifier. Therefore, the derivatives (in Step 2) are computed using <math>\delta^{(n_l)} = - (\nabla_{a^{(n_l)}}J) \bullet f'(z^{(n_l)})</math>, where <math>\nabla J = \theta^T(I-P)</math>.
+}}
+{{CNN}}
+{{Languages|微调多层自编码算法|中文}}
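The General Strategy paragraph says that backpropagation applies to a stack of arbitrary depth. A minimal sketch of that idea follows. It is not code from the wiki. The function names are hypothetical, the activation is assumed to be a sigmoid, weight decay is ignored, and the layer-to-layer rule <math>\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})</math> is the standard backpropagation recurrence, which the diff hunks do not show in full.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(weights, biases, x):
    """Return the activations of every layer, with the input as layer 0."""
    activations = [list(x)]
    for W, b in zip(weights, biases):
        prev = activations[-1]
        z = [sum(w * p for w, p in zip(row, prev)) + bi
             for row, bi in zip(W, b)]
        activations.append([sigmoid(v) for v in z])
    return activations


def backward(weights, activations, top_delta):
    """Propagate delta from the top layer down through any number of layers.

    Returns a list of (grad_W, grad_b) pairs, one per weight matrix.
    """
    grads = [None] * len(weights)
    delta = list(top_delta)
    for l in range(len(weights) - 1, -1, -1):
        a_prev = activations[l]
        # grad_W = delta * a_prev^T (outer product), grad_b = delta
        grads[l] = ([[d * a for a in a_prev] for d in delta], list(delta))
        if l > 0:
            W = weights[l]
            # delta for the layer below: (W^T delta) * f'(z), f' = a (1 - a)
            delta = [
                sum(W[i][j] * delta[i] for i in range(len(delta)))
                * a_prev[j] * (1.0 - a_prev[j])
                for j in range(len(a_prev))
            ]
    return grads
```

The loop never refers to a fixed number of layers, so the same function serves a two-layer or a five-layer stack. Only <code>top_delta</code> depends on the cost, for example the softmax-based <math>\delta^{(n_l)}</math> from Step 2.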