# Fine-tuning Stacked AEs

{{Quote|
Note: While one could consider the softmax classifier as an additional layer, the derivation above does not. Specifically, we consider the "last layer" of the network to be the features that go into the softmax classifier. Therefore, the derivatives (in Step 2) are computed using $\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})$, where $\nabla J = \theta^T(I-P)$.
}}
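As a concrete illustration, here is a minimal NumPy sketch of the last-layer delta computation above. It assumes a sigmoid activation $f$ (so $f'(z) = a(1-a)$), and uses $I$ as the one-of-$K$ ground-truth indicator matrix and $P$ as the softmax class-probability matrix, both with one column per example; the array shapes and random data are illustrative, not part of the original derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(u):
    # Column-wise softmax with max-subtraction for numerical stability
    u = u - u.max(axis=0, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_feat, n_classes, m = 5, 3, 4                     # hypothetical sizes

z = rng.standard_normal((n_feat, m))               # z^{(n_l)}: last-layer pre-activations
a = sigmoid(z)                                     # a^{(n_l)}: features fed to the softmax
theta = rng.standard_normal((n_classes, n_feat))   # softmax weights
labels = rng.integers(0, n_classes, size=m)

I = np.zeros((n_classes, m))                       # ground-truth indicator matrix
I[labels, np.arange(m)] = 1.0

P = softmax(theta @ a)                             # predicted class probabilities
grad_a = theta.T @ (I - P)                         # \nabla_{a^{(n_l)}} J = \theta^T (I - P)
delta = -grad_a * (a * (1.0 - a))                  # \delta^{(n_l)} = -(\nabla_a J) \bullet f'(z)
```

The element-wise product with `a * (1.0 - a)` is the sigmoid-specific form of $f'(z^{(n_l)})$; with a different activation, only that factor changes.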