Linear Decoders
From Ufldl
While some datasets like MNIST fit well with this scaling of the output, this can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is
no longer constrained to <math>[0,1]</math>, and it is not clear how best to scale the data so that it fits into the constrained range.

== Linear Decoder ==
One easy fix for this problem is to set <math>a^{(3)} = z^{(3)}</math>. Formally, this is achieved by having the output layer use the identity activation function <math>f(z) = z</math>.
Since the derivative of the identity function is <math>f'(z) = 1</math>, the error term for the output layer simplifies to:

<math>
\begin{align}
\delta_i^{(3)} = - (y_i - \hat{x}_i)
\end{align}
</math>
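The output-layer error term above can be sketched in NumPy (an illustrative translation, not the tutorial's own code; the function and variable names here are assumptions):

```python
import numpy as np

def output_delta_linear(z3, y):
    """Error term for a linear (identity-activation) output layer.

    With f(z) = z, the activation is a3 = z3 and f'(z) = 1, so
    delta3_i = -(y_i - a3_i) * f'(z3_i) = -(y_i - a3_i).
    """
    a3 = z3                  # identity activation: a^{(3)} = z^{(3)}
    return -(y - a3)         # delta^{(3)}_i = -(y_i - x_hat_i)

# Example: reconstruction z3 vs. target y (e.g., a PCA-whitened input,
# which need not lie in [0, 1])
z3 = np.array([0.5, -1.2, 3.0])
y  = np.array([0.4, -1.0, 2.5])
print(output_delta_linear(z3, y))   # approx. [0.1, -0.2, 0.5]
```

Because no derivative factor appears, the linear decoder's output error is simply the negative reconstruction residual.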
Of course, when using backpropagation to compute the error terms for the ''hidden'' layer:
<math>
\begin{align}
\delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} \right) \bullet f'(z^{(2)})
\end{align}
</math>
Because the hidden layer is using a sigmoid (or tanh) activation <math>f</math>, in the equation above <math>f'(\cdot)</math> should still be the
derivative of the sigmoid (or tanh) function.
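Continuing the NumPy sketch (again, names and shapes are illustrative assumptions): the hidden-layer error term still uses the sigmoid derivative <math>f'(z) = f(z)(1 - f(z))</math>, even though the output layer is linear:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_delta(W2, delta3, z2):
    """delta^{(2)} = ((W^{(2)})^T delta^{(3)}) .* f'(z^{(2)}),
    where f is the sigmoid, so f'(z) = f(z) * (1 - f(z))."""
    a2 = sigmoid(z2)
    return (W2.T @ delta3) * a2 * (1.0 - a2)

# Tiny example: 3 output units, 2 hidden units
W2 = np.array([[1.0, 0.0],
               [0.0, 2.0],
               [1.0, 1.0]])       # weights from hidden layer (2) to output layer (3)
delta3 = np.array([0.1, -0.2, 0.5])
z2 = np.array([0.0, 1.0])
print(hidden_delta(W2, delta3, z2))
```

Only the output layer's activation changed, so only <math>\delta^{(3)}</math> loses its derivative factor; the hidden-layer backpropagation step is unchanged from the ordinary sparse autoencoder.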