# Linear Decoders

## Linear Decoder

One easy fix for the aforementioned problem is to use a *linear decoder*, that is, to set $a^{(3)} = z^{(3)}$. For a linear decoder, the activation function of the output units is effectively the identity function. Formally, to reconstruct the input from the features using a linear decoder, we simply set $\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a + b^{(2)}$, without applying the sigmoid function. Now the reconstructed output $\hat{x}$ is a linear function of the activations of the hidden units, which means that by varying $W^{(2)}$, each output unit can be made to produce any real value without the previous constraints. This allows us to train the sparse autoencoder on any real-valued input without additional pre-processing. (Note that the hidden units are **still sigmoid units**, that is, $a = \sigma(W^{(1)}x + b^{(1)})$, where $x$ is the input, and $W^{(1)}$ and $b^{(1)}$ are the weight and bias terms for the hidden units.)
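As a minimal sketch of the forward pass described above (the dimensions, variable names, and random initialization here are illustrative assumptions, not part of the original text): the encoder keeps its sigmoid nonlinearity, while the decoder applies no activation at all, so the reconstruction is an unbounded linear function of the hidden activations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dimensions chosen for illustration
n_input, n_hidden = 8, 3
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_input))  # encoder weights W^(1)
b1 = np.zeros(n_hidden)                               # encoder bias b^(1)
W2 = rng.normal(scale=0.1, size=(n_input, n_hidden))  # decoder weights W^(2)
b2 = np.zeros(n_input)                                # decoder bias b^(2)

# Real-valued input: no rescaling to [0, 1] is required with a linear decoder
x = rng.normal(size=n_input)

# Hidden layer: still a sigmoid unit, a = sigma(W^(1) x + b^(1))
a = sigmoid(W1 @ x + b1)

# Linear decoder: identity activation, x_hat = W^(2) a + b^(2)
x_hat = W2 @ a + b2
```

Because `x_hat` is linear in `a`, the output units are free to match any real-valued input, while the hidden activations `a` remain bounded in (0, 1) by the sigmoid.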