Linear Decoders
== Sparse Autoencoder Recap ==
In the sparse autoencoder, we had 3 layers of neurons: an input layer, a hidden layer and an output layer. In our previous description of autoencoders (and of neural networks), every neuron in the network used the same activation function.
In these notes, we describe a modified version of the autoencoder in which some of the neurons use a different activation function. This will result in a model that is sometimes simpler to apply, and can also be more robust to variations in the parameters.
Recall that each neuron (in the output layer) computed the following:

:<math>
\begin{align}
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
a^{(3)} &= f(z^{(3)})
\end{align}
</math>

where <math>a^{(3)}</math> is the output. In the autoencoder, <math>a^{(3)}</math> is our approximate reconstruction of the input <math>x = a^{(1)}</math>.
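As a concrete illustration, here is a minimal NumPy sketch of this output-layer computation. The sizes and the names <code>W2</code>, <code>b2</code>, <code>a2</code> are hypothetical stand-ins for <math>W^{(2)}</math>, <math>b^{(2)}</math> and <math>a^{(2)}</math>:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 64 input/output units, 25 hidden units.
rng = np.random.default_rng(0)
W2 = rng.normal(scale=0.01, size=(64, 25))   # W^(2)
b2 = np.zeros(64)                            # b^(2)
a2 = rng.random(25)                          # hidden activations a^(2)

z3 = W2 @ a2 + b2    # z^(3) = W^(2) a^(2) + b^(2)
a3 = sigmoid(z3)     # a^(3) = f(z^(3)): the reconstruction, confined to (0,1)
</pre>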
Because we used a sigmoid activation function for <math>f(z^{(3)})</math>, we needed to constrain or scale the inputs to be in the range <math>[0,1]</math>, since the sigmoid function outputs numbers in the range <math>[0,1]</math>. While some datasets like MNIST fit well with this scaling of the output, this constraint can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <math>[0,1]</math>, and it is not clear what the best way is to scale the data to ensure it fits into the constrained range.
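As a quick illustration of the issue, the sketch below (using a hypothetical random data matrix <code>X</code>) applies PCA whitening and checks the resulting range; the whitened values fall well outside <math>[0,1]</math>:

<pre>
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((64, 1000))               # hypothetical data in [0,1], one example per column
X = X - X.mean(axis=1, keepdims=True)    # zero-mean the data

# PCA whitening: rotate onto the principal axes, rescale to unit variance.
U, S, _ = np.linalg.svd(X @ X.T / X.shape[1])
X_white = np.diag(1.0 / np.sqrt(S + 1e-5)) @ U.T @ X

print(X_white.min(), X_white.max())      # typically far outside [0, 1]
</pre>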
== Linear Decoder ==
One easy fix for this problem is to set <math>a^{(3)} = z^{(3)}</math>. Formally, this is achieved by having the output nodes use an activation function that is the identity function <math>f(z) = z</math>, so that <math>a^{(3)} = f(z^{(3)}) = z^{(3)}</math>. Note, however, that in the ''hidden'' layer of the network we still use a sigmoid (or tanh) activation function; it is only in the output layer that we use the linear activation function. An autoencoder of this kind, with a linear activation function in the output layer, is called a '''linear decoder'''.

In this model we have <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)}a^{(2)} + b^{(2)}</math>. Because the output is a linear function of the hidden unit activations, it can take on values outside <math>[0,1]</math>, so we can apply the autoencoder to inputs that are not constrained to that range.

Since the activation function of the output layer has changed, the error term <math>\delta^{(3)}</math> used in backpropagation changes as well: because <math>f'(z^{(3)}) = 1</math> for the identity function, it simplifies to

:<math>\delta^{(3)} = -\left(y - a^{(3)}\right),</math>

while the error term for the hidden layer keeps its original form,

:<math>\delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} \right) \bullet f'(z^{(2)}).</math>

Because the hidden layer is using a sigmoid (or tanh) activation <math>f</math>, in the equation above <math>f'(\cdot)</math> should still be the derivative of the sigmoid (or tanh) function.
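To make the modified backpropagation step concrete, here is a minimal NumPy sketch of a linear decoder's forward pass and error terms, under hypothetical sizes and randomly initialized parameters (<code>W1</code>, <code>b1</code>, <code>W2</code>, <code>b2</code> stand in for <math>W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math>; the sparsity and weight-decay terms are omitted):

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 64, 25
W1 = rng.normal(scale=0.01, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.01, size=(n_in, n_hid)); b2 = np.zeros(n_in)
x = rng.normal(size=n_in)   # real-valued input; no scaling into [0,1] required

# Forward pass: sigmoid hidden layer, linear (identity) output layer.
z2 = W1 @ x + b1
a2 = sigmoid(z2)
z3 = W2 @ a2 + b2
a3 = z3                     # linear decoder: a^(3) = z^(3)

# Error terms (target y = x for an autoencoder).
delta3 = -(x - a3)                         # f'(z^(3)) = 1 for the identity
delta2 = (W2.T @ delta3) * a2 * (1 - a2)   # elementwise sigmoid derivative f'(z^(2))

# Gradients of the squared-error term (1/2)||a^(3) - x||^2.
grad_W2 = np.outer(delta3, a2)
grad_W1 = np.outer(delta2, x)
</pre>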
{{Languages|线性解码器|中文}}