Deep Networks: Overview
From Ufldl
In the previous sections, you constructed a 3-layer neural network comprising an input, hidden and output layer. While fairly effective for MNIST, this 3-layer model is a fairly '''shallow''' network; by this, we mean that the features (hidden layer activations <math>a^{(2)}</math>) are computed using only "one layer" of computation (the hidden layer).
In this section, we begin to discuss '''deep''' neural networks, meaning ones in which we have multiple hidden layers; this will allow us to compute much more complex features of the input. Because each hidden layer computes a non-linear transformation of the previous layer, a deep network can have significantly greater representational power (i.e., can learn significantly more complex functions) than a shallow one.
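As a concrete sketch of the kind of network described here, the forward pass below stacks several hidden layers, each applying <math>a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})</math>. The layer sizes, random initialization, and the choice of the sigmoid as <math>f(\cdot)</math> are illustrative assumptions, not prescribed by this tutorial:

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation f(.) applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Illustrative sizes: input, two hidden layers, output
layer_sizes = [8, 6, 5, 3]

# Randomly initialized parameters W^{(l)}, b^{(l)} for each layer
W = [rng.standard_normal((m, n)) * 0.1
     for n, m in zip(layer_sizes, layer_sizes[1:])]
b = [np.zeros(m) for m in layer_sizes[1:]]

def forward(x):
    # a^{(1)} = x;  a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})
    a = x
    for Wl, bl in zip(W, b):
        a = sigmoid(Wl @ a + bl)
    return a

x = rng.standard_normal(8)
print(forward(x).shape)  # (3,)
```

Each pass through the loop is one "layer of computation" in the sense used above; a shallow network would run the loop only once before the output layer.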
Note that when training a deep network, it is important to use a ''non-linear'' activation function <math>f(\cdot)</math> in each hidden layer. This is because multiple layers of linear functions would themselves compute only a linear