Neural Networks
From Ufldl
diagram to denote a single neuron:

[[Image:SingleNeuron.png|300px|center]]

This "neuron" is a computational unit that takes as input <math>x_1, x_2, x_3</math> (and a +1 intercept term), and
== Neural Network model ==

A neural network is put together by hooking together many of our simple

including the bias term (e.g., <math>z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i</math>), so that
<math>a^{(l)}_i = f(z^{(l)}_i)</math>.

Note that this easily lends itself to a more compact notation. Specifically, if we extend the
activation function <math>f(\cdot)</math> to apply to vectors in an element-wise fashion (i.e.,
<math>f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]</math>), then we can write the equations above more
compactly as:
:<math>\begin{align}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
a^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
h_{W,b}(x) &= a^{(3)} = f(z^{(3)})
\end{align}</math>
More generally, recalling that we also use <math>a^{(1)} = x</math> to denote the values from the input layer,
given layer <math>l</math>'s activations <math>a^{(l)}</math>, we can compute layer <math>l+1</math>'s activations <math>a^{(l+1)}</math> as:
:<math>\begin{align}
z^{(l+1)} &= W^{(l)} a^{(l)} + b^{(l)} \\
a^{(l+1)} &= f(z^{(l+1)})
\end{align}</math>
By organizing our parameters in matrices and using matrix-vector operations, we can take
advantage of fast linear algebra routines to quickly perform calculations in our network.
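The matrix-vector forward pass above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a sigmoid activation for <math>f</math> and a hypothetical 3-3-1 network like the running example; the weights here are random placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    """Element-wise activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical network matching the example: 3 inputs, 3 hidden units, 1 output.
# Random weights stand in for trained parameters.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 3))   # W^{(1)}: hidden units x inputs
b1 = rng.standard_normal(3)        # b^{(1)}
W2 = rng.standard_normal((1, 3))   # W^{(2)}: output units x hidden units
b2 = rng.standard_normal(1)        # b^{(2)}

x = np.array([0.5, -0.2, 0.1])     # an example input vector

z2 = W1 @ x + b1                   # z^{(2)} = W^{(1)} x + b^{(1)}
a2 = sigmoid(z2)                   # a^{(2)} = f(z^{(2)})
z3 = W2 @ a2 + b2                  # z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}
h = sigmoid(z3)                    # h_{W,b}(x) = a^{(3)} = f(z^{(3)})
```

Each line maps directly onto one of the four compact equations above; the `@` operator performs the matrix-vector products that the vectorized notation makes possible.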

We have so far focused on one example neural network, but one can also build neural
networks with other '''architectures''' (meaning patterns of connectivity between neurons),
including ones with multiple hidden layers.
The most common choice is an <math>n_l</math>-layered network
where layer <math>1</math> is the input layer, layer <math>n_l</math> is the output layer, and each
layer <math>l</math> is densely connected to layer <math>l+1</math>. In this setting, to compute the
output of the network, we can successively compute all the activations in layer
<math>L_2</math>, then layer <math>L_3</math>, and so on, up to layer <math>L_{n_l}</math>, using the forward propagation equations above. This is one
example of a '''feedforward''' neural network, since the connectivity graph
does not have any directed loops or cycles.
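The layer-by-layer computation just described is a simple loop: apply the forward propagation equations once per layer. Below is a minimal sketch, again assuming a sigmoid activation and illustrative layer sizes; `forward`, `weights`, and `biases` are hypothetical names, not part of any established API.

```python
import numpy as np

def sigmoid(z):
    """Element-wise activation f(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Successively compute a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})
    for each layer, starting from a^{(1)} = x."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a  # a^{(n_l)}, the network's output

# Hypothetical 4-layer feedforward network with layer sizes 3 -> 3 -> 2 -> 2.
rng = np.random.default_rng(1)
sizes = [3, 3, 2, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

out = forward(np.array([1.0, 0.0, -1.0]), weights, biases)
```

Note that the loop works for any number of layers: adding a hidden layer only means appending one more weight matrix and bias vector to the lists.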

Neural networks can also have multiple output units. For example, here is a network
with two hidden layers <math>L_2</math> and <math>L_3</math> and two output units in layer <math>L_4</math>:

[[Image:Network3322.png|500px|center]]

To train this network, we would need training examples <math>(x^{(i)}, y^{(i)})</math>
where <math>y^{(i)} \in \Re^2</math>. This sort of network is useful if there are multiple
outputs that you're interested in predicting. (For example, in a medical
diagnosis application, the vector <math>x</math> might give the input features of a
patient, and the different outputs <math>y_i</math> might indicate the presence or absence
of different diseases.)
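Concretely, a training pair for such a two-output network just pairs an input vector with a two-dimensional target, and the per-example error compares the two-dimensional prediction against that target. The sketch below assumes a squared-error measure (one common choice); the feature values and the prediction `h` are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical training pair for a two-output network: x^{(i)} in R^3, y^{(i)} in R^2.
x_i = np.array([0.2, 0.7, -0.1])   # e.g., a patient's input features
y_i = np.array([1.0, 0.0])         # e.g., disease 1 present, disease 2 absent

# Suppose the network's output layer produced this prediction h_{W,b}(x^{(i)}):
h = np.array([0.8, 0.3])

# Squared error for this example, summed over both output units.
err = 0.5 * np.sum((h - y_i) ** 2)   # 0.5 * (0.2^2 + 0.3^2) = 0.065
```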