Neural Networks

Revision as of 05:41, 26 February 2011 by Ang

Revision as of 06:04, 26 February 2011 by Ang



From Ufldl


@@ Line 29: / Line 29: @@
 Here are plots of the sigmoid and <math>\tanh</math> functions:
-  {{multiple image
-   | width     = 400
-   | footer    = Two cards used by football referees
-   | image1    = Sigmoid_Function.png
+[[Image:Sigmoid_Function.png|400px|center|Sigmoid activation function.]]
-   | alt1      = Sigmoid activation function
+[[Image:Tanh_Function.png|400px|center|Tanh activation function.]]
-   | caption1  = Sigmoid activation function
-   | image2    = Tanh_Function.png
+The <math>\tanh(z)</math> function is a rescaled version of the sigmoid, and its output range is
-   | alt2      = Tanh activation function
+<math>[-1,1]</math> instead of <math>[0,1]</math>.
-   | caption2  = Tanh activation function
-  }}
+Note that unlike CS221 and (parts of) CS229, we do not use the convention
+<math>x_0=1</math> here.  Instead, the intercept term is handled separately by the parameter <math>b</math>.
+Finally, one identity that'll be useful later: If <math>f(z) = 1/(1+\exp(-z))</math> is the sigmoid
+function, then its derivative is given by <math>f'(z) = f(z) (1-f(z))</math>.
+(If <math>f</math> is the tanh function, then its derivative is given by
+<math>f'(z) = 1- (f(z))^2</math>.)  You can derive this yourself using the definition of
+the sigmoid (or tanh) function.
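The two derivative identities above, and the statement that <math>\tanh</math> is a rescaled sigmoid, can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names (<code>sigmoid_prime</code>, <code>tanh_prime</code>, <code>numerical_derivative</code>) are hypothetical, and the rescaling is assumed to take the standard form <math>\tanh(z) = 2f(2z) - 1</math>, which the text does not write out.

```python
import math

def sigmoid(z):
    """Sigmoid f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed-form derivative f'(z) = f(z) (1 - f(z))."""
    fz = sigmoid(z)
    return fz * (1.0 - fz)

def tanh_prime(z):
    """Closed-form derivative f'(z) = 1 - (f(z))^2 for f = tanh."""
    return 1.0 - math.tanh(z) ** 2

def numerical_derivative(f, z, eps=1e-6):
    """Central-difference approximation to f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Closed forms agree with the finite-difference estimates.
    assert abs(sigmoid_prime(z) - numerical_derivative(sigmoid, z)) < 1e-6
    assert abs(tanh_prime(z) - numerical_derivative(math.tanh, z)) < 1e-6
    # tanh as a rescaled sigmoid (assumed form): tanh(z) = 2 f(2z) - 1.
    assert abs(math.tanh(z) - (2.0 * sigmoid(2.0 * z) - 1.0)) < 1e-12
```

Both closed forms reuse the already computed value <math>f(z)</math>, which is presumably why the identity is flagged as useful later.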
+== Neural Network formulation ==
+A neural network is built by connecting many of our simple
+"neurons," so that the output of one neuron can be the input of another.  For
+example, here is a small neural network:
+[[Image:Network331.png|400px|center]]
+In this figure, we have used circles to also denote the inputs to the network.  The circles
+labeled "+1" are called '''bias units''', and correspond to the intercept term.
+The leftmost layer of the network is called the '''input layer''', and the
+rightmost layer the '''output layer''' (which, in this example, has only one
+node).  The middle layer of nodes is called the '''hidden layer''', because its
+values are not observed in the training set.  We also say that our example
+neural network has 3 '''input units''' (not counting the bias unit), 3 '''hidden units''',
+and 1 '''output unit'''.
+We will let <math>n_l</math>
+denote the number of layers in our network; thus <math>n_l=3</math> in our example.  We label layer <math>l</math> as
+<math>L_l</math>, so layer <math>L_1</math> is the input layer, and layer <math>L_{n_l}</math> the output layer.
+Our neural network has parameters <math>(W,b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})</math>, where
+we write
+<math>W^{(l)}_{ij}</math> to denote the parameter (or weight) associated with the connection
+between unit <math>j</math> in layer <math>l</math>, and unit <math>i</math> in layer <math>l+1</math>.  (Note the order of the indices.)
+Also, <math>b^{(l)}_i</math> is the bias associated with unit <math>i</math> in layer <math>l+1</math>.
+Thus, in our example, we have <math>W^{(1)} \in \Re^{3\times 3}</math>, and <math>W^{(2)} \in \Re^{1\times 3}</math>.
+Note that bias units don't have inputs or connections going into them, since they always output
+the value +1.  We also let <math>s_l</math> denote the number of nodes in layer <math>l</math> (not counting the bias unit).
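The index convention above fixes the shape of every parameter: <math>W^{(l)}</math> has <math>s_{l+1}</math> rows and <math>s_l</math> columns, and <math>b^{(l)}</math> has <math>s_{l+1}</math> entries. A small sketch of that bookkeeping follows; the function name <code>parameter_shapes</code> is hypothetical and used only for illustration.

```python
def parameter_shapes(layer_sizes):
    """Given [s_1, ..., s_{n_l}] (bias units not counted), return a list of
    (shape of W^(l), shape of b^(l)) for l = 1, ..., n_l - 1."""
    shapes = []
    for s_l, s_next in zip(layer_sizes, layer_sizes[1:]):
        # W^(l)_{ij} connects unit j in layer l to unit i in layer l+1,
        # so rows index layer l+1 and columns index layer l.
        shapes.append(((s_next, s_l), (s_next,)))
    return shapes

# The example network: 3 input units, 3 hidden units, 1 output unit.
print(parameter_shapes([3, 3, 1]))
# → [((3, 3), (3,)), ((1, 3), (1,))]
```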
+We will write <math>a^{(l)}_i</math> to denote the '''activation''' (meaning output value) of
+unit <math>i</math> in layer <math>l</math>.  For <math>l=1</math>, we also use <math>a^{(1)}_i = x_i</math> to denote the <math>i</math>-th input.
+Given a fixed setting of
+the parameters <math>W,b</math>, our neural
+network defines a hypothesis <math>h_{W,b}(x)</math> that outputs a real number.  Specifically, the
+computation that this neural network represents is given by:
+:<math>
+\begin{align}
+a_1^{(2)} &= f(W_{11}^{(1)}x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)})  \\
+a_2^{(2)} &= f(W_{21}^{(1)}x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)})  \\
+a_3^{(2)} &= f(W_{31}^{(1)}x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)})  \\
+h_{W,b}(x) &= a_1^{(3)} =  f(W_{11}^{(2)}a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)})
+\end{align}
+</math>
+In the sequel, we also let <math>z^{(l)}_i</math> denote the total weighted sum of inputs to unit <math>i</math> in layer <math>l</math>,
+including the bias term (e.g., <math>z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i</math>), so that
+<math>a^{(l)}_i = f(z^{(l)}_i)</math>.
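The computation above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function name <code>forward</code>, the list-of-lists layout for <math>W^{(l)}</math>, and all numeric parameter values are assumptions made for the example and do not come from the notes.

```python
import math

def sigmoid(z):
    """Sigmoid activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params, f=sigmoid):
    """Compute the activations of the last layer.

    params is a list of (W, b) pairs, one per pair of consecutive layers.
    W[i][j] plays the role of W^(l)_{ij}: the weight from unit j in layer l
    to unit i in layer l+1. b[i] is the bias of unit i in layer l+1.
    """
    a = list(x)  # a^(1) = x
    for W, b in params:
        # z_i^(l+1) = sum_j W_ij^(l) a_j^(l) + b_i^(l)
        z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
             for row, b_i in zip(W, b)]
        # a_i^(l+1) = f(z_i^(l+1))
        a = [f(z_i) for z_i in z]
    return a

# Hypothetical parameters for the 3-3-1 example network.
W1 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.5],
      [0.2, 0.2, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.2]

# h_{W,b}(x) is the single activation of the output layer.
h = forward([1.0, 2.0, 3.0], [(W1, b1), (W2, b2)])[0]
```

Because the loop only ever uses the previous layer's activations, the same function applies to a network with any number of layers, as long as consecutive shapes match.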