Backpropagation Algorithm
From Ufldl
as reflected in our definition for <math>J(W, b)</math>. Applying weight decay
to the bias units usually makes only a small difference to the final network,
however. If you've taken CS229 (Machine Learning) at Stanford or watched the course's videos
on YouTube, you may also recognize weight decay as
essentially a variant of the Bayesian regularization method you saw there,
where we placed a Gaussian prior on the parameters and did MAP (instead of
maximum likelihood) estimation.]
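As a concrete sketch of the penalty this MAP view adds to the cost (the function and variable names below are illustrative, not part of the tutorial's Matlab code): placing a Gaussian prior on the weights contributes a term proportional to the sum of squared weights, i.e. <math>\textstyle \frac{\lambda}{2} \sum_{i,j} W_{ij}^2</math>.

```python
def weight_decay_penalty(W, lam):
    """(lambda / 2) * sum of squared entries of a weight matrix W,
    given as nested lists. Bias terms are deliberately excluded,
    as in the tutorial's definition of J(W, b)."""
    return 0.5 * lam * sum(w ** 2 for row in W for w in row)

# One hypothetical 2x2 weight matrix:
W = [[0.5, -1.0], [2.0, 0.0]]
print(weight_decay_penalty(W, 0.1))  # 0.5 * 0.1 * (0.25 + 1 + 4 + 0) = 0.2625
```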
The '''weight decay parameter''' <math>\lambda</math> controls the relative importance
the algorithm using matrix-vectorial notation.
We will use "<math>\textstyle \bullet</math>" to denote the element-wise product
operator (denoted ``<tt>.*</tt>'' in Matlab or Octave, and also called the Hadamard product), so
that if <math>\textstyle a = b \bullet c</math>, then <math>\textstyle a_i = b_i c_i</math>.
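For instance, here is a minimal sketch of the element-wise product in plain Python rather than the tutorial's Matlab/Octave (the function name is illustrative):

```python
def hadamard(b, c):
    """Element-wise (Hadamard) product of two equal-length vectors,
    the analogue of Matlab/Octave's .* operator: a_i = b_i * c_i."""
    return [bi * ci for bi, ci in zip(b, c)]

print(hadamard([1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```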