Gradient checking and advanced optimization
In practice, we set <math>\textstyle {\rm EPSILON}</math> to a small constant, say around <math>\textstyle 10^{-4}</math>.
(There's a large range of values of <math>\textstyle {\rm EPSILON}</math> that should work well, but
we don't set <math>\textstyle {\rm EPSILON}</math> to be "extremely" small, say <math>\textstyle 10^{-20}</math>,
as that would lead to numerical roundoff errors.)
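This two-sided check can be sketched in a few lines of Python/NumPy. The objective <code>J</code> and the point <code>theta</code> below are hypothetical, chosen only so the analytic derivative is easy to verify; only the value of <code>EPSILON</code> follows the text.

```python
import numpy as np

EPSILON = 1e-4  # a small constant, around 10^-4, as suggested above

def J(theta):
    # Hypothetical scalar objective for illustration: J(theta) = theta^2 + 3*theta.
    return theta**2 + 3 * theta

def analytic_grad(theta):
    # Its derivative, computed by hand: dJ/dtheta = 2*theta + 3.
    return 2 * theta + 3

theta = 1.5
# Two-sided difference approximation: (J(theta+EPSILON) - J(theta-EPSILON)) / (2*EPSILON)
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(abs(numeric - analytic_grad(theta)))  # should be tiny (roundoff level)
```

If the analytic and numerical values disagree by more than a few significant digits, the hand-derived gradient is almost certainly wrong.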
Now, consider the case where <math>\textstyle \theta \in \Re^n</math> is a vector rather than a single real
number (so that we have <math>\textstyle n</math> parameters that we want to learn), and <math>\textstyle J: \Re^n \mapsto \Re</math>. In
our neural network example we used "<math>\textstyle J(W,b)</math>," but one can imagine "unrolling"
the parameters <math>\textstyle W,b</math> into a long vector <math>\textstyle \theta</math>. We now generalize our derivative
checking procedure to the case where <math>\textstyle \theta</math> may be a vector.
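The vector-valued check perturbs one coordinate of <math>\textstyle \theta</math> at a time along the basis vector <math>\textstyle \vec{e}_i</math> and applies the same two-sided difference to each coordinate. A minimal sketch, assuming a simple quadratic objective chosen for illustration (the source does not prescribe this particular <code>J</code>):

```python
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta):
    """Approximate the gradient of J at theta, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                          # the i-th basis vector
        theta_plus = theta + EPSILON * e_i    # theta with i-th element incremented
        theta_minus = theta - EPSILON * e_i   # theta with i-th element decremented
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * EPSILON)
    return grad

# Hypothetical objective: J(theta) = sum(theta^2), whose gradient is 2*theta.
J = lambda t: np.sum(t**2)
theta = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(J, theta))  # close to [2., -4., 1.]
```

Note this costs two evaluations of <math>\textstyle J</math> per parameter, so it is far too slow for training; it is used only to spot-check an analytic gradient on a small problem.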
\end{align}</math>
is the <math>\textstyle i</math>-th basis vector (a
vector of the same dimension as <math>\textstyle \theta</math>, with a "1" in the <math>\textstyle i</math>-th position
and "0"s everywhere else). So,
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
by <math>\textstyle {\rm EPSILON}</math>. Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the