Gradient checking and advanced optimization

Suppose we have a function <math>\textstyle g(\theta)</math> that purportedly computes <math>\textstyle \frac{d}{d\theta}J(\theta)</math>; we can numerically verify its correctness by checking that it outputs a value close to

<math>\begin{align}
\frac{J(\theta+{\rm EPSILON}) - J(\theta-{\rm EPSILON})}{2 \times {\rm EPSILON}}.
\end{align}</math>
In practice, we set <math>{\rm EPSILON}</math> to a small constant, say around <math>\textstyle 10^{-4}</math>. (There's a large range of values of <math>{\rm EPSILON}</math> that should work well, but we don't set <math>{\rm EPSILON}</math> to be "extremely" small, say <math>\textstyle 10^{-20}</math>, as that would lead to numerical roundoff errors.)
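
For example, here is a minimal sketch of this check for a scalar parameter. It is written in Python (an assumption; this page does not prescribe a language), and the cubic <code>J</code> and its analytic derivative are illustrative choices, not from this page:

<pre>
# Illustrative objective and its analytic derivative (assumed for this example).
def J(theta):
    return theta ** 3

def dJ(theta):
    return 3 * theta ** 2

EPSILON = 1e-4   # small, but not so small that roundoff errors dominate
theta = 1.5

# Two-sided difference approximation of dJ/dtheta.
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(numeric, dJ(theta))   # the two values should agree to about 8 significant digits
</pre>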
and "0"s everywhere else).  So,
and "0"s everywhere else).  So,
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
-
by {\rm EPSILON}.  Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the
+
by <math>{\rm EPSILON}</math>.  Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the
-
corresponding vector with the <math>\textstyle i</math>-th element decreased by {\rm EPSILON}.
+
corresponding vector with the <math>\textstyle i</math>-th element decreased by <math>{\rm EPSILON}</math>.
We can now numerically verify <math>\textstyle g_i(\theta)</math>'s correctness by checking, for each <math>\textstyle i</math>,
We can now numerically verify <math>\textstyle g_i(\theta)</math>'s correctness by checking, for each <math>\textstyle i</math>,
that:
that:
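
A sketch of this coordinate-wise check, assuming a simple quadratic <math>\textstyle J</math> whose gradient is known in closed form (the function, the dimension, and the helper name <code>numerical_gradient</code> are illustrative, not from this page):

<pre>
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Approximate each partial derivative of J at theta with a two-sided difference."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)   # i-th basis vector
        e_i[i] = 1.0
        # theta^(i+) and theta^(i-): theta with its i-th element moved by +/- eps
        grad[i] = (J(theta + eps * e_i) - J(theta - eps * e_i)) / (2 * eps)
    return grad

# Example: J(theta) = 0.5 * ||theta||^2, whose exact gradient is theta itself.
J = lambda theta: 0.5 * np.dot(theta, theta)
theta = np.random.randn(5)
error = np.max(np.abs(numerical_gradient(J, theta) - theta))
print(error)   # should be tiny; J is quadratic, so the check is exact up to roundoff
</pre>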
Recall that the backpropagation procedure computes <math>\textstyle \Delta W^{(l)}</math> and <math>\textstyle \Delta b^{(l)}</math>, from which the gradients of the overall cost are obtained as:

<math>\begin{align}
\nabla_{W^{(l)}} J(W,b) &= \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}, \\
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \Delta b^{(l)}.
\end{align}</math>
This result shows that the final block of pseudo-code in [[Backpropagation Algorithm]] is indeed implementing gradient descent.
To make sure your implementation of gradient descent is correct, it is usually very helpful to use the method described above to numerically compute the derivatives of <math>\textstyle J(W,b)</math>, and thereby verify that your computations of <math>\textstyle \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}</math> and <math>\textstyle \frac{1}{m} \Delta b^{(l)}</math> are indeed giving the derivatives you want.
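
As an illustration, the sketch below applies the two-sided difference check to the gradients of a single tanh layer with a squared-error cost and weight decay. The layer, cost, data, and weight-decay parameter <code>lam</code> are all assumptions made for this example, not this page's full network:

<pre>
import numpy as np

rng = np.random.default_rng(0)
m, n_in, n_out = 10, 4, 3                 # examples, input size, output size
X = rng.standard_normal((n_in, m))        # inputs, one example per column
Y = rng.standard_normal((n_out, m))       # targets
lam = 1e-3                                # weight-decay strength (assumed)

def cost(W, b):
    # J(W,b): mean squared error plus an L2 weight-decay term.
    H = np.tanh(W @ X + b[:, None])
    return 0.5 / m * np.sum((H - Y) ** 2) + 0.5 * lam * np.sum(W ** 2)

def analytic_grads(W, b):
    # Backprop-style gradients: (1/m) Delta W + lam * W and (1/m) Delta b.
    H = np.tanh(W @ X + b[:, None])
    delta = (H - Y) * (1 - H ** 2)        # dJ/dZ for tanh units (before the 1/m factor)
    return delta @ X.T / m + lam * W, delta.sum(axis=1) / m

W = 0.1 * rng.standard_normal((n_out, n_in))
b = np.zeros(n_out)
gW, gb = analytic_grads(W, b)

eps = 1e-4
# Check a few entries of the W gradient against the two-sided difference.
for i, j in [(0, 0), (1, 2), (2, 3)]:
    Wp, Wm = W.copy(), W.copy()
    Wp[i, j] += eps
    Wm[i, j] -= eps
    numeric = (cost(Wp, b) - cost(Wm, b)) / (2 * eps)
    print(numeric, gW[i, j])              # each pair should agree closely

# And one entry of the b gradient.
bp, bm = b.copy(), b.copy()
bp[0] += eps
bm[0] -= eps
print((cost(W, bp) - cost(W, bm)) / (2 * eps), gb[0])
</pre>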
