反向传导算法 (Backpropagation Algorithm)
"<math>\textstyle \Delta W^{(l)}</math>" is a matrix, and in particular it isn't "<math>\textstyle \Delta</math> times <math>\textstyle W^{(l)}</math>." We implement one iteration of batch gradient descent as follows: | "<math>\textstyle \Delta W^{(l)}</math>" is a matrix, and in particular it isn't "<math>\textstyle \Delta</math> times <math>\textstyle W^{(l)}</math>." We implement one iteration of batch gradient descent as follows: | ||
:【Initial translation】:
Finally, this gives the full description of the gradient descent procedure. In the pseudo-code below, <math>\textstyle \Delta W^{(l)}</math> is a matrix of the same dimension as <math>\textstyle W^{(l)}</math>, and <math>\textstyle \Delta b^{(l)}</math> is a vector of the same dimension as <math>\textstyle b^{(l)}</math>. Note that "<math>\textstyle \Delta W^{(l)}</math>" here is a matrix, and not "<math>\textstyle \Delta</math> times <math>\textstyle W^{(l)}</math>." Below, we implement one iteration of batch gradient descent:
:【First proofreading】:
:【Initial translation】:
<ol>
<li>For all <math>\textstyle l</math>, set <math>\textstyle \Delta W^{(l)} := 0</math>, <math>\textstyle \Delta b^{(l)} := 0</math> (i.e., set them to the all-zeros matrix or all-zeros vector).
<li>For <math>\textstyle i = 1</math> to <math>\textstyle m</math>,
<ol type="a">
<li>Use backpropagation to compute <math>\textstyle \nabla_{W^{(l)}} J(W,b;x,y)</math> and <math>\textstyle \nabla_{b^{(l)}} J(W,b;x,y)</math>.
<li>Set <math>\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)</math>.
<li>Set <math>\textstyle \Delta b^{(l)} := \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x,y)</math>.
</ol>
<li>Update the parameters:
:<math>\begin{align}
W^{(l)} &= W^{(l)} - \alpha \left[ \left(\frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}\right] \\
b^{(l)} &= b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]
\end{align}</math>
</ol>
:【First proofreading】:
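The pseudo-code above can be sketched in NumPy as follows. This is a minimal sketch only, assuming a fully connected feedforward network with sigmoid activations and the squared-error cost used earlier in the tutorial; the function and variable names (<code>backprop_single_example</code>, <code>batch_gradient_descent_step</code>, <code>alpha</code>, <code>lam</code>) are illustrative and not part of the original page.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single_example(W, b, x, y):
    """Compute nabla_W J(W,b;x,y) and nabla_b J(W,b;x,y) by backpropagation."""
    # Forward pass: store the activation of every layer (a[0] is the input x).
    a = [x]
    for l in range(len(W)):
        a.append(sigmoid(W[l] @ a[-1] + b[l]))
    # Output-layer error term for the squared-error cost J = 1/2 * ||y - h||^2,
    # using f'(z) = a * (1 - a) for the sigmoid.
    delta = -(y - a[-1]) * a[-1] * (1.0 - a[-1])
    grad_W = [None] * len(W)
    grad_b = [None] * len(W)
    # Backward pass: gradients for each layer, then propagate the error term back.
    for l in reversed(range(len(W))):
        grad_W[l] = np.outer(delta, a[l])
        grad_b[l] = delta
        if l > 0:
            delta = (W[l].T @ delta) * a[l] * (1.0 - a[l])
    return grad_W, grad_b

def batch_gradient_descent_step(W, b, xs, ys, alpha, lam):
    """One iteration of batch gradient descent, following the pseudo-code above."""
    m = len(xs)
    # Step 1: set Delta W := 0 and Delta b := 0 for all layers.
    Delta_W = [np.zeros_like(Wl) for Wl in W]
    Delta_b = [np.zeros_like(bl) for bl in b]
    # Step 2: accumulate the per-example gradients over all m training examples.
    for x, y in zip(xs, ys):
        grad_W, grad_b = backprop_single_example(W, b, x, y)
        for l in range(len(W)):
            Delta_W[l] += grad_W[l]
            Delta_b[l] += grad_b[l]
    # Step 3: update the parameters; the weight-decay term lambda applies to W only.
    for l in range(len(W)):
        W[l] = W[l] - alpha * ((1.0 / m) * Delta_W[l] + lam * W[l])
        b[l] = b[l] - alpha * ((1.0 / m) * Delta_b[l])
    return W, b
</pre>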
To train our neural network, we can now repeatedly take steps of gradient descent to reduce our cost function <math>\textstyle J(W,b)</math>.
:【Initial translation】:
We can now repeatedly take steps of gradient descent to reduce the value of the cost function <math>\textstyle J(W,b)</math>, and thereby train our neural network.
:【First proofreading】:
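As an illustration of this repeated descent, the short loop below applies the <code>batch_gradient_descent_step</code> function from the previous sketch over and over to drive down <math>\textstyle J(W,b)</math>. The network sizes, data, and hyperparameters here are invented for the example, and the snippet assumes the earlier sketch has already been defined.

<pre>
import numpy as np

# Toy setup: a tiny autoencoder-style network trained on random inputs.
rng = np.random.default_rng(0)
sizes = [8, 3, 8]
W = [rng.normal(0.0, 0.1, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
xs = [rng.random(8) for _ in range(50)]
ys = xs  # autoencoder target: reproduce the input

# Repeatedly take steps of batch gradient descent to reduce J(W,b).
for it in range(500):
    W, b = batch_gradient_descent_step(W, b, xs, ys, alpha=0.5, lam=1e-4)
</pre>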
{{Sparse_Autoencoder}}