Backpropagation Algorithm

"<math>\textstyle \Delta W^{(l)}</math>" is a matrix, and in particular it isn't "<math>\textstyle \Delta</math> times <math>\textstyle W^{(l)}</math>." We implement one iteration of batch gradient descent as follows:
"<math>\textstyle \Delta W^{(l)}</math>" is a matrix, and in particular it isn't "<math>\textstyle \Delta</math> times <math>\textstyle W^{(l)}</math>." We implement one iteration of batch gradient descent as follows:
:【初译】:
:【初译】:
 +
最终,我们完成了梯度下降算法过程的全部描述。在下面的伪代码中,<math>\textstyle \Delta W^{(l)}</math>是一个与矩阵<math>\textstyle W^{(l)}</math>维度相同的矩阵,<math>\textstyle \Delta b^{(l)}</math>是一个与<math>\textstyle b^{(l)}</math>维度相同的向量。注意这里“<math>\textstyle \Delta W^{(l)}</math>”是一个矩阵,而不是“<math>\textstyle \Delta</math>与<math>\textstyle W^{(l)}</math>相乘”。下面,我们实现批量梯度下降法的一部迭代:
:【一校】:
:【一校】:
<ol>
<li>Set <math>\textstyle \Delta W^{(l)} := 0</math>, <math>\textstyle \Delta b^{(l)} := 0</math> (matrix/vector of zeros) for all <math>\textstyle l</math>.
<li>For <math>\textstyle i = 1</math> to <math>\textstyle m</math>,
<ol type="a">
<li>Use backpropagation to compute <math>\textstyle \nabla_{W^{(l)}} J(W,b;x,y)</math> and <math>\textstyle \nabla_{b^{(l)}} J(W,b;x,y)</math>.
<li>Set <math>\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)</math>.
<li>Set <math>\textstyle \Delta b^{(l)} := \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x,y)</math>.
</ol>
<li>Update the parameters:
:<math>\begin{align}
W^{(l)} &= W^{(l)} - \alpha \left[ \left(\frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}\right] \\
b^{(l)} &= b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]
\end{align}</math>
</ol>
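For concreteness, the following is a minimal Python/NumPy sketch of one such iteration. The function name, the assumed <code>backprop_gradients</code> helper (which returns the per-example gradients computed by backpropagation), and the data layout are illustrative assumptions, not part of the tutorial itself:

<pre>
import numpy as np

def batch_gradient_descent_step(W, b, X, Y, backprop_gradients, alpha, lam):
    """One iteration of batch gradient descent, following the pseudocode above.

    W, b               -- lists of per-layer weight matrices W^(l) and bias vectors b^(l)
    X, Y               -- lists of training inputs x^(i) and targets y^(i)
    backprop_gradients -- assumed helper: returns (grad_W, grad_b) for one example,
                          i.e. the gradients of J(W,b;x,y) w.r.t. each W^(l) and b^(l)
    alpha, lam         -- learning rate alpha and weight-decay parameter lambda
    """
    m = len(X)
    # Step 1: Delta W^(l) := 0, Delta b^(l) := 0 for all l.
    delta_W = [np.zeros_like(Wl) for Wl in W]
    delta_b = [np.zeros_like(bl) for bl in b]

    # Step 2: accumulate the per-example gradients computed by backpropagation.
    for x, y in zip(X, Y):
        grad_W, grad_b = backprop_gradients(W, b, x, y)   # step 2a
        for l in range(len(W)):
            delta_W[l] += grad_W[l]                       # step 2b
            delta_b[l] += grad_b[l]                       # step 2c

    # Step 3: update the parameters; weight decay (lambda) applies to W only.
    for l in range(len(W)):
        W[l] = W[l] - alpha * ((1.0 / m) * delta_W[l] + lam * W[l])
        b[l] = b[l] - alpha * ((1.0 / m) * delta_b[l])
    return W, b
</pre>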
To train our neural network, we can now repeatedly take steps of gradient descent to reduce our cost function <math>\textstyle J(W,b)</math>.
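As a hypothetical usage sketch of the step defined above, repeatedly applying it implements this training loop (the iteration count and the values of alpha and lambda here are arbitrary choices, not taken from the text):

<pre>
# Repeat gradient descent steps until J(W, b) stops decreasing
# (here, a fixed number of iterations for simplicity).
for _ in range(1000):
    W, b = batch_gradient_descent_step(W, b, X, Y, backprop_gradients,
                                       alpha=0.01, lam=1e-4)
</pre>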
{{Sparse_Autoencoder}}