Backpropagation Algorithm
From Ufldl
Line 20: | Line 20: | ||
</math> | </math> | ||
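- | + | The first term in the definition of <math>\textstyle J(W,b)</math> above is an average sum-of-squares error term. The second term is a regularization term (also called a '''weight decay term''') that tends to decrease the magnitude of the weights and helps prevent overfitting. | |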
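The two terms described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function name `cost`, its argument layout, and the sample numbers are all assumptions. It also assumes the usual form of the definition (a per-example error of one half the squared norm, averaged over the examples, plus one half of lambda times the sum of squared weights), which is not reproduced in this excerpt.

```python
def cost(predictions, targets, weights, lam):
    """Return (squared-error term, weight-decay term) of J(W,b).

    predictions, targets: lists of output vectors, one per training example.
    weights: list of weight matrices (each a list of rows); bias terms are
             left out on the assumption that weight decay skips them.
    lam: weight-decay coefficient (lambda), a hypothetical parameter name.
    """
    m = len(predictions)
    # Average over examples of (1/2) * ||h(x) - y||^2.
    squared_error = sum(
        0.5 * sum((h - y) ** 2 for h, y in zip(h_vec, y_vec))
        for h_vec, y_vec in zip(predictions, targets)
    ) / m
    # (lambda / 2) * sum of all squared weights.
    weight_decay = 0.5 * lam * sum(
        w ** 2 for matrix in weights for row in matrix for w in row
    )
    return squared_error, weight_decay


# Made-up data: two examples, one 2x2 weight matrix.
preds = [[0.5, 0.5], [1.0, 0.0]]
targs = [[1.0, 0.0], [1.0, 0.0]]
W = [[[1.0, -2.0], [0.0, 3.0]]]
err, decay = cost(preds, targs, W, lam=0.1)
```

Returning the two terms separately makes it easy to see how a larger `lam` shifts the balance toward small weights; their sum is the overall cost.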
Line 69: | Line 69: | ||
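<li>For each output unit <math>\textstyle i</math> in layer <math>\textstyle n_l</math> (the output layer), compute the residual using the following formula: | <li>For each output unit <math>\textstyle i</math> in layer <math>\textstyle n_l</math> (the output layer), compute the residual using the following formula: | ||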
+ | |||
+ | :<math> | ||
+ | \begin{align} | ||
+ | \delta^{(n_l)}_i | ||
+ | = \frac{\partial}{\partial z^{(n_l)}_i} \;\; | ||
+ | \frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 = - (y_i - a^{(n_l)}_i) \cdot f'(z^{(n_l)}_i) | ||
+ | \end{align} | ||
+ | </math> | ||
+ | |||
+ | [Translator's note: | ||
:<math> | :<math> | ||
- | \delta^{(n_l)}_i = \frac{\partial}{\partial z^{n_l}_i}J(W,b;x,y) | + | \begin{align} |
- | = \frac{\partial}{\partial z^{n_l}_i}\frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 | + | \delta^{(n_l)}_i &= \frac{\partial}{\partial z^{(n_l)}_i}J(W,b;x,y) |
+ | = \frac{\partial}{\partial z^{(n_l)}_i}\frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 \\ | ||
+ | &= \frac{\partial}{\partial z^{(n_l)}_i}\frac{1}{2} \sum_{j=1}^{s_{n_l}} (y_j-a_j^{(n_l)})^2 | ||
+ | = \frac{\partial}{\partial z^{(n_l)}_i}\frac{1}{2} \sum_{j=1}^{s_{n_l}} (y_j-f(z_j^{(n_l)}))^2 \\ | ||
+ | &= - (y_i - f(z_i^{(n_l)})) \cdot f'(z^{(n_l)}_i) | ||
= - (y_i - a^{(n_l)}_i) \cdot f'(z^{(n_l)}_i) | = - (y_i - a^{(n_l)}_i) \cdot f'(z^{(n_l)}_i) | ||
+ | \end{align} | ||
</math> | </math> | ||
- | + | ] | |
<li>For each layer <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>, the residual of node <math>\textstyle i</math> in layer <math>\textstyle l</math> is computed as follows: | <li>For each layer <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>, the residual of node <math>\textstyle i</math> in layer <math>\textstyle l</math> is computed as follows: | ||
Line 81: | Line 96: | ||
</math> | </math> | ||
- | + | [Translator's note: | |
- | + | ||
- | + | ||
:<math> | :<math> | ||
- | \delta^{( | + | \begin{align} |
+ | \delta^{(n_l-1)}_i &=\frac{\partial}{\partial z^{(n_l-1)}_i}J(W,b;x,y) | ||
+ | = \frac{\partial}{\partial z^{(n_l-1)}_i}\frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 | ||
+ | = \frac{\partial}{\partial z^{(n_l-1)}_i}\frac{1}{2} \sum_{j=1}^{s_{n_l}}(y_j-a_j^{(n_l)})^2 \\ | ||
+ | &= \frac{1}{2} \sum_{j=1}^{s_{n_l}}\frac{\partial}{\partial z^{(n_l-1)}_i}(y_j-a_j^{(n_l)})^2 | ||
+ | = \frac{1}{2} \sum_{j=1}^{s_{n_l}}\frac{\partial}{\partial z^{(n_l-1)}_i}(y_j-f(z_j^{(n_l)}))^2 \\ | ||
+ | &= \sum_{j=1}^{s_{n_l}}-(y_j-f(z_j^{(n_l)})) \cdot \frac{\partial}{\partial z_i^{(n_l-1)}}f(z_j^{(n_l)}) | ||
+ | = \sum_{j=1}^{s_{n_l}}-(y_j-f(z_j^{(n_l)})) \cdot f'(z_j^{(n_l)}) \cdot \frac{\partial z_j^{(n_l)}}{\partial z_i^{(n_l-1)}} \\ | ||
+ | &= \sum_{j=1}^{s_{n_l}} \delta_j^{(n_l)} \cdot \frac{\partial z_j^{(n_l)}}{\partial z_i^{(n_l-1)}} | ||
+ | = \sum_{j=1}^{s_{n_l}} \left(\delta_j^{(n_l)} \cdot \frac{\partial}{\partial z_i^{(n_l-1)}}\sum_{k=1}^{s_{n_l-1}}f(z_k^{(n_l-1)}) \cdot W_{jk}^{(n_l-1)}\right) \\ | ||
+ | &= \sum_{j=1}^{s_{n_l}} \delta_j^{(n_l)} \cdot W_{ji}^{(n_l-1)} \cdot f'(z_i^{(n_l-1)}) | ||
+ | = \left(\sum_{j=1}^{s_{n_l}}W_{ji}^{(n_l-1)}\delta_j^{(n_l)}\right)f'(z_i^{(n_l-1)}) | ||
+ | \end{align} | ||
</math> | </math> | ||
- | + | Replacing the pair <math>\textstyle n_l-1</math> and <math>\textstyle n_l</math> in the derivation above with <math>\textstyle l</math> and <math>\textstyle l+1</math> gives: | |
- | + | : <math> | |
- | = \left( \sum_{j=1}^{s_{l | + | \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) f'(z^{(l)}_i) |
</math> | </math> | ||
+ | This process of taking derivatives layer by layer, from the back to the front, is the essence of "backpropagation". | ||
- | + | ] | |
- | |||
- | |||
- | |||
- | |||
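The residual formulas for the output layer and the hidden layers can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions, not the tutorial's implementation: it assumes a sigmoid activation <math>\textstyle f</math>, a single training example, and made-up weights, and the helper names `forward`, `residuals`, and `loss` are hypothetical. It compares one analytic partial derivative, computed as the product of an input activation and a hidden-layer residual (a standard identity that is not stated in this excerpt), against a central finite difference.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b):
    """Return pre-activations z and activations a per layer (a[0] is the input)."""
    a, zs = [x], []
    for W_l, b_l in zip(W, b):
        z = [sum(w * v for w, v in zip(row, a[-1])) + bias
             for row, bias in zip(W_l, b_l)]
        zs.append(z)
        a.append([sigmoid(v) for v in z])
    return zs, a


def residuals(x, y, W, b):
    """Residuals (delta) for every non-input layer, output layer last."""
    _, a = forward(x, W, b)
    # For the sigmoid, f'(z) = f(z) * (1 - f(z)); fprime[k] belongs to a[k + 1].
    fprime = [[v * (1.0 - v) for v in layer] for layer in a[1:]]
    # Output layer: delta_i = -(y_i - a_i) * f'(z_i).
    deltas = [[-(yi - ai) * fp for yi, ai, fp in zip(y, a[-1], fprime[-1])]]
    # Hidden layers, back to front:
    # delta_i = (sum_j W_ji * delta_j of the next layer) * f'(z_i).
    for l in range(len(W) - 1, 0, -1):
        nxt = deltas[0]
        delta = [sum(W[l][j][i] * nxt[j] for j in range(len(nxt))) * fprime[l - 1][i]
                 for i in range(len(fprime[l - 1]))]
        deltas.insert(0, delta)
    return deltas, a


def loss(x, y, W, b):
    """Single-example cost: (1/2) * ||y - h(x)||^2."""
    _, a = forward(x, W, b)
    return 0.5 * sum((yi - ai) ** 2 for yi, ai in zip(y, a[-1]))


# Made-up network: 2 inputs, 2 hidden units, 1 output.
x, y = [0.5, -1.0], [1.0]
W = [[[0.1, -0.2], [0.4, 0.3]], [[0.7, -0.5]]]
b = [[0.0, 0.1], [-0.3]]

deltas, a = residuals(x, y, W, b)
# Analytic derivative w.r.t. W[0][0][1]: input activation 1 times hidden residual 0.
analytic = a[0][1] * deltas[0][0]

# Central finite difference on the same weight.
eps = 1e-6
W[0][0][1] += eps
up = loss(x, y, W, b)
W[0][0][1] -= 2 * eps
down = loss(x, y, W, b)
W[0][0][1] += eps  # restore the original weight
numeric = (up - down) / (2 * eps)
```

If the residuals are implemented correctly, `analytic` and `numeric` should agree to several decimal places. A mismatch usually points to an index mix-up between <math>\textstyle W_{ji}</math> and <math>\textstyle W_{ij}</math>.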
Line 198: | Line 220: | ||
王方(fangkey@gmail.com),林锋(xlfg@yeah.net),许利杰(csxulijie@gmail.com) | 王方(fangkey@gmail.com),林锋(xlfg@yeah.net),许利杰(csxulijie@gmail.com) | ||
+ | |||
+ | |||
+ | {{稀疏自编码器}} | ||
{{Languages|Backpropagation_Algorithm|English}} | {{Languages|Backpropagation_Algorithm|English}} |