Fine-tuning Stacked Autoencoders (微调多层自编码算法)

【Original】:
Fortunately, we already have all the tools necessary to implement fine tuning for stacked autoencoders! In order to compute the gradients for all the layers of the stacked autoencoder in each iteration, we use the [[Backpropagation Algorithm]], as discussed in the sparse autoencoder section. As the backpropagation algorithm can be extended to apply for an arbitrary number of layers, we can actually use this algorithm on a stacked autoencoder of arbitrary depth.

【Initial translation】: General strategy
Fortunately, all the tools needed to implement fine-tuning of a stacked autoencoder are already at hand. To compute the gradients for every layer in each iteration, we use the backpropagation algorithm discussed in the sparse autoencoder section. Because backpropagation extends to an arbitrary number of layers, it can in fact be applied to a stacked autoencoder of arbitrary depth.

【First review】:
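To make the rest of this section easier to follow, here is a minimal, purely illustrative NumPy setup: the stacked autoencoder is held as a list of (W, b) pairs (assumed to come from layer-wise pretraining) plus the softmax parameters, and fine-tuning simply means running gradient descent on all of these parameters with gradients obtained from backpropagation. All names and sizes below are assumptions made for the sketch, not part of the original text.

<pre>
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [64, 25, 10]   # input layer and two stacked hidden layers (assumed sizes)
num_classes = 5

# One (W, b) pair per autoencoder layer; in practice these would be the
# weights obtained from greedy layer-wise pretraining, not random values.
stack = [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
         for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Softmax classifier sitting on top of the last hidden layer's features.
softmax_theta = 0.1 * rng.standard_normal((num_classes, layer_sizes[-1]))

# Fine-tuning = ordinary gradient descent over *all* of these parameters,
# using the backpropagation steps summarized below to obtain the gradients.
</pre>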
【Original】:
For your convenience, the summary of the backpropagation algorithm using element wise notation is below:

【Initial translation】: Fine-tuning with backpropagation
For convenience, the use of the backpropagation algorithm is briefly summarized below:

【First review】:
【Original】:
: 1. Perform a feedforward pass, computing the activations for layers <math>\textstyle L_2</math>, <math>\textstyle L_3</math>, up to the output layer <math>\textstyle L_{n_l}</math>, using the equations defining the forward propagation steps.

【Initial translation】:
: 1. Perform a feedforward pass, computing the activations of layers <math>\textstyle L_2</math>, <math>\textstyle L_3</math>, and so on up to the output layer <math>\textstyle L_{n_l}</math>, using the equations defined in the forward propagation steps.

【First review】:
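A minimal NumPy sketch of this feedforward pass, assuming the stack of pretrained layers is stored as a list of (W, b) pairs and that the activation f is the sigmoid (both are assumptions, not stated in this excerpt):

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(stack, x):
    """Step 1: compute z^(l) and a^(l) for l = 2 .. n_l.

    `stack` is a list of (W, b) pairs; `x` is one input vector a^(1).
    Returns the lists of pre-activations z and activations a per layer."""
    a = [x]          # a[0] is a^(1), the input layer
    z = [None]       # placeholder so indices line up with a
    for W, b in stack:
        z.append(W @ a[-1] + b)      # z^(l+1) = W^(l) a^(l) + b^(l)
        a.append(sigmoid(z[-1]))     # a^(l+1) = f(z^(l+1))
    return z, a

# Tiny usage example with made-up sizes.
rng = np.random.default_rng(0)
stack = [(rng.standard_normal((5, 8)), np.zeros(5)),
         (rng.standard_normal((3, 5)), np.zeros(3))]
z, a = feedforward(stack, rng.standard_normal(8))
print([v.shape for v in a])   # [(8,), (5,), (3,)]
</pre>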
【Original】:
: 2. For the output layer (layer <math>\textstyle n_l</math>), set
::<math>\begin{align}
\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})
\end{align}</math>
::(When using softmax regression, the softmax layer has <math>\nabla J = \theta^T(I-P)</math> where <math>I</math> is the input labels and <math>P</math> is the vector of conditional probabilities.)

【Initial translation】:
: 2. For the output layer (layer <math>\textstyle n_l</math>), set
::<math>\begin{align}
\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})
\end{align}</math>
::(When softmax regression is used, the softmax layer satisfies <math>\nabla J = \theta^T(I-P)</math>, where <math>I</math> is the input label and <math>P</math> is the vector of conditional probabilities.)

【First review】:
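To make Step 2 concrete, here is a small NumPy sketch of the output-layer error term when a softmax classifier sits on top, following the formula above with a sigmoid f, so that f'(z) = a(1 − a). The variable names (`theta`, `z_top`, `a_top`, `label`) are illustrative assumptions:

<pre>
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
num_classes, top_dim = 4, 3
theta = rng.standard_normal((num_classes, top_dim))   # softmax parameters
z_top = rng.standard_normal(top_dim)                   # z^(n_l)
a_top = 1.0 / (1.0 + np.exp(-z_top))                   # a^(n_l) = f(z^(n_l))
label = 2                                              # ground-truth class

I = np.eye(num_classes)[label]   # indicator (one-hot) vector of the label
P = softmax(theta @ a_top)       # vector of conditional probabilities

# delta^(n_l) = -(nabla_{a^{n_l}} J) . f'(z^(n_l)),  with  nabla J = theta^T (I - P)
delta_top = -(theta.T @ (I - P)) * a_top * (1.0 - a_top)
print(delta_top.shape)           # (3,)
</pre>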
【Original】:
: 3. For <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>, set
::<math>\begin{align}
\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})
\end{align}</math>

【Initial translation】:
: 3. For <math>\textstyle l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>, set
::<math>\begin{align}
\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})
\end{align}</math>

【First review】:
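A sketch of Step 3 in the same hypothetical NumPy setting: walking backwards through the stack and applying the recursion above, again assuming sigmoid activations so that f'(z^(l)) = a^(l)(1 − a^(l)). The list layout (index l − 1 holding layer l) is an assumption made for the sketch; the activations and top-layer delta would come from the Step 1 and Step 2 sketches.

<pre>
import numpy as np

def backward_deltas(stack, a, delta_top):
    """Step 3: given activations a (with a[l-1] = a^(l)) and delta^(n_l),
    return deltas for layers n_l, n_l-1, ..., 2 as a dict keyed by layer index."""
    n_l = len(a)                      # number of layers
    delta = {n_l: delta_top}
    for l in range(n_l - 1, 1, -1):   # l = n_l-1, n_l-2, ..., 2
        W, _ = stack[l - 1]           # W^(l) maps layer l to layer l+1
        fprime = a[l - 1] * (1.0 - a[l - 1])       # f'(z^(l)) for sigmoid
        delta[l] = (W.T @ delta[l + 1]) * fprime   # delta^(l)
    return delta
</pre>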
【Original】:
: 4. Compute the desired partial derivatives:
::<math>\begin{align}
\nabla_{W^{(l)}} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)}) \right] \\
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \sum_{i=1}^m \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})
\end{align}</math>

【Initial translation】:
: 4. Compute the desired partial derivatives:
::<math>\begin{align}
\nabla_{W^{(l)}} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)}) \right] \\
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \sum_{i=1}^m \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})
\end{align}</math>

【First review】:
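Step 4 in the same hypothetical NumPy setting: the per-example gradients are δ^(l+1)(a^(l))^T for W^(l) and δ^(l+1) for b^(l), and the desired derivatives are their averages over the m training examples. A sketch (all names are assumptions, reusing the layout of the earlier snippets):

<pre>
import numpy as np

def average_gradients(examples):
    """Step 4: average the per-example gradients over the m training examples.

    `examples` is a list of (a, delta) pairs as produced by the Step 1-3
    sketches: a[l-1] = a^(l), delta[l] = delta^(l).  For each weight layer l,
    nabla_{W^(l)} J = delta^(l+1) (a^(l))^T and nabla_{b^(l)} J = delta^(l+1)."""
    m = len(examples)
    n_l = len(examples[0][0])                 # number of layers
    grad_W = [0.0] * (n_l - 1)
    grad_b = [0.0] * (n_l - 1)
    for a, delta in examples:
        for l in range(1, n_l):               # weight layers l = 1 .. n_l-1
            grad_W[l - 1] = grad_W[l - 1] + np.outer(delta[l + 1], a[l - 1]) / m
            grad_b[l - 1] = grad_b[l - 1] + delta[l + 1] / m
    return grad_W, grad_b
</pre>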
【Original】:
{{Quote|
Note: While one could consider the softmax classifier as an additional layer, the derivation above does not. Specifically, we consider the "last layer" of the network to be the features that goes into the softmax classifier. Therefore, the derivatives (in Step 2) are computed using <math>\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})</math>, where <math>\nabla J = \theta^T(I-P)</math>.
}}

【Initial translation】:
{{Quote|
Note: The softmax classifier could be regarded as an additional layer, but the derivation above does not treat it that way. Specifically, the features of the network's "last layer" are what is fed into the softmax classifier. Therefore, the derivatives in Step 2 are computed from <math>\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})</math>, where <math>\nabla J = \theta^T(I-P)</math>.
}}

【First review】:
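Putting the pieces together, here is a self-contained, purely illustrative NumPy sketch of one fine-tuning gradient-descent step on a toy stacked autoencoder. As in the note above, the softmax classifier is not treated as an extra layer; it only consumes the last layer's features. All sizes, the learning rate `alpha`, and the variable names are assumptions made for the sketch.

<pre>
import numpy as np

rng = np.random.default_rng(0)
sizes, num_classes, alpha, m = [8, 6, 4], 3, 0.1, 5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pretend these came from layer-wise pretraining (here: random initialization).
stack = [(0.1 * rng.standard_normal((o, i)), np.zeros(o))
         for i, o in zip(sizes[:-1], sizes[1:])]
theta = 0.1 * rng.standard_normal((num_classes, sizes[-1]))   # softmax parameters

X = rng.standard_normal((m, sizes[0]))     # toy inputs
y = rng.integers(num_classes, size=m)      # toy labels

grad_W = [np.zeros_like(W) for W, _ in stack]
grad_b = [np.zeros_like(b) for _, b in stack]
for x, label in zip(X, y):
    # Step 1: feedforward pass.
    a = [x]
    for W, b in stack:
        a.append(sigmoid(W @ a[-1] + b))
    # Step 2: output-layer delta (softmax on top of the last layer's features).
    scores = theta @ a[-1]
    P = np.exp(scores - scores.max()); P /= P.sum()
    I = np.eye(num_classes)[label]
    delta = -(theta.T @ (I - P)) * a[-1] * (1 - a[-1])
    # Steps 3-4: backpropagate and accumulate the averaged gradients.
    for l in range(len(stack) - 1, -1, -1):
        W, b = stack[l]
        grad_W[l] += np.outer(delta, a[l]) / m
        grad_b[l] += delta / m
        if l > 0:
            delta = (W.T @ delta) * a[l] * (1 - a[l])

# One gradient-descent step over the whole stack (the fine-tuning update).
stack = [(W - alpha * gW, b - alpha * gb)
         for (W, b), gW, gb in zip(stack, grad_W, grad_b)]
</pre>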
