Stacked Autoencoders (栈式自编码算法)

From Ufldl

===Training===
A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. To do this, first train the first layer on the raw input to obtain parameters <math>W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}</math>. Then use the first layer to transform the raw input into a vector consisting of the activations of the hidden units, A. Train the second layer on this vector to obtain parameters <math>W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}</math>. Repeat for subsequent layers, using the output of each layer as the input to the next.

【Initial translation】

To obtain good parameters for the stacked network described above, we can use a greedy layer-wise method. That is, first train the first layer of the network on the input to obtain <math>W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}</math>. Then use the first layer to transform the input into a vector made up of the hidden units' activations; call it A for now. Feed A into the second layer as its input and continue training to obtain <math>W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}</math>. Proceed in the same fashion, always taking the previous layer's output as the next layer's input.

【First review】

A good way to obtain the parameters of a stacked autoencoder network is to train it greedily, one layer at a time. That is, first train the first layer on the raw input to obtain its parameters <math>W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}</math>; then use the first layer to transform the raw input into a vector made up of the hidden units' responses, and call this vector A; next feed A into the second layer as its input and train it to obtain the second layer's parameters <math>W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}</math>; train each subsequent layer in the same way, taking the previous layer's output as the next layer's input.
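
The procedure just described can be summarized in a short NumPy sketch. This is only a minimal illustration under simplifying assumptions, not the tutorial's exercise code: it trains a plain squared-error autoencoder with batch gradient descent and omits the sparsity penalty and weight decay used elsewhere in the tutorial; the names (train_autoencoder, greedy_layerwise_train) are made up for this example.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
    """Train a single autoencoder layer on X (n_samples x n_visible).
    Returns the encoder/decoder parameters (W1, b1, W2, b2)."""
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W1 = rng.normal(0.0, 0.01, (n_visible, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.01, (n_hidden, n_visible))   # decoder weights
    b2 = np.zeros(n_visible)
    m = X.shape[0]
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct.
        A = sigmoid(X @ W1 + b1)                         # hidden activations
        X_hat = sigmoid(A @ W2 + b2)                     # reconstruction
        # Backward pass for the squared-error loss (1/2m)*||X_hat - X||^2.
        d_out = (X_hat - X) * X_hat * (1.0 - X_hat) / m
        d_hid = (d_out @ W2.T) * A * (1.0 - A)
        W2 -= lr * (A.T @ d_out);  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid);  b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def greedy_layerwise_train(X, layer_sizes):
    """Train each layer on the hidden activations produced by the previous one."""
    params, inp = [], X
    for n_hidden in layer_sizes:
        W1, b1, W2, b2 = train_autoencoder(inp, n_hidden)
        params.append((W1, b1, W2, b2))
        inp = sigmoid(inp @ W1 + b1)     # the vector A fed to the next layer
    return params

# Toy usage: two stacked layers trained on random "data".
X = np.random.default_rng(1).random((100, 64))
stack = greedy_layerwise_train(X, layer_sizes=[32, 16])
</pre>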

This method trains the parameters of each layer individually while freezing the parameters of the remainder of the model. After this phase of training is complete, [[Fine-tuning Stacked AEs | fine-tuning]] using backpropagation can be used to improve the results by tuning the parameters of all layers at the same time.

【Initial translation】

With this method, training changes only the current layer's parameters and leaves the other layers' parameters untouched. Therefore, to get a better result, once this training phase is finished it helps to fine-tune with the backpropagation algorithm, letting the parameters of all layers change at the same time; this improves the results.

【First review】

The training scheme above keeps the parameters of all other layers fixed while the parameters of each layer are trained. So, to obtain better results, after this pre-training process is complete, the backpropagation algorithm can be used to adjust the parameters of all layers simultaneously and improve the results; this process is commonly called "fine-tuning".

<!-- In practice, fine-tuning should be used when the parameters have been brought close to convergence through layer-wise training. Attempting to use fine-tuning with randomly initialized weights will lead to poor results due to local optima. -->

{{Quote|
If one is only interested in fine-tuning for the purposes of classification, the common practice is to discard the "decoding" layers of the stacked autoencoder and link the last hidden layer <math>a^{(n)}</math> to the softmax classifier. The gradients from the (softmax) classification error will then be backpropagated into the encoding layers.

【Initial translation】

If you are only interested in fine-tuning a classification network, the usual practice is to throw away the "decoding" layers of the stacked network and connect the last hidden layer <math>a^{(n)}</math> directly to the softmax classifier. The gradient of the classifier's error is then backpropagated straight to the encoding layers.

【First review】

Note: if you are only interested in fine-tuning for the purpose of classification, the usual practice is to discard the "decoding" layers of the stacked autoencoder network and feed the last hidden layer's <math>a^{(n)}</math> into the softmax classifier as features for classification; that way, the gradient of the (softmax) classifier's classification error can be backpropagated directly to the encoding layers.
}}
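
Continuing the sketch above, the fine-tuning step for classification can be illustrated as follows: the decoding layers are discarded, the encoding layers are stacked, the last hidden activation <math>a^{(n)}</math> feeds a softmax classifier, and the softmax classification-error gradient is backpropagated through every encoding layer so that all parameters are updated together. Again, this is a simplified sketch with invented names (fine_tune, encoders), not the tutorial's implementation.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(X, y, encoders, n_classes, lr=0.5, epochs=200, seed=0):
    """Jointly tune all encoding layers plus a softmax output layer.
    encoders: list of (W, b) pairs from greedy pre-training (decoders dropped).
    y: integer class labels for the rows of X."""
    rng = np.random.default_rng(seed)
    n_last = encoders[-1][0].shape[1]
    Ws = rng.normal(0.0, 0.01, (n_last, n_classes))  # softmax weights
    bs = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    m = X.shape[0]
    for _ in range(epochs):
        # Forward pass through every encoding layer, keeping the activations.
        acts = [X]
        for W, b in encoders:
            acts.append(sigmoid(acts[-1] @ W + b))
        P = softmax(acts[-1] @ Ws + bs)               # class probabilities
        # Backward pass: softmax cross-entropy gradient, then back through encoders.
        delta = (P - Y) / m
        gWs, gbs = acts[-1].T @ delta, delta.sum(axis=0)
        delta = (delta @ Ws.T) * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(encoders))):
            W, b = encoders[i]
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                                 # propagate to the layer below
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            encoders[i] = (W - lr * gW, b - lr * gb)  # every layer moves each epoch
        Ws -= lr * gWs
        bs -= lr * gbs
    return encoders, (Ws, bs)

# Toy usage (reusing X and `stack` from the pre-training sketch above):
# encoders = [(W1, b1) for (W1, b1, W2, b2) in stack]
# y = np.random.default_rng(2).integers(0, 10, size=X.shape[0])
# encoders, softmax_layer = fine_tune(X, y, encoders, n_classes=10)
</pre>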

