稀疏编码
From Ufldl
(→一审:@大黄蜂的思索) |
|||
Line 6: | Line 6: | ||
一审:@大黄蜂的思索 | 一审:@大黄蜂的思索 | ||
- | |||
Sparse coding is a class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. The aim of sparse coding is to find a set of basis vectors <math>\mathbf{\phi}_i</math> such that we can represent an input vector <math>\mathbf{x}</math> as a linear combination of these basis vectors: | Sparse coding is a class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. The aim of sparse coding is to find a set of basis vectors <math>\mathbf{\phi}_i</math> such that we can represent an input vector <math>\mathbf{x}</math> as a linear combination of these basis vectors: | ||
Line 38: | Line 37: | ||
\text{minimize}_{a^{(j)}_i,\mathbf{\phi}_{i}} \sum_{j=1}^{m} \left|\left| \mathbf{x}^{(j)} - \sum_{i=1}^k a^{(j)}_i \mathbf{\phi}_{i}\right|\right|^{2} + \lambda \sum_{i=1}^{k}S(a^{(j)}_i) | \text{minimize}_{a^{(j)}_i,\mathbf{\phi}_{i}} \sum_{j=1}^{m} \left|\left| \mathbf{x}^{(j)} - \sum_{i=1}^k a^{(j)}_i \mathbf{\phi}_{i}\right|\right|^{2} + \lambda \sum_{i=1}^{k}S(a^{(j)}_i) | ||
\end{align}</math> | \end{align}</math> | ||
+ | |||
+ | where <math>S(.)</math> is a sparsity cost function which penalizes <math>a_i</math> for being far from zero. We can interpret the first term of the sparse coding objective as a reconstruction term which tries to force the algorithm to provide a good representation of <math>\mathbf{x}</math> and the second term as a sparsity penalty which forces our representation of <math>\mathbf{x}</math> to be sparse. The constant <math>\lambda</math> is a scaling constant to determine the relative importance of these two contributions. | ||
+ | |||
+ | 【初译】此处, <math>S(.)</math> 是一个稀疏代价函数,它的惩罚系数 <math>a_i</math> 不为零。第一项解释为稀疏编码的目标项,它作为重建项,约束算法给 <math>\mathbf{x}</math> 提供一个好的表示;第二项作为稀疏惩罚性,它约束对 <math>\mathbf{x}</math> 的表示是稀疏的;常量 <math>\lambda</math> 是一个尺度常量,决定两项贡献的相对重要性。 | ||
+ | |||
+ | 【一审】此处 <math>S(.)</math> 是一个稀疏代价函数,由它来对远大于零的 <math>a_i</math> 进行“惩罚”。我们可以把稀疏编码目标函式的第一项解释为稀疏编码的重构项,这项公式迫使稀疏编码算法能为输入向量 <math>\mathbf{x}</math> 提供一个高拟合度的线性表达式,而公式第二项即“稀疏惩罚”项,它使 <math>\mathbf{x}</math> 的表达式变得“稀疏”。常量 <math>\lambda</math> 是一个变换量,由它来控制这两项式子的相对重要性。 | ||
+ | |||
+ | Although the most direct measure of sparsity is the "<math>L_0</math>" norm (<math>S(a_i) = \mathbf{1}(|a_i|>0)</math>), it is non-differentiable and difficult to optimize in general. In practice, common choices for the sparsity cost <math>S(.)</math> are the <math>L_1</math> penalty <math>S(a_i)=\left|a_i\right|_1 </math> and the log penalty <math>S(a_i)=\log(1+a_i^2)</math>. | ||
+ | |||
+ | 【初译】尽管最直接的稀疏性测度是 "<math>L_0</math>" 范数(<math>S(a_i) = \mathbf{1}(|a_i|>0)</math>) ,该范数通常不可微且难以优化。在实际中,稀疏代价函数 <math>S(.)</math> 的选择是<math>L_1</math> 征罚 <math>S(a_i)=\left|a_i\right|_1 </math> , <math>S(a_i)=\log(1+a_i^2)</math>. | ||
+ | |||
+ | 【一审】虽然“稀疏性”的最直接测度标准是 "<math>L_0</math>" 范式(<math>S(a_i) = \mathbf{1}(|a_i|>0)</math>), 但这是不可微的,而且通常很难进行优化,而实际中,稀疏代价函数 <math>S(.)</math> 的普遍选择是<math>L_1</math> 范式代价函数 <math>S(a_i)=\left|a_i\right|_1 </math> 及对数代价函数 <math>S(a_i)=\log(1+a_i^2)</math>。 | ||
+ | |||
+ | In addition, it is also possible to make the sparsity penalty arbitrarily small by scaling down <math>a_i</math> and scaling <math>\mathbf{\phi}_i</math> up by some large constant. To prevent this from happening, we will constrain <math>\left|\left|\mathbf{\phi}\right|\right|^2</math> to be less than some constant <math>C</math>. The full sparse coding cost function including our constraint on <math>\mathbf{\phi}</math> is | ||
+ | |||
+ | 【初译】此外,常可通过缩小 <math>a_i</math> ,放大<math>\mathbf{\phi}_i</math> ,通过使用一些大的常量,任意地选择稀疏惩罚项。为了阻止这一可能的发生,增加约束,使 <math>\left|\left|\mathbf{\phi}\right|\right|^2</math> 小于一些常量 <math>C</math>。包括 <math>\mathbf{\phi}</math> 约束完整的稀疏编码代价函数定义如下: | ||
+ | |||
+ | 【一审】此外,稀疏代价函数也可以通过减小 <math>a_i</math> 或增加<math>\mathbf{\phi}_i</math> 至很大的常量来使其自身变得极其地小。为防止此类事件发生,我们将 <math>\left|\left|\mathbf{\phi}\right|\right|^2</math> 限制为比某常量 <math>C</math>更小的数。包含了限制条件的稀疏编码代价函数的完整形式如下: | ||
+ | |||
+ | :<math>\begin{array}{rc} | ||
+ | \text{minimize}_{a^{(j)}_i,\mathbf{\phi}_{i}} & \sum_{j=1}^{m} \left|\left| \mathbf{x}^{(j)} - \sum_{i=1}^k a^{(j)}_i \mathbf{\phi}_{i}\right|\right|^{2} + \lambda \sum_{i=1}^{k}S(a^{(j)}_i) | ||
+ | \\ | ||
+ | \text{subject to} & \left|\left|\mathbf{\phi}_i\right|\right|^2 \leq C, \forall i = 1,...,k | ||
+ | \\ | ||
+ | \end{array}</math> | ||
+ | |||
+ | == Probabilistic Interpretation [Based on Olshausen and Field 1996] == | ||
+ | So far, we have considered sparse coding in the context of finding a sparse, over-complete set of basis vectors to span our input space. Alternatively, we may also approach sparse coding from a probabilistic perspective as a generative model. | ||
+ | |||
+ | 【初译】概率解释 | ||
+ | 到目前为止,在寻找一个稀疏的上下文中,考虑了稀疏编码,输入空间上的完备基向量。相应地,我们也可以从概率角度来处理稀疏编码,将它看一个生成模型。把自然图像的建模问题看作是 k个维独立的原特征 和附加噪声 的一个线性叠加 | ||
+ | |||
+ | 【一审】概率解释[该理论基于1996年Olshausen 与 Field的模型] | ||
+ | 到目前为止,我们已通过确定稀疏系数及超完备基向量的方法论述了稀疏编码算法。不过,我们或许可从概率的角度为稀疏编码算法找到一种“生成模型”。 | ||
+ | |||
+ | Consider the problem of modelling natural images as the linear superposition of <math>k</math> independent source features <math>\mathbf{\phi}_i</math> with some additive noise <math>\nu</math>: | ||
+ | |||
+ | 【初译】把自然图像的建模问题看作是 <math>k</math> 个维独立的原特征 <math>\mathbf{\phi}_i</math> 和附加噪声 <math>\nu</math>的一个线性叠加 | ||
+ | |||
+ | 【一审】举个自然图像建模的例子,这种建模方式通过 <math>k</math> 个独立的特征向量 <math>\mathbf{\phi}_i</math> 进行线性叠加,这里的 具有一些噪音 <math>\nu</math>,如下式: |