Sparse Coding
From Ufldl
== Probabilistic Interpretation [Based on Olshausen and Field 1996] ==
So far, we have considered sparse coding in the context of finding a sparse, over-complete set of basis vectors to span our input space. Alternatively, we may also approach sparse coding from a probabilistic perspective as a generative model.

Consider the problem of modelling natural images as the linear superposition of <math>k</math> independent source features <math>\mathbf{\phi}_i</math> with some additive noise <math>\nu</math>:
:<math>\begin{align}
\mathbf{x} = \sum_{i=1}^k a_i \mathbf{\phi}_{i} + \nu(\mathbf{x})
\end{align}</math>
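The superposition above can be sketched numerically. This is a minimal NumPy illustration, not part of the original text: the patch size, the number of features, the Laplace draw for the sparse coefficients <math>a_i</math>, and the noise scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the text):
# n-pixel image patches, k over-complete basis features.
n, k = 64, 128
phi = rng.standard_normal((n, k))    # basis feature vectors phi_i (columns)
a = rng.laplace(scale=0.1, size=k)   # sparse coefficients a_i (Laplace draw, an assumption)
sigma = 0.01
nu = rng.normal(scale=sigma, size=n) # additive Gaussian white noise

# x = sum_i a_i * phi_i + nu  -- the linear superposition model
x = phi @ a + nu
```

Each generated patch `x` is just a coefficient-weighted sum of the columns of `phi` plus noise, which is the generative reading of the sparse coding model.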
+ | |||
+ | Our goal is to find a set of basis feature vectors <math>\mathbf{\phi}</math> such that the distribution of images <math>P(\mathbf{x}\mid\mathbf{\phi})</math> is as close as possible to the empirical distribution of our input data <math>P^*(\mathbf{x})</math>. One method of doing so is to minimize the KL divergence between <math>P^*(\mathbf{x})</math> and <math>P(\mathbf{x}\mid\mathbf{\phi})</math> where the KL divergence is defined as: | ||
+ | |||
+ | 【初译】我们的目标是寻找一组基特征向量 <math>\mathbf{\phi}</math> ,以致于图像的分布 <math>P(\mathbf{x}\mid\mathbf{\phi})</math> 尽可能的近似输入数据 <math>P^*(\mathbf{x})</math>的经验分布。一类方法是最小化KL <math>P^*(\mathbf{x})</math> 和 <math>P^*(\mathbf{x})</math> 之间的散度,这里 KL 散度定义如下: | ||
+ | |||
+ | 【一审】我们的目标是找到一组特征向量 <math>\mathbf{\phi}</math> ,因此,图像的分布函数 <math>P(\mathbf{x}\mid\mathbf{\phi})</math> 就可以尽可能地近似于输入数据的经验分布函数 <math>P^*(\mathbf{x})</math>。这么做的一种方法是,最小化 <math>P^*(\mathbf{x})</math> 与 <math>P^*(\mathbf{x})</math> 之间的KL离差,KL离差表示如下: | ||
+ | |||
:<math>\begin{align}
D(P^*(\mathbf{x})||P(\mathbf{x}\mid\mathbf{\phi})) = \int P^*(\mathbf{x}) \log \left(\frac{P^*(\mathbf{x})}{P(\mathbf{x}\mid\mathbf{\phi})}\right)d\mathbf{x}
\end{align}</math>
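For intuition about the KL divergence as a measure of closeness between distributions, here is a small discrete-case sketch (the example distributions are made up for illustration; the article's integral is the continuous analogue of this sum):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# An "empirical" distribution P*(x) and two candidate model distributions.
p_star  = np.array([0.5, 0.3, 0.2])
q_close = np.array([0.48, 0.32, 0.20])
q_far   = np.array([0.2, 0.3, 0.5])
```

KL divergence is zero exactly when the two distributions match and grows as they diverge, which is why driving it down pulls the model distribution toward the empirical one.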
+ | |||
+ | Since the empirical distribution <math>P^*(\mathbf{x})</math> is constant across our choice of <math>\mathbf{\phi}</math>, this is equivalent to maximizing the log-likelihood of <math>P(\mathbf{x}\mid\mathbf{\phi})</math>. | ||
+ | Assuming <math>\nu</math> is Gaussian white noise with variance <math>\sigma^2</math>, we have that | ||
+ | |||
+ | 【初译】因为通过我们对 <math>\mathbf{\phi}</math>的选择,经验分布 <math>P^*(\mathbf{x})</math> 是不变量,这相当于最大化对数似然 <math>P(\mathbf{x}\mid\mathbf{\phi})</math>。 | ||
+ | 假设 <math>\nu</math> 是方差为 <math>\sigma^2</math>高斯白噪声,有下式成立。 | ||
- | + | 【一审】因为经验分布函数 <math>P^*(\mathbf{x})</math> 对于所有的 <math>\mathbf{\phi}</math>其结果是常量,这就等于说要最大化对数似然函数 <math>P(\mathbf{x}\mid\mathbf{\phi})</math>。 | |
+ | 假设 <math>\nu</math> 是具有方差 <math>\sigma^2</math>的高斯白噪音,则有下式: |