Sparse Coding

<math>\begin{align}
\mathbf{\phi}^{*},\mathbf{a}^{*}=\text{argmin}_{\mathbf{\phi},\mathbf{a}} \sum_{j=1}^{m} \left|\left| \mathbf{x}^{(j)} - \sum_{i=1}^k a^{(j)}_i \mathbf{\phi}_{i}\right|\right|^{2} + \lambda \sum_{i=1}^{k}S(a^{(j)}_i)
\end{align}</math>
Using a probabilistic approach, it can also be seen that choosing the <math>L_1</math> penalty <math>\left|a_i\right|_1</math> or the log penalty <math>\log(1+a_i^2)</math> for <math>S(.)</math> corresponds to using the Laplacian prior <math>P(a_i) \propto \exp\left(-\beta|a_i|\right)</math> or the Cauchy prior <math>P(a_i) \propto \frac{\beta}{1+a_i^2}</math> respectively.
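
To see this correspondence explicitly (a standard MAP argument, filled in here for completeness): taking the negative log of each prior recovers the corresponding penalty, up to additive constants that do not change the minimizer.

<math>\begin{align}
P(a_i) \propto \exp\left(-\beta|a_i|\right) \quad &\Rightarrow \quad -\log P(a_i) = \beta\left|a_i\right| + \text{const} \\
P(a_i) \propto \frac{\beta}{1+a_i^2} \quad &\Rightarrow \quad -\log P(a_i) = \log(1+a_i^2) + \text{const}
\end{align}</math>

Thus maximizing the posterior over the <math>a_i</math> (with a Gaussian noise model supplying the squared reconstruction term) is equivalent to minimizing the objective above with <math>S(a_i)=\left|a_i\right|</math> or <math>S(a_i)=\log(1+a_i^2)</math> respectively.
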
== Learning ==
Learning a set of basis vectors <math>\mathbf{\phi}</math> using sparse coding consists of performing two separate optimizations, the first being an optimization over coefficients <math>a_i</math> for each training example <math>\mathbf{x}</math> and the second an optimization over basis vectors <math>\mathbf{\phi}</math> across many training examples at once.
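
As a control-flow illustration of this alternating scheme, here is a minimal NumPy sketch. The helper functions <code>coef_step</code> and <code>basis_step</code> are hypothetical names standing in for the two sub-problems treated below, not routines from the tutorial; together with the two sketches that follow, this forms a complete toy implementation.

<source lang="python">
import numpy as np

def sparse_coding(X, k, n_iters=50):
    """Alternate the two sub-optimizations. X is (n, m): one training example per column."""
    n, m = X.shape
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((n, k))
    phi /= np.linalg.norm(phi, axis=0)      # start from unit-norm basis vectors
    A = np.zeros((k, m))
    for _ in range(n_iters):
        A = coef_step(phi, X)               # optimize coefficients a_i with the basis fixed
        phi = basis_step(A, X)              # optimize basis vectors phi with coefficients fixed
    return phi, A
</source>
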
Assuming an <math>L_1</math> sparsity penalty, learning <math>a^{(j)}_i</math> reduces to solving an <math>L_1</math>-regularized least squares problem, which is convex in <math>a^{(j)}_i</math> and for which several techniques have been developed (convex optimization software such as CVX can also be used to perform <math>L_1</math>-regularized least squares). Assuming a differentiable <math>S(.)</math> such as the log penalty, gradient-based methods such as conjugate gradient can also be used.
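
One concrete instance of the "several techniques" mentioned above is iterative shrinkage-thresholding (ISTA), a proximal-gradient method; the sketch below assumes the <math>L_1</math> penalty, and the regularization weight <code>lam</code> and iteration count are illustrative defaults, not values from the text.

<source lang="python">
import numpy as np

def coef_step(phi, X, lam=0.1, n_iters=200):
    """min_A ||X - phi @ A||_F^2 + lam * sum(|A|)  via ISTA (proximal gradient)."""
    A = np.zeros((phi.shape[1], X.shape[1]))
    step = 1.0 / (2.0 * np.linalg.norm(phi, 2) ** 2)   # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iters):
        grad = 2.0 * phi.T @ (phi @ A - X)             # gradient of the squared-error term
        A = A - step * grad
        A = np.sign(A) * np.maximum(np.abs(A) - lam * step, 0.0)  # soft-threshold: prox of the L1 term
    return A
</source>
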
Learning a set of basis vectors with an <math>L_2</math> norm constraint also reduces to a least squares problem with quadratic constraints, which is convex in <math>\mathbf{\phi}</math>. Standard convex optimization software (e.g. CVX) or other iterative methods can be used to solve for <math>\mathbf{\phi}</math>, although significantly more efficient methods, such as solving the Lagrange dual, have also been developed.
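
As a simple stand-in for the more efficient Lagrange-dual solver named above, one can solve the unconstrained least squares problem in closed form and then project the basis vectors back onto the constraint set. This is a feasibility heuristic rather than the exact constrained minimizer, and the unit-norm bound <math>\left|\left|\mathbf{\phi}_i\right|\right|_2 \le 1</math> is an assumed choice of constraint.

<source lang="python">
import numpy as np

def basis_step(A, X):
    """min_phi ||X - phi @ A||_F^2  s.t. ||phi_i||_2 <= 1  (project-after-solve heuristic)."""
    # Closed-form unconstrained solution: phi = X A^T (A A^T)^{-1}
    phi = X @ A.T @ np.linalg.pinv(A @ A.T)
    # Project each column back onto the unit L2 ball; feasible columns are left untouched
    norms = np.linalg.norm(phi, axis=0)
    phi /= np.maximum(norms, 1.0)
    return phi
</source>
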
As described above, a significant limitation of sparse coding is that even after a set of basis vectors has been learnt, an optimization must still be performed to obtain the coefficients needed to "encode" a new data example. This significant "runtime" cost means that sparse coding is computationally expensive even at test time, especially compared to typical feedforward architectures.
