Independent Component Analysis

== Introduction ==

If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).

Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the rows of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are '''sparse'''; and secondly, our basis is an '''orthonormal''' basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:
:<math>
\begin{array}{rcl}
     {\rm minimize} & \lVert Wx \rVert_1  \\
     {\rm s.t.}     & WW^T = I \\
\end{array}
</math>

As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize for the objective using gradient descent: every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint).

In practice, optimizing for the objective function while enforcing the orthonormality constraint (as described in the [[Independent Component Analysis#Orthonormal ICA | Orthonormal ICA]] section below) is feasible but slow. Hence, the use of orthonormal ICA is limited to situations where it is important to obtain an orthonormal basis ([[TODO]]: what situations).

== Orthonormal ICA ==

The orthonormal ICA objective is:
:<math>
\begin{array}{rcl}
     {\rm minimize} & \lVert Wx \rVert_1  \\
     {\rm s.t.}     & WW^T = I \\
\end{array}
</math>
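
For concreteness, here is a minimal numpy sketch (our own illustration, not from the original notes) of evaluating this objective and measuring how far a candidate <math>W</math> is from satisfying the constraint. We assume <math>W</math> is <math>k \times n</math> with one basis vector per row, and that the whitened examples are stored as the columns of an <math>n \times m</math> matrix <math>X</math>; the function names are ours.

<pre>
import numpy as np

def ica_objective(W, X):
    # L1 sparsity penalty on the features s = W x, summed over all examples
    return np.abs(W @ X).sum()

def constraint_violation(W):
    # Frobenius distance of W W^T from the identity; zero iff W W^T = I
    k = W.shape[0]
    return np.linalg.norm(W @ W.T - np.eye(k))
</pre>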

Observe that the constraint <math>WW^T = I</math> implies two other constraints.

Firstly, since we are learning an orthonormal basis, the number of basis vectors we learn can be no greater than the dimension of the input (the constraint <math>WW^T = I</math> cannot hold when <math>W</math> has more rows than columns). In particular, this means that we cannot learn over-complete bases as we usually do in [[Sparse Coding: Autoencoder Interpretation | sparse coding]].

Secondly, the data must be [[Whitening | ZCA whitened]] with no regularization (that is, with <math>\epsilon</math> set to 0). ([[TODO]] Why must this be so?)

Hence, before we even begin to optimize for the orthonormal ICA objective, we must ensure that our data has been '''whitened''', and that we are learning an '''under-complete''' basis.
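
For reference, a sketch of ZCA whitening with <math>\epsilon = 0</math>, mirroring the procedure in the [[Whitening | whitening]] notes (the function name is ours):

<pre>
import numpy as np

def zca_whiten(X):
    # X is n x m, one example per column
    X = X - X.mean(axis=1, keepdims=True)   # make each input dimension zero-mean
    sigma = X @ X.T / X.shape[1]            # empirical covariance
    S, U = np.linalg.eigh(sigma)            # sigma = U diag(S) U^T
    # epsilon = 0: the eigenvalues are used as-is, with no regularization
    return U @ np.diag(1.0 / np.sqrt(S)) @ U.T @ X
</pre>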

Following that, to optimize for the objective, we can use gradient descent, interspersing gradient descent steps with projection steps to enforce the orthonormality constraint. Hence, the procedure will be as follows:

Repeat until done:
<ol>
<li><math>W \leftarrow W - \alpha \nabla_W \lVert Wx \rVert_1</math>
<li><math>W \leftarrow \operatorname{proj}_U W</math> where <math>U</math> is the space of matrices satisfying <math>WW^T = I</math>
</ol>

In practice, the learning rate <math>\alpha</math> is varied using a line-search algorithm to speed up the descent, and the projection step is achieved by setting <math>W \leftarrow (WW^T)^{-\frac{1}{2}} W</math>, which can actually be seen as ZCA whitening ([[TODO]] explain how it is like ZCA whitening).
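
Putting the pieces together, here is a minimal numpy sketch of the whole procedure (our own illustration, using a fixed step size <math>\alpha</math> rather than the line search mentioned above). It uses the subgradient <math>\nabla_W \lVert WX \rVert_1 = {\rm sign}(WX)X^T</math> and the projection <math>W \leftarrow (WW^T)^{-\frac{1}{2}} W</math> just described:

<pre>
import numpy as np

def project_orthonormal(W):
    # Projection step: W <- (W W^T)^(-1/2) W, computed via an
    # eigendecomposition of the symmetric matrix W W^T
    S, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(S)) @ U.T @ W

def orthonormal_ica(X, k, alpha=0.01, n_iters=500):
    # X: n x m ZCA-whitened data (epsilon = 0); k <= n basis vectors
    n = X.shape[0]
    W = project_orthonormal(np.random.randn(k, n))  # start from a feasible W
    for _ in range(n_iters):
        grad = np.sign(W @ X) @ X.T   # subgradient of ||W X||_1 w.r.t. W
        W = W - alpha * grad          # gradient descent step
        W = project_orthonormal(W)    # enforce W W^T = I
    return W
</pre>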

== Topographic ICA ==

Just like [[Sparse Coding: Autoencoder Interpretation | sparse coding]], independent component analysis can be modified to give a topographic variant by adding a topographic cost term.
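
The notes do not spell out the cost term here, but by analogy with topographic [[Sparse Coding: Autoencoder Interpretation | sparse coding]], one common choice groups neighboring features with a fixed grouping matrix <math>V</math> and replaces the L1 penalty with a grouped penalty <math>\sum \sqrt{V(Wx)^2 + \epsilon}</math>. A hedged sketch, where both <math>V</math> and the smoothing term <math>\epsilon</math> are assumptions carried over from the sparse coding notes:

<pre>
import numpy as np

def topographic_cost(W, X, V, eps=1e-8):
    # V is a g x k grouping matrix; each row selects a neighborhood of
    # features. eps smooths the square root at zero, as in topographic
    # sparse coding. Both are assumptions, not specified in these notes.
    S = W @ X                                # features, one column per example
    return np.sqrt(V @ (S ** 2) + eps).sum()
</pre>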