Independent Component Analysis - Revision history
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&action=history

Revision history for this page on the Stanford deep learning (UFLDL) wiki, retrieved 2024-03-28 (MediaWiki 1.16.2). Revisions are listed newest first; article line numbers refer to the wikitext at the time of each edit.

Revision 2323 (Kandeng, 04:35:10, 8 April 2013)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=2323&oldid=prev
At article line 53, after the Topographic ICA paragraph, appended the language-bar template linking to the Chinese translation of the page:

{{Languages|独立成分分析|中文}}
Revision 1015 (Jngiam, 08:22:54, 19 June 2011; section edit: Introduction)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=1015&oldid=prev
At article line 20, trimmed the Introduction paragraph on enforcing the orthonormality constraint, which now ends:

In practice, optimizing for the objective function while enforcing the orthonormality constraint (as described in the [[Independent Component Analysis#Orthonormal ICA | Orthonormal ICA]] section below) is feasible but slow. Hence, the use of orthonormal ICA is limited to situations where it is important to obtain an orthonormal basis ([[TODO]]: what situations).

The deleted closing sentence had pointed readers to [[Independent Component Analysis#Reconstruction ICA | Reconstruction ICA]] as an alternative "very similar to [[Sparse Coding: Autoencoder Interpretation | sparse coding]], but no longer strictly finding independent components"; it was removed together with the Reconstruction ICA section itself in revision 1014 below.
Revision 1014 (Jngiam, 08:22:16, 19 June 2011)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=1014&oldid=prev
Deleted the entire Reconstruction ICA section (beginning at article line 49), which sat between the Orthonormal ICA and Topographic ICA sections. The removed text read:

== Reconstruction ICA ==

In reconstruction ICA, we drop the constraint that we want an orthonormal basis, replacing it with a reconstruction error term. Hence, the new reconstruction ICA objective is:
:<math>
\begin{array}{rcl}
 {\rm minimize} & \lVert Wx \rVert_1 + \lVert W^TWx - x \rVert_2^2 \\
\end{array}
</math>

where <math>W^T(Wx)</math> is the reconstruction of the input from the features (i.e. <math>W^T</math> is supposed to play the role of <math>W^{-1}</math>). If you recall the [[Sparse Coding: Autoencoder Interpretation | sparse coding]] objective,
:<math>
J(A, s) = \lVert As - x \rVert_2^2 + \lambda \lVert s \rVert_1
</math>
you will notice that these two objectives are essentially identical, except that whereas sparse coding learns a matrix <math>A</math> mapping the feature space to the input space, reconstruction ICA learns a matrix <math>W</math> mapping the input space to the feature space. As such, reconstruction ICA no longer learns strictly independent or orthonormal components.
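Since the reconstruction ICA objective above is smooth apart from the L1 term, its cost and gradient fit in a few lines. A minimal numpy sketch, not from the wiki: it assumes examples sit in the columns of X, adds an illustrative weight lam on the sparsity term, and smooths the L1 so the gradient is defined at zero.

<pre>
import numpy as np

def rica_cost_grad(W, X, lam=0.1, eps=1e-8):
    # W: (k, n) maps inputs to features; X: (n, m) data, one example per column.
    WX = W @ X                      # features,                (k, m)
    Z = W.T @ WX - X                # reconstruction residual, (n, m)
    l1 = np.sqrt(WX * WX + eps)     # smoothed |W x|
    cost = np.sum(Z * Z) + lam * np.sum(l1)
    # d/dW ||W^T W X - X||_F^2 = 2 W (X Z^T + Z X^T); the L1 term
    # contributes sign(WX) X^T, with sign(WX) approximated by WX / l1.
    grad = 2.0 * W @ (X @ Z.T + Z @ X.T) + lam * (WX / l1) @ X.T
    return cost, grad
</pre>

With the orthonormality constraint gone, nothing stops the number of rows k from exceeding the input dimension n, so W may be over-complete, which is the practical appeal of this variant.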
Revision 1002 (Cyfoo, 00:47:00, 18 June 2011)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=1002&oldid=prev
Promoted the three section headings from level 3 to level 2: === Orthonormal ICA ===, === Reconstruction ICA === and === Topographic ICA === become == Orthonormal ICA ==, == Reconstruction ICA == and == Topographic ICA ==.
Revision 1001 (Cyfoo, 00:46:38, 18 June 2011)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=1001&oldid=prev
Cleaned up markup and math slips from revision 999 below:

* Put wiki links in target-first order, e.g. [[Orthonormal ICA | Independent Component Analysis#Orthonormal ICA]] becomes [[Independent Component Analysis#Orthonormal ICA | Orthonormal ICA]]; likewise for the Reconstruction ICA link, the three [[Sparse Coding: Autoencoder Interpretation | sparse coding]] links, and the [[Whitening | ZCA whitened]] link.
* In the gradient-descent procedure, moved the stray <ol> tag below the "Repeat until done:" line, and typeset the projection step as <math>W \leftarrow \operatorname{proj}_U W</math> instead of the bare text proj_U.
* Fixed the missing backslash in the projection formula: (WW^T)^{-frac{1}{2}} becomes <math>(WW^T)^{-\frac{1}{2}}</math>.
* Fixed \lvert to \lVert in the reconstruction ICA objective.
* Closed two malformed <math> tags in the sentence defining <math>W^T(Wx)</math> as the reconstruction of the input.
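The projection onto <math>WW^T = I</math> that this revision typesets is symmetric orthogonalization, and the reason it "can be seen as ZCA whitening" is that both multiply by an inverse matrix square root: ZCA whitening rescales data by <math>\Sigma^{-\frac{1}{2}}</math>, while the projection rescales <math>W</math> by <math>(WW^T)^{-\frac{1}{2}}</math>. A small numpy sketch of the step (the helper name sym_orth is ours, not the wiki's):

<pre>
import numpy as np

def sym_orth(W):
    # Project W onto {W : W W^T = I} via W <- (W W^T)^(-1/2) W.
    d, E = np.linalg.eigh(W @ W.T)           # W W^T = E diag(d) E^T
    return E @ np.diag(d ** -0.5) @ E.T @ W  # same inverse-sqrt form as ZCA

W = sym_orth(np.random.randn(20, 64))        # under-complete: 20 rows in R^64
assert np.allclose(W @ W.T, np.eye(20))      # rows are now orthonormal
</pre>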
Revision 1000 (Cyfoo, 00:42:13, 18 June 2011; section edit: Introduction)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=1000&oldid=prev
At article line 3, in the Introduction paragraph "Like sparse coding, independent component analysis has a simple mathematical formulation...", bolded the second emphasised term, so "our basis is an orthonormal basis" becomes "our basis is an '''orthonormal''' basis", matching the bolding of '''sparse''' added in revision 999 below.
Revision 999 (Cyfoo, 00:41:52, 18 June 2011)
http://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=999&oldid=prev
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 00:41, 18 June 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are <ins class="diffchange diffchange-inline">'''</ins>sparse<ins class="diffchange diffchange-inline">'''</ins>; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>:<math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>:<math></div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 18:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize for the objective using gradient descent - every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint).</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize for the objective using gradient descent - every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint)<ins class="diffchange diffchange-inline">. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">In practice, optimizing for the objective function while enforcing the orthonormality constraint (as described in [[Orthonormal ICA | Independent Component Analysis#Orthonormal ICA]] section below) is feasible but slow. Hence, the use of orthonormal ICA is limited to situations where it is important to obtain an orthonormal basis ([[TODO]]: what situations) . In other situations, the orthonormality constraint is replaced by a reconstruction cost instead to give [[Reconstruction ICA | Independent Component Analysis#Reconstruction ICA]], which is very similar to [[sparse coding | Sparse Coding: Autoencoder Interpretation]], but no longer strictly finds independent components.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">=== Orthonormal ICA ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">The orthonormal ICA objective is:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">:<math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">\begin{array}{rcl}</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> {\rm minimize} & \lVert Wx \rVert_1 \\</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> {\rm s.t.} & WW^T = I \\</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">\end{array} </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"></math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Observe that the constraint <math>WW^T = I</math> implies two other constraints. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Firstly, since we are learning an orthonormal basis, the number of basis vectors we learn must be less than the dimension of the input. In particular, this means that we cannot learn over-complete bases as we usually do in [[sparse coding | Sparse Coding: Autoencoder Interpretation]]. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Secondly, the data must be [[ZCA whitened | Whitening]] with no regularization (that is, with <math>\epsilon</math> set to 0). ([[TODO]] Why must this be so?)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Hence, before we even begin to optimize for the orthonormal ICA objective, we must ensure that our data has been '''whitened''', and that we are learning an '''under-complete''' basis. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Following that, to optimize for the objective, we can use gradient descent, interspersing gradient descent steps with projection steps to enforce the orthonormality constraint. Hence, the procedure will be as follows:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><ol></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Repeat until done:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><li><math>W \leftarrow W - \alpha \nabla_W \lVert Wx \rVert_1</math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><li><math>W \leftarrow proj_U W</math> where <math>U</math> is the space of matrices satisfying <math>WW^T = I</math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"></ol></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">In practice, the learning rate <math>\alpha</math> is varied using a line-search algorithm to speed up the descent, and the projection step is achieved by setting <math>W \leftarrow (WW^T)^{-frac{1}{2}} W</math>, which can actually be seen as ZCA whitening ([[TODO]] explain how it is like ZCA whitening).</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">=== Reconstruction ICA ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">In reconstruction ICA, we drop the constraint that we want an orthonormal basis, replacing it with a reconstruction error term. Hence, the new reconstruction ICA objective is:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">:<math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">\begin{array}{rcl}</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> {\rm minimize} & \lambda \lVert Wx \rVert_1 + \lVert W^TWx - x \rVert_2^2 \\</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">\end{array} </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"></math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">where <math>W^T(Wx)</math> is the reconstruction of the input from the features (i.e. <math>W^T</math> is supposed to play the role of <math>W^{-1}</math>). If you recall the [[Sparse Coding: Autoencoder Interpretation | sparse coding]] objective,</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">:<math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">J(A, s) = \lVert As - x \rVert_2^2 + \lambda \lVert s \rVert_1</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"></math></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">you will notice that these two objectives are identical, except that whereas in sparse coding, we are trying to learn a matrix <math>A</math> that maps the feature space to the input space, in reconstruction ICA, we are trying to learn a matrix <math>W</math> that maps the input space to the feature space instead. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Since the hard orthonormality constraint has been replaced by a soft penalty, reconstruction ICA is not guaranteed to learn independent or orthonormal components.</ins></div></td></tr>
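<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">As an unofficial illustration, the reconstruction ICA cost and gradient can be written in a few lines of NumPy (the weight <code>lam</code> and all names here are our own choices):</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><pre>
import numpy as np

def rica_cost_grad(W, X, lam=0.1):
    """Cost and gradient of lam * ||W X||_1 + ||W^T W X - X||_2^2.

    W : (k, n) filter matrix; X : (n, m) data, one example per column.
    """
    WX = W @ X
    R = W.T @ WX - X                              # residual W^T(Wx) - x
    cost = lam * np.abs(WX).sum() + (R ** 2).sum()
    grad = (lam * np.sign(WX) @ X.T               # subgradient of the sparsity term
            + 2.0 * W @ (X @ R.T + R @ X.T))      # gradient of the reconstruction term
    return cost, grad
</pre></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Because the orthonormality constraint is gone, this cost/gradient pair can be handed directly to any unconstrained optimizer.</ins></div></td></tr>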
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">=== Topographic ICA ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Just like [[Sparse Coding: Autoencoder Interpretation | sparse coding]], independent component analysis can be modified to give a topographic variant by adding a topographic cost term</ins>.</div></td></tr>
</table>Cyfoohttp://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=993&oldid=prevCyfoo: /* Introduction */2011-05-30T07:24:27Z<p><span class="autocomment">Introduction</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 07:24, 30 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Introduction ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Introduction ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do <del class="diffchange diffchange-inline">- </del>to learn not any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do<ins class="diffchange diffchange-inline">. Further, in ICA, we want </ins>to learn not <ins class="diffchange diffchange-inline">just </ins>any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td></tr>
</table>Cyfoohttp://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=992&oldid=prevCyfoo: /* Introduction */2011-05-30T07:23:50Z<p><span class="autocomment">Introduction</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 07:23, 30 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Introduction ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Introduction ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do - to learn not any linearly independent basis, but an orthonormal basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do - to learn not any linearly independent basis, but an <ins class="diffchange diffchange-inline">'''</ins>orthonormal<ins class="diffchange diffchange-inline">''' </ins>basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:</div></td></tr>
</table>Cyfoohttp://deeplearning.stanford.edu/wiki/index.php?title=Independent_Component_Analysis&diff=991&oldid=prevCyfoo: Created page with "== Introduction == If you recall, in sparse coding, we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis ..."2011-05-30T07:23:25Z<p>Created page with "== Introduction == If you recall, in <a href="/wiki/index.php/Sparse_Coding" title="Sparse Coding"> sparse coding</a>, we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis ..."</p>
<p><b>New page</b></p><div>== Introduction ==<br />
<br />
If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do - to learn not any linearly independent basis, but an orthonormal basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>).<br />
<br />
Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function:<br />
<br />
:<math><br />
J(W) = \lVert Wx \rVert_1 <br />
</math><br />
<br />
This objective function is equivalent to the sparsity penalty on the features <math>s</math> in sparse coding, since <math>Wx</math> is precisely the features that represent the data. Adding in the orthonormality constraint gives us the full optimization problem for independent component analysis:<br />
<br />
:<math><br />
\begin{array}{rcl}<br />
{\rm minimize} & \lVert Wx \rVert_1 \\<br />
{\rm s.t.} & WW^T = I \\<br />
\end{array} <br />
</math><br />
<br />
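As a quick numerical illustration (ours, not part of the page), we can evaluate this objective and check the constraint for a random orthonormal <math>W</math>:<br />
<br />
<pre>
import numpy as np

n, m, k = 64, 1000, 32                  # input dim, examples, components (k <= n)
X = np.random.randn(n, m)               # stand-in for whitened data
Q, _ = np.linalg.qr(np.random.randn(n, k))
W = Q.T                                 # rows of W are orthonormal, so W W^T = I

J = np.abs(W @ X).sum()                 # objective ||W X||_1
ok = np.allclose(W @ W.T, np.eye(k))    # orthonormality constraint check
print(J, ok)
</pre><br />
<br />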
As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize for the objective using gradient descent - every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint).</div>Cyfoo