Deep Networks: Overview - Revision history
Revision history for this page on the wiki (MediaWiki 1.16.2; retrieved 2024-03-28T17:46:54Z)
Feed: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&feed=atom&action=history

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=2294&oldid=prev
Kandeng at 13:31, 7 April 2013 (2013-04-07T13:31:03Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 13:31, 7 April 2013</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 191:</td>
<td colspan="2" class="diff-lineno">Line 191:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{Languages|深度网络概览|中文}}</ins></div></td></tr>
</table>
Kandeng

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=900&oldid=prev
Watsuen at 11:03, 26 May 2011 (2011-05-26T11:03:52Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 11:03, 26 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 175:</td>
<td colspan="2" class="diff-lineno">Line 175:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{CNN}}</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 190:</td>
<td colspan="2" class="diff-lineno">Line 191:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;">{{CNN}}</del></div></td><td colspan="2"> </td></tr>
</table>
Watsuen

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=899&oldid=prev
Watsuen at 11:03, 26 May 2011 (2011-05-26T11:03:37Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 11:03, 26 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 190:</td>
<td colspan="2" class="diff-lineno">Line 190:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{CNN}}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{CNN}}</div></td></tr>
</table>
Watsuen

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=898&oldid=prev
Watsuen at 11:03, 26 May 2011 (2011-05-26T11:03:29Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 11:03, 26 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 190:</td>
<td colspan="2" class="diff-lineno">Line 190:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>[http://jmlr.csail.mit.edu/proceedings/papers/v9/erhan10a/erhan10a.pdf]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>!--></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{CNN}}</ins></div></td></tr>
</table>
Watsuen

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=781&oldid=prev
216.239.45.4 (2011-05-18T18:07:46Z): /* Difficulty of training deep architectures */
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:07, 18 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 71:</td>
<td colspan="2" class="diff-lineno">Line 71:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The main learning algorithm that researchers were using was to randomly initialize</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The main learning algorithm that researchers were using was to randomly initialize</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the weights of a deep network, and then train it using a labeled</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the weights of a deep network, and then train it using a labeled</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1}), \ldots, (x^{(m_l)}_l, y^{(m_l)}) \}</math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1<ins class="diffchange diffchange-inline">)</ins>}), \ldots, (x^{(m_l)}_l, y^{(m_l)}) \}</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>using a supervised learning objective, for example by applying gradient descent to try to</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>using a supervised learning objective, for example by applying gradient descent to try to</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>drive down the training error. However, this usually did not work well.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>drive down the training error. However, this usually did not work well.</div></td></tr>
</table>
216.239.45.4

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=767&oldid=prev
Ang (2011-05-13T20:40:42Z): /* Greedy layer-wise training */
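The baseline recipe that the revision history above keeps refining can be sketched in code: randomly initialize the weights of a deep (multi-hidden-layer) network, then fit them to a labeled training set by gradient descent on a supervised objective. The sketch below is an illustrative toy implementation, not code from the wiki page; the layer sizes, learning rate, squared-error loss, and OR-style labels are all invented for the example.

```python
import numpy as np

# Toy sketch of the scheme the revised passage describes: randomly initialize
# the weights of a deep network, then fit them to a labeled training set
# {(x^(i), y^(i))} by full-batch gradient descent on a supervised
# (squared-error) objective. Sizes and rates here are illustrative choices.

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Random weight initialization for each layer: [weights, biases]."""
    return [[rng.normal(0.0, 0.5, (m, n)), np.zeros(n)]
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Sigmoid activations at every layer; returns the list of activations."""
    acts = [x]
    for W, b in layers:
        acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b))))
    return acts

def gd_step(layers, x, y, lr=0.5):
    """One backpropagation step on squared error; returns the pre-step loss."""
    acts = forward(layers, x)
    delta = (acts[-1] - y) * acts[-1] * (1.0 - acts[-1])
    for i in range(len(layers) - 1, -1, -1):
        W, b = layers[i]
        if i > 0:  # backpropagate through the old weights before updating
            back = (delta @ W.T) * acts[i] * (1.0 - acts[i])
        layers[i] = [W - lr * acts[i].T @ delta, b - lr * delta.sum(axis=0)]
        if i > 0:
            delta = back
    return float(((acts[-1] - y) ** 2).mean())

# Tiny labeled set (logical OR) and a network with two hidden layers.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)
layers = init_layers([2, 8, 8, 1])
losses = [gd_step(layers, X, Y) for _ in range(3000)]
```

On this linearly separable toy task the loss drops readily; the point of the passage being edited is that on harder problems and deeper stacks this purely supervised recipe often did not work well.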
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 20:40, 13 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 128:</td>
<td colspan="2" class="diff-lineno">Line 128:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Greedy layer-wise training ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Greedy layer-wise training ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>How <del class="diffchange diffchange-inline">should </del>deep <del class="diffchange diffchange-inline">architectures be trained then</del>? One method that has seen some</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>How <ins class="diffchange diffchange-inline">can we train a </ins>deep <ins class="diffchange diffchange-inline">network</ins>? One method that has seen some</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>success is the '''greedy layer-wise training''' method. We describe this</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>success is the '''greedy layer-wise training''' method. We describe this</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>method in detail in later sections, but briefly, the main idea is to train the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>method in detail in later sections, but briefly, the main idea is to train the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>layers of the network one at a time, with <del class="diffchange diffchange-inline">the input of each </del>layer <del class="diffchange diffchange-inline">being </del>the</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>layers of the network one at a time, <ins class="diffchange diffchange-inline">so that we first train a network </ins>with <ins class="diffchange diffchange-inline">1 </ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">output of </del>the previous layer <del class="diffchange diffchange-inline">(which has been </del>trained). Training can either be</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">hidden </ins>layer<ins class="diffchange diffchange-inline">, and only after that is done, train a network with 2 hidden layers,</ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>supervised (say, with classification error as the objective function), <del class="diffchange diffchange-inline">or</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">and so on. At each step, we take </ins>the <ins class="diffchange diffchange-inline">old network with <math>k-1</math> hidden</ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>unsupervised (<del class="diffchange diffchange-inline">say, with the error of the layer in reconstructing its input as</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">layers, and add an additional <math>k</math>-th hidden layer (that takes as </ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">the objective function, </del>as in an autoencoder). The weights from training the</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">input </ins>the previous <ins class="diffchange diffchange-inline">hidden </ins>layer <ins class="diffchange diffchange-inline"><math>k-1</math> that we had just</ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>layers individually are then used to initialize the weights in the deep</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>trained). Training can either be </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">architecture</del>, and only then is the entire architecture "fine-tuned" (i.e.,</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>supervised (say, with classification error as the objective function <ins class="diffchange diffchange-inline">on each</ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>trained together to optimize the training set error). <del class="diffchange diffchange-inline"> </del>The success of greedy</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">step</ins>), <ins class="diffchange diffchange-inline">but more frequently it is </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>unsupervised (as in an autoencoder<ins class="diffchange diffchange-inline">; details to provided later</ins>). </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The weights from training the layers individually are then used to initialize the weights </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>in the <ins class="diffchange diffchange-inline">final/overall </ins>deep <ins class="diffchange diffchange-inline">network</ins>, and only then is the entire architecture "fine-tuned" (i.e.,</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>trained together to optimize the <ins class="diffchange diffchange-inline">labeled </ins>training set error). </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The success of greedy</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>layer-wise training has been attributed to a number of factors:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>layer-wise training has been attributed to a number of factors:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 150:</td>
<td colspan="2" class="diff-lineno">Line 156:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classification layer that maps to the outputs/predictions), our algorithm is</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classification layer that maps to the outputs/predictions), our algorithm is</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>able to learn and discover patterns from massively more amounts of data than</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>able to learn and discover patterns from massively more amounts of data than</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>purely supervised approaches<del class="diffchange diffchange-inline">, and thus </del>often results in much better <del class="diffchange diffchange-inline">hypotheses</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>purely supervised approaches<ins class="diffchange diffchange-inline">. This </ins>often results in much better <ins class="diffchange diffchange-inline">classifiers </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">being learned</ins>. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>===<del class="diffchange diffchange-inline">Regularization and better </del>local optima=== </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>===<ins class="diffchange diffchange-inline">Better </ins>local optima=== </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>After having trained the network</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>After having trained the network</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>on the unlabeled data, the weights are now starting at a better location in</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>on the unlabeled data, the weights are now starting at a better location in</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>parameter space than if they had been randomly initialized. We <del class="diffchange diffchange-inline">usually </del>then</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>parameter space than if they had been randomly initialized. We <ins class="diffchange diffchange-inline">can </ins>then</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>further fine-tune the weights starting from this location. Empirically, it</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>further fine-tune the weights starting from this location. Empirically, it</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>turns out that gradient descent from this location is <del class="diffchange diffchange-inline">also </del>much more likely to</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>turns out that gradient descent from this location is much more likely to</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>lead to a good local minimum, because the unlabeled data has already provided</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>lead to a good local minimum, because the unlabeled data has already provided</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>a significant amount of "prior" information about what patterns there</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>a significant amount of "prior" information about what patterns there</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>are in the input data.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>are in the input data. </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>In the next section, we will describe the specific details of how to go about</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>In the next section, we will describe the specific details of how to go about</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>implementing greedy layer-wise training.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>implementing greedy layer-wise training. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
</table>
Ang

Revision: http://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=766&oldid=prev
Ang (2011-05-13T20:33:32Z): /* Difficulty of training deep architectures */
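The greedy layer-wise procedure rewritten in the revision above (train a network with 1 hidden layer; only then add a k-th hidden layer that takes the (k-1)-th layer's output as input, usually trained unsupervised as an autoencoder; finally fine-tune the whole network) can be sketched as follows. This is a hedged illustration, not code from the page; the layer sizes, step counts, learning rate, and random "unlabeled" data are invented for the example, and the supervised fine-tuning stage is omitted.

```python
import numpy as np

# Hedged sketch of greedy layer-wise pretraining as the revised passage
# describes it: train hidden layers one at a time, each new layer taking the
# previously trained layer's representation as input, unsupervised
# (autoencoder-style reconstruction). The learned weights would then
# initialize the full deep network before fine-tuning (not shown here).

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(H, n_hidden, steps=500, lr=1.0):
    """Fit one sigmoid autoencoder to representation H; return encoder (W, b)."""
    n = len(H)
    W1 = rng.normal(0.0, 0.3, (H.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.3, (n_hidden, H.shape[1])); b2 = np.zeros(H.shape[1])
    for _ in range(steps):
        h = sigmoid(H @ W1 + b1)            # encode
        r = sigmoid(h @ W2 + b2)            # decode (reconstruct the input)
        d2 = (r - H) * r * (1.0 - r)        # squared-error gradient at output
        d1 = (d2 @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d2) / n; b2 -= lr * d2.mean(axis=0)
        W1 -= lr * (H.T @ d1) / n; b1 -= lr * d1.mean(axis=0)
    return W1, b1

# "Unlabeled" toy data: pretrain a stack of hidden layers one at a time.
X = rng.random((64, 8))
stack, rep = [], X
for n_hidden in (6, 4):        # first a 1-hidden-layer net, then a second layer
    W, b = train_autoencoder(rep, n_hidden)
    stack.append((W, b))
    rep = sigmoid(rep @ W + b)  # this layer's output is the next layer's input

# `stack` now holds initial weights for the deep network's hidden layers.
```

Each autoencoder sees only the representation produced by the layers trained before it, which is what makes the procedure "greedy": no layer is revisited until the final fine-tuning pass.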
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 20:33, 13 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 71:</td>
<td colspan="2" class="diff-lineno">Line 71:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The main learning algorithm that researchers were using was to randomly initialize</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>The main learning algorithm that researchers were using was to randomly initialize</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the weights of a deep network, and then train it using a labeled</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the weights of a deep network, and then train it using a labeled</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1}), \ldots, (x^{(m_l)}_l, y^{(m_l}) \}</math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1}), \ldots, (x^{(m_l)}_l, y^{(m_l<ins class="diffchange diffchange-inline">)</ins>}) \}</math></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>using a supervised learning objective, <del class="diffchange diffchange-inline">using </del>gradient descent to try to</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>using a supervised learning objective, <ins class="diffchange diffchange-inline">for example by applying </ins>gradient descent to try to</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>drive down the training error. However, this usually did not work well.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>drive down the training error. However, this usually did not work well.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>There were several reasons for this.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>There were several reasons for this.</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 79:</td>
<td colspan="2" class="diff-lineno">Line 79:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>With the method described above, one relies only on</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>With the method described above, one relies only on</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>labeled data for training. However, labeled data is often scarce, and thus <del class="diffchange diffchange-inline">it</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>labeled data for training. However, labeled data is often scarce, and thus <ins class="diffchange diffchange-inline">for many</ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>is <del class="diffchange diffchange-inline">easy </del>to <del class="diffchange diffchange-inline">overfit </del>the <del class="diffchange diffchange-inline">training data and obtain </del>a model <del class="diffchange diffchange-inline">which does not</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">problems it </ins>is <ins class="diffchange diffchange-inline">difficult </ins>to <ins class="diffchange diffchange-inline">get enough examples to fit </ins>the <ins class="diffchange diffchange-inline">parameters of </ins>a</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">generalize well</del>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">complex </ins>model<ins class="diffchange diffchange-inline">. For example, given the high degree of expressive power of deep networks,</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">training on insufficient data would also result in overfitting</ins>. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===Local optima=== </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===Local optima=== </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Training a neural network using supervised learning</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Training <ins class="diffchange diffchange-inline">a shallow network (with 1 hidden layer) using</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">supervised learning usually results in the parameters converging to reasonable values;</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">but when training a deep network, this works much less well. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">In particular, training </ins>a neural network using supervised learning</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>involves solving a highly non-convex optimization problem (say, minimizing the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>involves solving a highly non-convex optimization problem (say, minimizing the</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training error <math>\textstyle \sum_i ||h_W(x^{(i)}) - y^{(i)}||^2</math> as a</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training error <math>\textstyle \sum_i ||h_W(x^{(i)}) - y^{(i)}||^2</math> as a</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>function of the network parameters <math>\textstyle W</math>). <del class="diffchange diffchange-inline"> When the</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>function of the network parameters <math>\textstyle W</math>). <ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">network is </del>deep, this <del class="diffchange diffchange-inline">optimization </del>problem <del class="diffchange diffchange-inline">is </del>rife with bad local optima, and</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">In a </ins>deep <ins class="diffchange diffchange-inline">network</ins>, this problem <ins class="diffchange diffchange-inline">turns out to be </ins>rife with bad local optima, and</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training with gradient descent (or methods like conjugate gradient and L-BFGS)</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training with gradient descent (or methods like conjugate gradient and L-BFGS)</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">do not </del>work well.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">no longer </ins>work well. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===Diffusion of gradients=== </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>===Diffusion of gradients=== </div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 97:</td>
<td colspan="2" class="diff-lineno">Line 101:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>There is an additional technical reason,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>There is an additional technical reason,</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>pertaining to the gradients becoming very small, that explains why gradient</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>pertaining to the gradients becoming very small, that explains why gradient</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>descent (and related algorithms like L-BFGS) do not work well on a deep <del class="diffchange diffchange-inline">network</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>descent (and related algorithms like L-BFGS) do not work well on deep <ins class="diffchange diffchange-inline">networks</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>with randomly initialized weights. Specifically, when using backpropagation to</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>with randomly initialized weights. Specifically, when using backpropagation to</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>compute the derivatives, the gradients that are propagated backwards (from the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>compute the derivatives, the gradients that are propagated backwards (from the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>output layer to the earlier layers of the network) rapidly <del class="diffchange diffchange-inline">diminishes </del>in</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>output layer to the earlier layers of the network) rapidly <ins class="diffchange diffchange-inline">diminish </ins>in</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>magnitude as the depth of the network increases. As a result, the derivative of</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>magnitude as the depth of the network increases. As a result, the derivative of</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the overall cost with respect to the weights in the earlier layers is very</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the overall cost with respect to the weights in the earlier layers is very</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 113:</td>
<td colspan="2" class="diff-lineno">Line 117:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>randomly initialized ends up giving similar performance to training a</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>randomly initialized ends up giving similar performance to training a</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>shallow network (the last few layers) on corrupted input (the result of</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>shallow network (the last few layers) on corrupted input (the result of</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>the processing done by the earlier layers).</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>the processing done by the earlier layers). </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=765&oldid=prevAng: /* Difficulty of training deep architectures */2011-05-13T20:25:20Z<p><span class="autocomment">Difficulty of training deep architectures</span></p>
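The "diffusion of gradients" effect discussed in the revisions above can be demonstrated numerically: when an error signal is backpropagated through a deep, randomly initialized sigmoid network, the gradient magnitudes at the earlier layers shrink rapidly with depth. The sketch below illustrates this with NumPy; the network width, depth, and initialization scale are illustrative assumptions, not values from the text.

```python
import numpy as np

# Backpropagate a fixed error signal through a deep, randomly initialized
# sigmoid network and record the gradient norm at each layer's pre-activation.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_layers, width = 8, 50
Ws = [rng.normal(scale=0.1, size=(width, width)) for _ in range(n_layers)]

# Forward pass on a random input, caching the activations of every layer.
acts = [rng.normal(size=width)]
for W in Ws:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass: start from an arbitrary error signal at the output, and apply
# the chain rule through each sigmoid layer (derivative a * (1 - a)).
delta = np.ones(width) * acts[-1] * (1 - acts[-1])
grad_norms = [np.linalg.norm(delta)]
for l in range(n_layers - 1, 0, -1):
    delta = (Ws[l].T @ delta) * acts[l] * (1 - acts[l])
    grad_norms.append(np.linalg.norm(delta))

grad_norms = grad_norms[::-1]  # index 0 = earliest layer
# The gradient at the earliest layer is orders of magnitude smaller than at
# the output layer, which is why the early layers learn so slowly.
print(grad_norms[0], grad_norms[-1])
```

Since the sigmoid derivative is at most 0.25 and the weights are small, each backward step multiplies the gradient norm by a factor well below one, so the attenuation compounds geometrically with depth.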
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 20:25, 13 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 69:</td>
<td colspan="2" class="diff-lineno">Line 69:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>researchers had little success training deep architectures.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>researchers had little success training deep architectures.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>The main <del class="diffchange diffchange-inline">method </del>that researchers were using was to randomly initialize</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>The main <ins class="diffchange diffchange-inline">learning algorithm </ins>that researchers were using was to randomly initialize</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>the weights of <del class="diffchange diffchange-inline">the </del>deep network, and then train it using a labeled</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>the weights of <ins class="diffchange diffchange-inline">a </ins>deep network, and then train it using a labeled</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1)}), \ldots, (x^{(m_l)}_l, y^{(m_l)}) \}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set <math>\{ (x^{(1)}_l, y^{(1)}), \ldots, (x^{(m_l)}_l, y^{(m_l)}) \}</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>using a supervised learning objective, applying gradient descent to try to</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>using a supervised learning objective, applying gradient descent to try to</div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=764&oldid=prevAng: /* Advantages of deep networks */2011-05-13T20:24:38Z<p><span class="autocomment">Advantages of deep networks</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 20:24, 13 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 42:</td>
<td colspan="2" class="diff-lineno">Line 42:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>By using a deep network, in the case of images, one can also start to learn part-whole decompositions.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>By using a deep network, in the case of images, one can also start to learn part-whole decompositions.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, the first layer might learn to group together pixels in an image</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, the first layer might learn to group together pixels in an image</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>in order to detect edges. The second layer might then group together edges to</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>in order to detect edges <ins class="diffchange diffchange-inline">(as seen in the earlier exercises)</ins>. The second layer might then group together edges to</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>detect longer contours, or perhaps simple "<del class="diffchange diffchange-inline">object </del>parts." An even deeper layer</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>detect longer contours, or perhaps <ins class="diffchange diffchange-inline">detect </ins>simple "parts <ins class="diffchange diffchange-inline">of objects</ins>." An even deeper layer</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>might then group together these contours or detect even more complex features.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>might then group together these contours or detect even more complex features.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 49:</td>
<td colspan="2" class="diff-lineno">Line 49:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>processing. For example, visual images are processed in multiple stages by the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>processing. For example, visual images are processed in multiple stages by the</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>brain, by cortical area "V1", followed by cortical area "V2" (a different part</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>brain, by cortical area "V1", followed by cortical area "V2" (a different part</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>of the brain), and so on.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>of the brain), and so on. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><!--</div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Deep_Networks:_Overview&diff=763&oldid=prevAng: /* Advantages of deep networks */2011-05-13T20:23:17Z<p><span class="autocomment">Advantages of deep networks</span></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 20:23, 13 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 40:</td>
<td colspan="2" class="diff-lineno">Line 40:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>n</math>.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>n</math>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>By using a deep network, one can also start to learn part-whole decompositions.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>By using a deep network<ins class="diffchange diffchange-inline">, in the case of images</ins>, one can also start to learn part-whole decompositions.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, the first layer might learn to group together pixels in an image</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For example, the first layer might learn to group together pixels in an image</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>in order to detect edges. The second layer might then group together edges to</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>in order to detect edges. The second layer might then group together edges to</div></td></tr>
</table>Ang