Fine-tuning Stacked AEs - Revision history
Feed: http://deeplearning.stanford.edu/wiki/index.php?title=Fine-tuning_Stacked_AEs&feed=atom&action=history
(Atom feed retrieved 2024-03-28T08:26:43Z; MediaWiki 1.16.2. Most recent revision first.)

Revision 2303 by Kandeng, 2013-04-08T04:04:28Z
* After the {{CNN}} template (diff context: line 39): added two blank lines and the inter-language link {{Languages|微调多层自编码算法|中文}}, pointing to the Chinese translation of the page ("Fine-tuning Multi-layer Autoencoder Algorithm").
Revision 902 by Watsuen, 2011-05-26T11:04:23Z
* After the closing }} of the quote block (diff context: line 36): added two blank lines and the {{CNN}} navigation template.
Revision 735 by Jngiam, 2011-05-12T03:22:07Z (section: Finetuning with Backpropagation)
* Extended the note on the softmax classifier (diff context: line 34). The Step-2 derivative formula now reads <math>\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})</math>, where <math>\nabla J = \theta^T(I-P)</math>.
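Since revisions 735, 718, and 717 below all circle this same Step-2 formula, a small numeric sketch may help pin it down. The following NumPy snippet is an illustration, not wiki content: the function names, the sigmoid activation, and all shapes are assumptions. It computes the last-layer delta exactly as the note states, with <math>I</math> built as an indicator matrix of the labels and <math>P</math> as the softmax's conditional probabilities.

<source lang="python">
# Illustrative sketch of the Step-2 delta from the note above; assumes a sigmoid
# activation f (so f'(z) = a(1-a)) and a softmax classifier on the last features.
import numpy as np

def softmax(scores):
    # Column-wise softmax; scores has shape (num_classes, m).
    scores = scores - scores.max(axis=0, keepdims=True)  # for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=0, keepdims=True)

def last_layer_delta(theta, a_nl, labels, num_classes):
    """delta^{(n_l)} = -(grad_{a^{n_l}} J) .* f'(z^{(n_l)}),
    with grad J = theta^T (I - P) for the softmax layer."""
    m = a_nl.shape[1]
    I = np.zeros((num_classes, m))        # indicator ("input labels") matrix
    I[labels, np.arange(m)] = 1.0
    P = softmax(theta @ a_nl)             # conditional class probabilities
    grad_J = theta.T @ (I - P)            # gradient w.r.t. the last-layer features
    f_prime = a_nl * (1.0 - a_nl)         # sigmoid derivative from activations
    return -grad_J * f_prime              # element-wise ("bullet") product

# Example shapes: 10 classes, 200 last-layer features, 5 examples.
rng = np.random.default_rng(0)
theta = 0.01 * rng.normal(size=(10, 200))
a_nl = rng.random((200, 5))               # stand-in sigmoid activations
labels = np.array([3, 1, 4, 1, 5])
delta = last_layer_delta(theta, a_nl, labels, num_classes=10)  # shape (200, 5)
</source>

From this delta, the remaining layers follow the usual backpropagation recursion for <math>l = n_l-1, \ldots, 2</math>, which the softmax head leaves unchanged.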
Revision 732 by Jngiam, 2011-05-12T00:29:35Z (section: Finetuning for Classification Error)
* Renamed the section heading "Finetuning for Classification Error" to "Finetuning with Backpropagation" (diff context: line 5).
Revision 729 by Jngiam, 2011-05-12T00:25:02Z (section: Recap of the Backpropagation Algorithm)
* Renamed the section heading "Recap of the Backpropagation Algorithm" to "Finetuning for Classification Error" (diff context: line 5).
Revision 728 by Jngiam, 2011-05-12T00:24:37Z (section: General Strategy)
* Removed the sentence "Note: most stacked autoencoders don't go past five layers." that followed the General Strategy paragraph (diff context: line 4).
Revision 718 by Jngiam, 2011-05-11T18:46:37Z (section: Recap of the Backpropagation Algorithm)
* Clarified that the derivatives computed using <math>\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})</math> are those of Step 2 of the algorithm (diff context: line 36).
Revision 717 by Jngiam, 2011-05-11T18:46:14Z
* Replaced "Luckily" with "Fortunately" at the start of the General Strategy paragraph (diff context: line 3).
* After the displayed cost function (diff context: line 32): added a {{Quote|...}} note stating that, while one could consider the softmax classifier an additional layer, the derivation does not; the "last layer" of the network is taken to be the features that go into the softmax classifier, so the derivatives are computed using <math>\delta^{(n_l)} = - (\nabla_{a^{n_l}}J) \bullet f'(z^{(n_l)})</math>.
Revision 370 by 172.24.69.9, 2011-04-25T23:52:04Z
* In the parenthetical softmax remark (diff context: line 16): replaced "<math>P</math> is the predicted labels" with "<math>P</math> is the vector of conditional probabilities".
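The distinction this edit draws is easy to verify numerically. A toy check with made-up numbers (nothing below is wiki content) shows that plugging hard predicted labels into <math>\theta^T(I-P)</math> zeroes the gradient as soon as the argmax matches the true label, while the conditional probabilities retain a training signal:

<source lang="python">
# Toy check: P must be the soft conditional probabilities, not one-hot predictions.
import numpy as np

theta = np.array([[ 0.5, -0.2],
                  [-0.3,  0.8],
                  [ 0.1,  0.1]])              # 3 classes x 2 features
a = np.array([1.2, 0.4])                      # one example's last-layer features
scores = theta @ a                            # class scores; argmax is class 0
P = np.exp(scores - scores.max())
P /= P.sum()                                  # conditional probabilities
I = np.array([1.0, 0.0, 0.0])                 # true label is class 0
grad_soft = theta.T @ (I - P)                 # nonzero: still pushes P toward I
one_hot = (scores == scores.max()).astype(float)
grad_hard = theta.T @ (I - one_hot)           # exactly zero once argmax == label
print(grad_soft, grad_hard)
</source>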
Revision 336 by Watsuen, 2011-04-22T01:33:58Z (section: Recap of the Backpropagation Algorithm)
* Rephrased the softmax remark from "For the softmax layer, we have <math>\nabla J = \theta^T(I-P)</math> where <math>I</math> is the input labels and <math>P</math> is the predicted labels." to the parenthetical "(When using softmax regression, the softmax layer has <math>\nabla J = \theta^T(I-P)</math> where <math>I</math> is the input labels and <math>P</math> is the predicted labels.)" (diff context: line 16).