http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&feed=atom&action=history
Softmax Regression - Revision history
2024-03-28T10:56:34Z
Revision history for this page on the wiki
MediaWiki 1.16.2

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=2288&oldid=prev
Kandeng at 13:24, 7 April 2013 (2013-04-07T13:24:56Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 13:24, 7 April 2013</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 388:</td>
<td colspan="2" class="diff-lineno">Line 388:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{Softmax}}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{Softmax}}</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{Languages|Softmax回归|中文}}</ins></div></td></tr>
</table>
Kandeng

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=893&oldid=prev
Watsuen at 11:02, 26 May 2011 (2011-05-26T11:02:00Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 11:02, 26 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 385:</td>
<td colspan="2" class="diff-lineno">Line 385:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classifier would be appropriate. In the second case, it would be more appropriate to build</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classifier would be appropriate. In the second case, it would be more appropriate to build</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>three separate logistic regression classifiers.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>three separate logistic regression classifiers.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{Softmax}}</ins></div></td></tr>
</table>
Watsuen

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=715&oldid=prev
Zellyn: /* Properties of softmax regression parameterization */ (2011-05-11T15:23:16Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 15:23, 11 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 202:</td>
<td colspan="2" class="diff-lineno">Line 202:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>regression's parameters are "redundant." More formally, we say that our</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>regression's parameters are "redundant." More formally, we say that our</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>softmax model is '''overparameterized,''' meaning that for any hypothesis we might</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>softmax model is '''overparameterized,''' meaning that for any hypothesis we might</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>fit to the data, there<del class="diffchange diffchange-inline">'re </del>multiple parameter settings that give rise to exactly</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>fit to the data, there <ins class="diffchange diffchange-inline">are </ins>multiple parameter settings that give rise to exactly</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the same hypothesis function <math>h_\theta</math> mapping from inputs <math>x</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>the same hypothesis function <math>h_\theta</math> mapping from inputs <math>x</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>to the predictions. </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>to the predictions. </div></td></tr>
</table>
Zellyn

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=702&oldid=prev
Jngiam at 00:58, 11 May 2011 (2011-05-11T00:58:34Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 00:58, 11 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 73:</td>
<td colspan="2" class="diff-lineno">Line 73:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For convenience, we will also write </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For convenience, we will also write </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\theta</math> to denote all the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\theta</math> to denote all the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>parameters of our model. When you implement softmax regression, is usually</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>parameters of our model. When you implement softmax regression, <ins class="diffchange diffchange-inline">it </ins>is usually</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that</div></td></tr>
</table>
Jngiam

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=701&oldid=prev
Jngiam at 00:58, 11 May 2011 (2011-05-11T00:58:28Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 00:58, 11 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 73:</td>
<td colspan="2" class="diff-lineno">Line 73:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For convenience, we will also write </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>For convenience, we will also write </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\theta</math> to denote all the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\theta</math> to denote all the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>parameters of our model. When you implement softmax regression, <del class="diffchange diffchange-inline">is </del>is usually</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>parameters of our model. When you implement softmax regression, is usually</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that</div></td></tr>
</table>
Jngiam

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=677&oldid=prev
Ang: /* Softmax Regression vs. k Binary Classifiers */ (2011-05-10T19:10:37Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 19:10, 10 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 380:</td>
<td colspan="2" class="diff-lineno">Line 380:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>or three logistic regression classifiers? (ii) Now suppose your classes are</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>or three logistic regression classifiers? (ii) Now suppose your classes are</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>indoor_scene, black_and_white_image, and image_has_people. Would you use softmax</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>indoor_scene, black_and_white_image, and image_has_people. Would you use softmax</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>regression <del class="diffchange diffchange-inline">of </del>multiple logistic regression classifiers?</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>regression <ins class="diffchange diffchange-inline">or </ins>multiple logistic regression classifiers?</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>In the first case, the classes are mutually exclusive, so a softmax regression</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>In the first case, the classes are mutually exclusive, so a softmax regression</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classifier would be appropriate. In the second case, it would be more appropriate to build</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>classifier would be appropriate. In the second case, it would be more appropriate to build</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>three separate logistic regression classifiers.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>three separate logistic regression classifiers.</div></td></tr>
</table>
Ang

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=676&oldid=prev
Ang: /* Relationship to Logistic Regression */ (2011-05-10T19:09:05Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 19:09, 10 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 301:</td>
<td colspan="2" class="diff-lineno">Line 301:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Relationship to Logistic Regression ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Relationship to Logistic Regression ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>In the special case where <math>k = 2</math>, one can <del class="diffchange diffchange-inline">also </del>show that softmax regression reduces to logistic regression.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>In the special case where <math>k = 2</math>, one can show that softmax regression reduces to logistic regression.</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>This shows that softmax regression is a generalization of logistic regression. Concretely, <del class="diffchange diffchange-inline">our </del>hypothesis outputs</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>This shows that softmax regression is a generalization of logistic regression. Concretely, <ins class="diffchange diffchange-inline">when <math>k=2</math>,</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">the softmax regression </ins>hypothesis outputs</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\begin{align}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\begin{align}</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">h</del>(x) &=</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">h_\theta</ins>(x) &=</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\frac{1}{ e^{\theta_1^Tx} + e^{ \theta_2^T x^{(i)} } }</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\frac{1}{ e^{\theta_1^Tx} + e^{ \theta_2^T x^{(i)} } }</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 317:</td>
<td colspan="2" class="diff-lineno">Line 318:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Taking advantage of the fact that this hypothesis</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Taking advantage of the fact that this hypothesis</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>is overparameterized and setting <math>\psi <del class="diffchange diffchange-inline">- </del>=\theta_1</math>,</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>is overparameterized and setting <math>\psi = \theta_1</math>,</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>we can subtract <math>\theta_1</math> from each of the two parameters, giving us</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>we can subtract <math>\theta_1</math> from each of the two parameters, giving us</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 352:</td>
<td colspan="2" class="diff-lineno">Line 353:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>1 - \frac{1}{ 1 + e^{ (\theta')^T x^{(i)} } }</math>,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>1 - \frac{1}{ 1 + e^{ (\theta')^T x^{(i)} } }</math>,</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>same as logistic regression.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>same as logistic regression.</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del style="color: red; font-weight: bold; text-decoration: none;"></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Softmax Regression vs. k Binary Classifiers ==</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>== Softmax Regression vs. k Binary Classifiers ==</div></td></tr>
</table>
Ang

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=675&oldid=prev
Ang: /* Weight Decay */ (2011-05-10T18:45:07Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:45, 10 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 241:</td>
<td colspan="2" class="diff-lineno">Line 241:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We will modify the cost function by adding a weight decay term </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We will modify the cost function by adding a weight decay term </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><math>\frac{\lambda}{2} \sum_{i=1}^k \sum_{j=<del class="diffchange diffchange-inline">1</del>}^{n<del class="diffchange diffchange-inline">+1</del>} \theta_{ij}^2</math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><math><ins class="diffchange diffchange-inline">\textstyle </ins>\frac{\lambda}{2} \sum_{i=1}^k \sum_{j=<ins class="diffchange diffchange-inline">0</ins>}^{n} \theta_{ij}^2</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>which penalizes large values of the parameters. Our cost function is now</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>which penalizes large values of the parameters. Our cost function is now</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\begin{align}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\begin{align}</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>J(\theta) = - \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{\theta_j^T x^{(i)}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }} \right]</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>J(\theta) = - <ins class="diffchange diffchange-inline">\frac{1}{m} </ins>\left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac<ins class="diffchange diffchange-inline">{e^</ins>{\theta_j^T x^{(i)<ins class="diffchange diffchange-inline">}</ins>}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }} \right]</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> + \frac{\lambda}{2} \sum_{i} \sum_{j} \theta_{ij}^2</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> + \frac{\lambda}{2} \sum_{i<ins class="diffchange diffchange-inline">=1</ins>}<ins class="diffchange diffchange-inline">^k </ins>\sum_{j<ins class="diffchange diffchange-inline">=0</ins>}<ins class="diffchange diffchange-inline">^n </ins>\theta_{ij}^2</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 257:</td>
<td colspan="2" class="diff-lineno">Line 257:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>to converge to the global minimum.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>to converge to the global minimum.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>To <del class="diffchange diffchange-inline">implement these </del>optimization <del class="diffchange diffchange-inline">algorithms</del>, we also need the derivative of this</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>To <ins class="diffchange diffchange-inline">apply an </ins>optimization <ins class="diffchange diffchange-inline">algorithm</ins>, we also need the derivative of this</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>new definition of <math>J(\theta)</math>. One can show that the derivative is:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>new definition of <math>J(\theta)</math>. One can show that the derivative is:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math></div></td></tr>
</table>
Ang

http://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=674&oldid=prev
Ang: /* Properties of softmax regression parameterization */ (2011-05-10T18:41:51Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:41, 10 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 185:</td>
<td colspan="2" class="diff-lineno">Line 185:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Softmax regression has an unusual property that it has a "redundant" set of parameters. To explain what this means, </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Softmax regression has an unusual property that it has a "redundant" set of parameters. To explain what this means, </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>suppose we take each of our parameter vectors <math>\theta_j</math>, and subtract some fixed vector <math>\psi</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>suppose we take each of our parameter vectors <math>\theta_j</math>, and subtract some fixed vector <math>\psi</math></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>from it, so that <math>\<del class="diffchange diffchange-inline">theta_1</del></math> is now replaced with <math>\<del class="diffchange diffchange-inline">theta_1 </del>- \psi</math><del class="diffchange diffchange-inline">, </del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>from it, so that <ins class="diffchange diffchange-inline">every </ins><math>\<ins class="diffchange diffchange-inline">theta_j</ins></math> is now replaced with <math>\<ins class="diffchange diffchange-inline">theta_j </ins>- \psi</math> </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><math>\<del class="diffchange diffchange-inline">theta_2</del></math> <del class="diffchange diffchange-inline">is replaced with <math>\theta_2 - \psi</math>, and so on</del>. Our hypothesis</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">(for every </ins><math><ins class="diffchange diffchange-inline">j=1, </ins>\<ins class="diffchange diffchange-inline">ldots, k</ins></math><ins class="diffchange diffchange-inline">)</ins>. Our hypothesis</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>now estimates the class label probabilities as</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>now estimates the class label probabilities as</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 194:</td>
<td colspan="2" class="diff-lineno">Line 194:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>&= \frac{e^{(\theta_j-\psi)^T x^{(i)}}}{\sum_{l=1}^k e^{ (\theta_l-\psi)^T x^{(i)}}} \\</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>&= \frac{e^{(\theta_j-\psi)^T x^{(i)}}}{\sum_{l=1}^k e^{ (\theta_l-\psi)^T x^{(i)}}} \\</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>&= \frac{e^{\theta_j^T x^{(i)}} e^{-\psi^Tx^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}} e^{-\psi^Tx^{(i)}}} \\</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>&= \frac{e^{\theta_j^T x^{(i)}} e^{-\psi^Tx^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}} e^{-\psi^Tx^{(i)}}} \\</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>&= \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T<del class="diffchange diffchange-inline">} </del>x^{(i)}}</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>&= \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)}}<ins class="diffchange diffchange-inline">}.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></math></div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 207:</td>
<td colspan="2" class="diff-lineno">Line 207:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, if the cost function <math>J(\theta)</math> is minimized by some</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, if the cost function <math>J(\theta)</math> is minimized by some</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>setting of the parameters <math>(\theta_1, \theta_2,\ldots, \<del class="diffchange diffchange-inline">theta_n</del>)</math>,</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>setting of the parameters <math>(\theta_1, \theta_2,\ldots, \<ins class="diffchange diffchange-inline">theta_k</ins>)</math>,</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>then it is also minimized by <math>(\theta_1 - \psi, \theta_2 - \psi,\ldots,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>then it is also minimized by <math>(\theta_1 - \psi, \theta_2 - \psi,\ldots,</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>\<del class="diffchange diffchange-inline">theta_n </del>- \psi)</math> for any value of <math>\psi</math>. Thus, the</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>\<ins class="diffchange diffchange-inline">theta_k </ins>- \psi)</math> for any value of <math>\psi</math>. Thus, the</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>minimizer of <math>J(\theta)</math> is not unique. (Interestingly, </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>minimizer of <math>J(\theta)</math> is not unique. (Interestingly, </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>J(\theta)</math> is still convex, and thus gradient descent will</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>J(\theta)</math> is still convex, and thus gradient descent will</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>not run into a local <del class="diffchange diffchange-inline">optimum</del>. But the Hessian is singular/non-invertible,</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>not run into <ins class="diffchange diffchange-inline">local optima problems</ins>. But the Hessian is singular/non-invertible,</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>which <del class="diffchange diffchange-inline">cause </del>a straightforward implementation of Newton's method to run into</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>which <ins class="diffchange diffchange-inline">causes </ins>a straightforward implementation of Newton's method to run into</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>numerical problems.) </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>numerical problems.) </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 221:</td>
<td colspan="2" class="diff-lineno">Line 221:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>of parameters <math>\theta_1</math> (or any other <math>\theta_j</math>, for</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>of parameters <math>\theta_1</math> (or any other <math>\theta_j</math>, for</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>any single value of <math>j</math>), without harming the representational power</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>any single value of <math>j</math>), without harming the representational power</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>of our hypothesis. Indeed, rather than optimizing over the <math><del class="diffchange diffchange-inline">kn</del></math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>of our hypothesis. Indeed, rather than optimizing over the <math><ins class="diffchange diffchange-inline">k(n+1)</ins></math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>parameters <math>(\theta_1, \theta_2,\ldots, \theta_k)</math> (where</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>parameters <math>(\theta_1, \theta_2,\ldots, \theta_k)</math> (where</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><math>\theta_j \in \Re^{n+1}</math>), one could <del class="diffchange diffchange-inline">indeed </del>set <math>\theta_1 =</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><math>\theta_j \in \Re^{n+1}</math>), one could <ins class="diffchange diffchange-inline">instead </ins>set <math>\theta_1 =</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\vec{0}</math> and optimize only with respect to the <math>(k-1)(n+1)</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\vec{0}</math> and optimize only with respect to the <math>(k-1)(n+1)</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>remaining parameters, and this would work fine. </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>remaining parameters, and this would work fine. </div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Softmax_Regression&diff=673&oldid=prevAng: /* Properties of softmax regression parameterization */2011-05-10T18:37:36Z<p><span class="autocomment">Properties of softmax regression parameterization</span></p>
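The invariance recorded in the diff above — subtracting a fixed vector <math>\psi</math> from every <math>\theta_j</math> leaves the class probabilities unchanged, since the factor <math>e^{-\psi^T x^{(i)}}</math> cancels from numerator and denominator — can be checked numerically. The sketch below uses only the standard library; the parameter values and the vector <math>\psi</math> are arbitrary made-up numbers, not from the tutorial.

```python
import math

def softmax_probs(thetas, x):
    """Class probabilities p(y=j|x) for softmax regression.

    thetas: list of k parameter vectors (each length n+1, incl. intercept);
    x: input of length n+1 with x[0] = 1 for the intercept.
    """
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    m = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical parameters: k = 3 classes, n + 1 = 3 features.
thetas = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [-0.2, 0.8, 0.1]]
x = [1.0, 2.0, -1.0]
psi = [0.9, -0.4, 1.1]  # an arbitrary fixed vector

# Replace every theta_j with theta_j - psi, as in the derivation.
shifted = [[t - p for t, p in zip(theta, psi)] for theta in thetas]

p_original = softmax_probs(thetas, x)
p_shifted = softmax_probs(shifted, x)
# The two hypotheses agree (up to floating-point noise).
assert all(abs(a - b) < 1e-12 for a, b in zip(p_original, p_shifted))
```

Since shifting the scores by any constant (here <math>\psi^T x^{(i)}</math>) cancels in the softmax, the assertion holds for any choice of <math>\psi</math>.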
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:37, 10 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 185:</td>
<td colspan="2" class="diff-lineno">Line 185:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Softmax regression has an unusual property: it has a "redundant" set of parameters. To explain what this means, </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Softmax regression has an unusual property: it has a "redundant" set of parameters. To explain what this means, </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>suppose we take each of our parameter vectors <math>\theta_j</math>, and subtract some fixed vector <math>\psi</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>suppose we take each of our parameter vectors <math>\theta_j</math>, and subtract some fixed vector <math>\psi</math></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>from it, so that <math>\<del class="diffchange diffchange-inline">theta_j</del></math> is now replaced with <math>\<del class="diffchange diffchange-inline">theta_j </del>- \psi</math>. Our hypothesis</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>from it, so that <math>\<ins class="diffchange diffchange-inline">theta_1</ins></math> is now replaced with <math>\<ins class="diffchange diffchange-inline">theta_1 </ins>- \psi</math><ins class="diffchange diffchange-inline">, </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><math>\theta_2</math> is replaced with <math>\theta_2 - \psi</math>, and so on</ins>. Our hypothesis</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>now estimates the class label probabilities as</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>now estimates the class label probabilities as</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
</table>Ang
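The remedy discussed in the newer revision — fix <math>\theta_1 = \vec{0}</math> and optimize only the remaining <math>(k-1)(n+1)</math> parameters — amounts to choosing <math>\psi = \theta_1</math> in the shift above. A minimal sketch with hypothetical parameter values, standard library only:

```python
import math

def softmax_probs(thetas, x):
    """p(y=j|x) under softmax regression; x includes the intercept term."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    m = max(scores)  # stabilize the exponentials
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical parameters: k = 3 classes, n + 1 = 3 features.
thetas = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [-0.2, 0.8, 0.1]]
x = [1.0, 2.0, -1.0]

# Choosing psi = theta_1 zeroes out the first parameter vector,
# leaving only (k-1)(n+1) free parameters.
psi = thetas[0]
reduced = [[t - p for t, p in zip(theta, psi)] for theta in thetas]

assert all(t == 0.0 for t in reduced[0])  # theta_1 is now the zero vector
# The reduced parameterization represents the same hypothesis.
assert all(abs(a - b) < 1e-12
           for a, b in zip(softmax_probs(thetas, x),
                           softmax_probs(reduced, x)))
```

This is exactly why the minimizer of <math>J(\theta)</math> is not unique in the full parameterization: any <math>\psi</math> slides the parameters along a flat direction of the cost without changing the predictions.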