http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&feed=atom&action=history
Logistic Regression Vectorization Example - Revision history
Revision history for this page on the wiki (MediaWiki 1.16.2; feed updated 2024-03-29T11:24:16Z)

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=2277&oldid=prev
Kandeng at 13:09, 7 April 2013 (2013-04-07T13:09:53Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 13:09, 7 April 2013</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 73:</td>
<td colspan="2" class="diff-lineno">Line 73:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{Vectorized Implementation}}</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>{{Vectorized Implementation}}</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{Languages|逻辑回归的向量化实现样例|中文}}</ins></div></td></tr>
</table>
Kandeng

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=884&oldid=prev
Watsuen at 10:56, 26 May 2011 (2011-05-26T10:56:58Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 10:56, 26 May 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 70:</td>
<td colspan="2" class="diff-lineno">Line 70:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought. But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought. But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">{{Vectorized Implementation}}</ins></div></td></tr>
</table>
Watsuen

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=473&oldid=prev
Ang at 18:32, 29 April 2011 (2011-04-29T18:32:11Z)
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 18:32, 29 April 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 4:</td>
<td colspan="2" class="diff-lineno">Line 4:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>where (following <del class="diffchange diffchange-inline">CS229 </del>notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math> </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>where (following <ins class="diffchange diffchange-inline">the </ins>notational convention <ins class="diffchange diffchange-inline">from the OpenClassroom videos and from CS229</ins>) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 10:</td>
<td colspan="2" class="diff-lineno">Line 10:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>[Note: Most of the notation below follows that defined in the class </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>[Note: Most of the notation below follows that defined <ins class="diffchange diffchange-inline">in the OpenClassroom videos or </ins>in the class </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>CS229: Machine Learning. <del class="diffchange diffchange-inline">Please </del>see Lecture <del class="diffchange diffchange-inline">notes </del>#1 <del class="diffchange diffchange-inline">from </del>http://cs229.stanford.edu/ <del class="diffchange diffchange-inline">for details</del>.]</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>CS229: Machine Learning. <ins class="diffchange diffchange-inline">For details, </ins>see <ins class="diffchange diffchange-inline">either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or </ins>Lecture <ins class="diffchange diffchange-inline">Notes </ins>#1 <ins class="diffchange diffchange-inline">of </ins>http://cs229.stanford.edu/ .]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We thus need to compute the gradient:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We thus need to compute the gradient:</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 22:</td>
<td colspan="2" class="diff-lineno">Line 22:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">OpenClassroom/</ins>CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.) </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.) </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 55:</td>
<td colspan="2" class="diff-lineno">Line 55:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end;</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>% Fast implementation of matrix-vector <del class="diffchange diffchange-inline">multiple</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>% Fast implementation of matrix-vector <ins class="diffchange diffchange-inline">multiply</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = A*b;</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = A*b;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
</table>
Ang

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=356&oldid=prev
Maiyifan at 00:53, 23 April 2011 (2011-04-23T00:53:25Z): Added semicolons, deleted stray quote
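The diffs above concern the hypothesis notation, h_theta(x) = 1/(1+exp(-theta^T x)) with the convention x_0 = 1 so that x lies in R^{n+1}. As a hedged sketch (not part of the wiki's own code), this convention transcribes to NumPy as follows, with examples stacked in columns as the page specifies:

```python
import numpy as np

def hypothesis(theta, x):
    # h_theta(x) = 1 / (1 + exp(-theta' * x)); with the m training
    # examples stacked in the columns of x, this evaluates all m at once.
    return 1.0 / (1.0 + np.exp(-(theta @ x)))

def add_intercept(x):
    # Prepend x_0 = 1 to every example so that theta_0 acts as the
    # intercept term and each example lives in R^{n+1}.
    return np.vstack([np.ones((1, x.shape[1])), x])
```

The function and variable names here are illustrative choices, not identifiers from the wiki page.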
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 00:53, 23 April 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 7:</td>
<td colspan="2" class="diff-lineno">Line 7:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>ascent update rule is <del class="diffchange diffchange-inline">"</del><math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 56:</td>
<td colspan="2" class="diff-lineno">Line 56:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Fast implementation of matrix-vector multiple</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Fast implementation of matrix-vector multiple</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>grad = A*b</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>grad = A*b<ins class="diffchange diffchange-inline">;</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 63:</td>
<td colspan="2" class="diff-lineno">Line 63:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>grad = x * (y- sigmoid(theta'*x))'</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>grad = x * (y- sigmoid(theta'*x))'<ins class="diffchange diffchange-inline">;</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here, we assume that the Matlab/Octave <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt> </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here, we assume that the Matlab/Octave <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt> </div></td></tr>
</table>
Maiyifan

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=82&oldid=prev
Ang at 22:56, 1 March 2011 (2011-03-01T22:56:26Z)
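Implementation 3 quoted in the diff above, `grad = x * (y - sigmoid(theta'*x))';`, carries over to NumPy almost verbatim. The sketch below is an assumed transcription: `sigmoid` is taken to be element-wise, as the surrounding text states, and the loop version is included only as a reference to check against:

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; the output has the same shape as z,
    # matching the text's description of sigmoid(z).
    return 1.0 / (1.0 + np.exp(-z))

def gradient_loop(theta, x, y):
    # Slow reference: sum (y(i) - sigmoid(theta'*x(:,i))) * x(:,i) over i.
    n_plus_1, m = x.shape
    grad = np.zeros(n_plus_1)
    for i in range(m):
        grad += (y[i] - sigmoid(theta @ x[:, i])) * x[:, i]
    return grad

def gradient_vectorized(theta, x, y):
    # Matlab: grad = x * (y - sigmoid(theta'*x))';
    return x @ (y - sigmoid(theta @ x))
```

Both functions assume the wiki's column-stacking convention: `x` is (n+1) x m and `y` is a length-m row of labels.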
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 22:56, 1 March 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 18:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is <del class="diffchange diffchange-inline">the design </del>matrix, so that</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is <ins class="diffchange diffchange-inline">a </ins>matrix <ins class="diffchange diffchange-inline">containing the training inputs</ins>, so that</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>, and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>. </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>, and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>CS229 notation<del class="diffchange diffchange-inline">; specifically</del>, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>CS229 notation<ins class="diffchange diffchange-inline">. Specifically</ins>, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.) </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.) </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 60:</td>
<td colspan="2" class="diff-lineno">Line 60:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We recognize that Implementation 2 of our gradient descent calculation above is using the slow version with a for-loop, with</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>We recognize that Implementation 2 of our gradient descent calculation above is using the slow version with a for-loop, with</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><tt>b(i)</tt> playing the role of <tt>(y(i) - sigmoid(theta'*x(:,i)))</tt>. We can derive a fast implementation as follows: </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><tt>b(i)</tt> playing the role of <tt>(y(i) - sigmoid(theta'*x(:,i)))<ins class="diffchange diffchange-inline"></tt>, and <tt>A</tt> playing the role of <tt>x</ins></tt>. We can derive a fast implementation as follows: </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td></tr>
</table>
Ang

http://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=79&oldid=prev
Ang at 19:35, 1 March 2011 (2011-03-01T19:35:30Z)
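The refactoring described in the diff above recognizes the for-loop that accumulates `b(i)` times column `i` of `A` as a single matrix-vector multiply, `grad = A*b`. A minimal NumPy sketch of that same recognition (`A` and `b` are the placeholder names from the text, not real identifiers from any library):

```python
import numpy as np

def matvec_loop(A, b):
    # Slow version: accumulate b(i) * A(:,i) column by column,
    # as in Implementation 2 of the gradient calculation.
    grad = np.zeros(A.shape[0])
    for i in range(A.shape[1]):
        grad += b[i] * A[:, i]
    return grad

def matvec_fast(A, b):
    # Fast implementation of matrix-vector multiply: grad = A*b.
    return A @ b
```

The two return identical results; the fast version simply hands the accumulation to the optimized matrix routine instead of interpreting a Python-level loop.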
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 19:35, 1 March 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 4:</td>
<td colspan="2" class="diff-lineno">Line 4:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>where (following CS229 notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>where (following CS229 notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}_j.</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}_j.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(<del class="diffchange diffchange-inline">i,</del>j)</tt> is <math>\textstyle x^{(i)}_j</math>. </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math><ins class="diffchange diffchange-inline">, </ins>and <tt>x(j<ins class="diffchange diffchange-inline">,i</ins>)</tt> is <math>\textstyle x^{(i)}_j</math>. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>training set, so that <ins class="diffchange diffchange-inline">the variable </ins><tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the </div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>CS229 notation, <del class="diffchange diffchange-inline">because </del>in <del class="diffchange diffchange-inline">$</del><tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>CS229 notation<ins class="diffchange diffchange-inline">; specifically</ins>, in <ins class="diffchange diffchange-inline">the matrix-valued </ins><tt>x</tt> we stack the training inputs in columns rather than in rows;</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row rather than a column vector.) </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row <ins class="diffchange diffchange-inline">vector </ins>rather than a column vector.) </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Here's truly horrible, extremely slow, implementation:</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Here's a truly horrible, extremely slow implementation <ins class="diffchange diffchange-inline">of the gradient computation</ins>:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 1</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 1</div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=46&oldid=prevAng at 01:38, 27 February 20112011-02-27T01:38:44Z<p></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 01:38, 27 February 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 46:</td>
<td colspan="2" class="diff-lineno">Line 46:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>However, it turns out to be possible to even further vectorize this. <del class="diffchange diffchange-inline">In Matlab/Octave,</del></div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>However, it turns out to be possible to even further vectorize this. <ins class="diffchange diffchange-inline">If we can </ins>get rid of <ins class="diffchange diffchange-inline">the </ins>for-<ins class="diffchange diffchange-inline">loop</ins>, <ins class="diffchange diffchange-inline">we can significantly </ins>speed up the <ins class="diffchange diffchange-inline">implementation</ins>. In particular, <ins class="diffchange diffchange-inline">suppose <tt>b</tt> is a column vector, and <tt>A</tt> is a matrix. Consider </ins>the following <ins class="diffchange diffchange-inline">ways of computing <tt>A * b</tt>: </ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><del class="diffchange diffchange-inline">it is possible to </del>get rid of for-<del class="diffchange diffchange-inline">loops</del>, <del class="diffchange diffchange-inline">and doing so will </del>speed up the <del class="diffchange diffchange-inline">algorithm</del>. In</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><syntaxhighlight lang="matlab"></ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>particular, <del class="diffchange diffchange-inline">we can implement </del>the following: </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">% Slow implementation of matrix-vector multiply</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">grad = zeros(n+1,1);</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">for i=1:m,</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"> grad = grad + b(i) * A(:,i); % more commonly written A(:,i)*b(i)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">end;</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">% Fast implementation of matrix-vector multiply</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">grad = A*b</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"></syntaxhighlight></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">We recognize that Implementation 2 of our gradient descent calculation above is using the slow version with a for-loop, with</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline"><tt>b(i)</tt> playing the role of <tt>(y(i) - sigmoid(theta'*x(:,i)))</tt>. We can derive a fast implementation as follows</ins>: </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>% Implementation 3</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 55:</td>
<td colspan="2" class="diff-lineno">Line 66:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here, we assume that the Matlab/Octave <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt> </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here, we assume that the Matlab/Octave <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt> </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations<ins class="diffchange diffchange-inline">. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought. But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code</ins>.</div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=45&oldid=prevAng at 01:15, 27 February 20112011-02-27T01:15:27Z<p></p>
<table style="background-color: white; color:black;">
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr valign='top'>
<td colspan='2' style="background-color: white; color:black;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black;">Revision as of 01:15, 27 February 2011</td>
</tr><tr><td colspan="2" class="diff-lineno">Line 18:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>\end{align}</math></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div><tt>x(i<del class="diffchange diffchange-inline">,:</del>)<del class="diffchange diffchange-inline">'</del></tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(i,j)</tt> is <math>\textstyle x^{(i)}_j</math>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><tt>x(<ins class="diffchange diffchange-inline">:,</ins>i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(i,j)</tt> is <math>\textstyle x^{(i)}_j</math>. <ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a vector of the labels in the</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>Further, suppose the Matlab/Octave variable <tt>y</tt> is a <ins class="diffchange diffchange-inline">''row'' </ins>vector of the labels in the</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. <ins class="diffchange diffchange-inline"> (Here we differ from the </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">CS229 notation, because in $<tt>x</tt> we stack the training inputs in columns rather than in rows;</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row rather than a column vector.) </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here's a truly horrible, extremely slow implementation:</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>Here's a truly horrible, extremely slow implementation:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">% Implementation 1</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = zeros(n+1,1);</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = zeros(n+1,1);</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>for i=1:m,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>for i=1:m,</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> h = sigmoid(theta'*x(<del class="diffchange diffchange-inline">i,</del>:)<del class="diffchange diffchange-inline">'</del>);</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> h = sigmoid(theta'*x(:<ins class="diffchange diffchange-inline">,i</ins>));</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> temp = y(i) - h; </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> temp = y(i) - h; </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> for j=1:n+1,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> for j=1:n+1,</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> grad(j) = grad(j) + temp * x(<del class="diffchange diffchange-inline">i,</del>j); </div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> grad(j) = grad(j) + temp * x(j<ins class="diffchange diffchange-inline">,i</ins>); </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> end;</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div> end;</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end;</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end;</div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 36:</td>
<td colspan="2" class="diff-lineno">Line 39:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>that partially vectorizes the algorithm and gets better performance: </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>that partially vectorizes the algorithm and gets better performance: </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">% Implementation 2 </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = zeros(n+1,1);</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>grad = zeros(n+1,1);</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>for i=1:m,</div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>for i=1:m,</div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div> grad = grad + (y(i) - sigmoid(theta'*x(<del class="diffchange diffchange-inline">i,</del>:)<del class="diffchange diffchange-inline">'</del>))* x(<del class="diffchange diffchange-inline">i,</del>:)<del class="diffchange diffchange-inline">'</del>;</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div> grad = grad + (y(i) - sigmoid(theta'*x(:<ins class="diffchange diffchange-inline">,i</ins>)))* x(:<ins class="diffchange diffchange-inline">,i</ins>);</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end; </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>end; </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
<tr><td colspan="2" class="diff-lineno">Line 46:</td>
<td colspan="2" class="diff-lineno">Line 50:</td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>particular, we can implement the following: </div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div>particular, we can implement the following: </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div><syntaxhighlight lang="matlab"></div></td></tr>
<tr><td class='diff-marker'>-</td><td style="background: #ffa; color:black; font-size: smaller;"><div>grad = <del class="diffchange diffchange-inline">X' </del>* (y- sigmoid(<del class="diffchange diffchange-inline">X*</del>theta))</div></td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins class="diffchange diffchange-inline">% Implementation 3</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div>grad = <ins class="diffchange diffchange-inline">x </ins>* (y- sigmoid(theta<ins class="diffchange diffchange-inline">'*x</ins>))<ins class="diffchange diffchange-inline">'</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td><td class='diff-marker'> </td><td style="background: #eee; color:black; font-size: smaller;"><div></syntaxhighlight></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">Here, we assume that the Matlab/Octave <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="background: #cfc; color:black; font-size: smaller;"><div><ins style="color: red; font-weight: bold; text-decoration: none;">When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations.</ins></div></td></tr>
</table>Anghttp://deeplearning.stanford.edu/wiki/index.php?title=Logistic_Regression_Vectorization_Example&diff=42&oldid=prevAng: Created page with "Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is :<math>\begin{align} h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)}, \end{align}<..."2011-02-27T00:22:56Z<p>Created page with "Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is :<math>\begin{align} h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)}, \end{align}<..."</p>
<p><b>New page</b></p><div>Consider training a logistic regression model using batch gradient ascent.<br />
Suppose our hypothesis is<br />
:<math>\begin{align}<br />
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},<br />
\end{align}</math><br />
where (following CS229 notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math><br />
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set<br />
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient<br />
ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math><br />
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.<br />
<br />
[Note: Most of the notation below follows that defined in the class <br />
CS229: Machine Learning. Please see Lecture notes #1 from http://cs229.stanford.edu/ for details.]<br />
<br />
We thus need to compute the gradient:<br />
:<math>\begin{align}<br />
\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.<br />
\end{align}</math><br />
Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that<br />
<tt>x(i,:)'</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(i,j)</tt> is <math>\textstyle x^{(i)}_j</math>.<br />
Further, suppose the Matlab/Octave variable <tt>y</tt> is a vector of the labels in the<br />
training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.<br />
<br />
Here's a truly horrible, extremely slow implementation:<br />
<syntaxhighlight lang="matlab"><br />
grad = zeros(n+1,1);<br />
for i=1:m,<br />
h = sigmoid(theta'*x(i,:)');<br />
temp = y(i) - h; <br />
for j=1:n+1,<br />
grad(j) = grad(j) + temp * x(i,j); <br />
end;<br />
end;<br />
</syntaxhighlight><br />
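For readers working outside Matlab/Octave, the same double-loop computation can be sketched in Python with NumPy (a hypothetical translation, not part of the original article; it assumes, as above, a design matrix with the <tt>m</tt> training examples stored in rows):

```python
import numpy as np

def sigmoid(z):
    # Component-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

def grad_naive(theta, X, y):
    """Implementation 1: two nested for-loops (very slow).

    X has shape (m, n+1) with training examples in rows; y has length m.
    """
    m, n1 = X.shape
    grad = np.zeros(n1)
    for i in range(m):
        h = sigmoid(theta @ X[i])       # scalar hypothesis for example i
        temp = y[i] - h
        for j in range(n1):
            grad[j] += temp * X[i, j]   # accumulate the j-th component
    return grad
```

As in the Matlab version, every arithmetic operation here acts on a scalar, so the interpreter overhead of the two loops dominates the running time.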
The two nested for-loops make this very slow. Here's a more typical implementation<br />
that partially vectorizes the algorithm and gets better performance: <br />
<syntaxhighlight lang="matlab"><br />
grad = zeros(n+1,1);<br />
for i=1:m,<br />
grad = grad + (y(i) - sigmoid(theta'*x(i,:)'))* x(i,:)';<br />
end; <br />
</syntaxhighlight><br />
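The partially vectorized version has a direct NumPy analogue as well (again a hypothetical sketch under the same row-wise layout): the inner loop over components becomes a single vector operation, and only the loop over examples remains.

```python
import numpy as np

def sigmoid(z):
    # Component-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

def grad_partial(theta, X, y):
    """Implementation 2: one loop over the m examples.

    The inner loop over the n+1 components is replaced by
    vector arithmetic on the whole row X[i] at once.
    """
    m, n1 = X.shape
    grad = np.zeros(n1)
    for i in range(m):
        grad += (y[i] - sigmoid(theta @ X[i])) * X[i]
    return grad
```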
<br />
However, it turns out to be possible to vectorize this even further. In Matlab/Octave,<br />
we can eliminate the for-loop entirely, and doing so speeds up the algorithm. In<br />
particular, we can implement the following: <br />
<syntaxhighlight lang="matlab"><br />
grad = x' * (y - sigmoid(x*theta))<br />
</syntaxhighlight></div>Ang
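The fully vectorized gradient also translates directly to NumPy (a hypothetical sketch, assuming the examples are stored in the rows of <tt>X</tt> as in this revision): the entire sum collapses into one matrix-vector product, which the underlying linear algebra library can execute far faster than an interpreted loop.

```python
import numpy as np

def sigmoid(z):
    # Component-wise logistic function
    return 1.0 / (1.0 + np.exp(-z))

def grad_vectorized(theta, X, y):
    """Implementation 3: no explicit loops.

    Analogous to the Matlab expression x' * (y - sigmoid(x*theta))
    when X stores the m training examples in rows and y is length m.
    """
    return X.T @ (y - sigmoid(X @ theta))
```

The speedup comes from the same source as in Matlab/Octave: the matrix-vector product is dispatched to optimized numerical routines instead of being interpreted one scalar operation at a time.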