Logistic Regression Vectorization Example
From Ufldl
Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is
:<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
and that on each iteration we need to compute the gradient of the log likelihood over the <math>\textstyle m</math> training examples,
:<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.
\end{align}</math>
Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that
<tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the
CS229 notation, because in <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt> <math>\in \Re^{1\times m}</math> is a row rather than a column vector.)
Here's a truly horrible, extremely slow implementation:
<syntaxhighlight lang="matlab">
% Implementation 1
grad = zeros(n+1,1);
for i=1:m,
  h = sigmoid(theta'*x(:,i));
  temp = y(i) - h;
  for j=1:n+1,
    grad(j) = grad(j) + temp * x(j,i);
  end;
end;
</syntaxhighlight>
Here's a more typical implementation
that partially vectorizes the algorithm and gets better performance:
<syntaxhighlight lang="matlab">
% Implementation 2
grad = zeros(n+1,1);
for i=1:m,
  grad = grad + (y(i) - sigmoid(theta'*x(:,i))) * x(:,i);
end;
</syntaxhighlight>
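As a quick cross-check (not part of the original tutorial), the doubly-looped and partially vectorized updates can be compared in NumPy. The sizes <tt>m</tt> and <tt>n</tt> and the random data below are illustrative assumptions, with examples stacked in columns as in the text:

```python
import numpy as np

def sigmoid(z):
    # Component-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes and random data (assumptions, not from the tutorial).
rng = np.random.default_rng(0)
m, n = 50, 4                      # m examples, n features (plus intercept)
X = rng.normal(size=(n + 1, m))   # training inputs stacked in columns
X[0, :] = 1.0                     # intercept term x_0 = 1
y = rng.integers(0, 2, size=m)    # labels y^(i) in {0,1}
theta = rng.normal(size=n + 1)

# Implementation 1: double loop over examples and coordinates.
grad1 = np.zeros(n + 1)
for i in range(m):
    h = sigmoid(theta @ X[:, i])
    for j in range(n + 1):
        grad1[j] += (y[i] - h) * X[j, i]

# Implementation 2: single loop, one vector update per example.
grad2 = np.zeros(n + 1)
for i in range(m):
    grad2 += (y[i] - sigmoid(theta @ X[:, i])) * X[:, i]

assert np.allclose(grad1, grad2)
```

Both loops accumulate the same sum <math>\textstyle \sum_i (y^{(i)} - h_\theta(x^{(i)}))\,x^{(i)}</math>; the second merely replaces the inner coordinate loop with a vector operation.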
This implementation still loops over the training examples; we can vectorize over them as well. In
particular, we can implement the following:
<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>
Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt>.
+ | |||
When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations.
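The single matrix expression can likewise be sanity-checked in NumPy (again with made-up sizes and data; with 1-D arrays, NumPy needs no explicit transpose where the Matlab expression uses <tt>'</tt>):

```python
import numpy as np

def sigmoid(z):
    # Component-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes and random data (assumptions, not from the tutorial).
rng = np.random.default_rng(1)
m, n = 200, 6
X = rng.normal(size=(n + 1, m))   # training inputs stacked in columns
X[0, :] = 1.0                     # intercept term x_0 = 1
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n + 1)

# Reference: single loop over examples.
grad_loop = np.zeros(n + 1)
for i in range(m):
    grad_loop += (y[i] - sigmoid(theta @ X[:, i])) * X[:, i]

# Fully vectorized, mirroring  grad = x * (y - sigmoid(theta'*x))'
grad_vec = X @ (y - sigmoid(theta @ X))

assert np.allclose(grad_loop, grad_vec)
```

The vectorized form replaces the entire loop with one matrix-vector product, which is where the optimized linear algebra libraries mentioned above do their work.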