Logistic Regression Vectorization Example

Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is

:<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>

where we let <math>\textstyle x_0 = 1</math>, so that <math>\textstyle x \in \Re^{n+1}</math> and <math>\textstyle \theta \in \Re^{n+1}</math>. Given a training set of <math>\textstyle m</math> examples, each step of batch gradient ascent requires computing the gradient of the log likelihood <math>\textstyle \ell(\theta)</math>:

:<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.
\end{align}</math>

Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that
<tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the
CS229 notation, because in <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt> <math>\in \Re^{1\times m}</math> is a row rather than a column vector.)
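
For concreteness, here is a minimal sketch of this layout with made-up sizes and values (the dimensions, not the numbers, are the point):

<syntaxhighlight lang="matlab">
% Toy layout (illustrative numbers): n = 2 features, m = 4 examples.
n = 2; m = 4;
x = [ones(1,m); rand(n,m)];   % x is (n+1) x m; column x(:,i) is x^(i), with intercept x_0 = 1
y = [0 1 1 0];                % y is 1 x m, a row vector of labels
theta = zeros(n+1,1);         % theta is (n+1) x 1
</syntaxhighlight>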
Here's a truly horrible, extremely slow implementation:
<syntaxhighlight lang="matlab">
% Implementation 1
grad = zeros(n+1,1);
for i=1:m,
   h = sigmoid(theta'*x(:,i));
   temp = y(i) - h;
   for j=1:n+1,
     grad(j) = grad(j) + temp * x(j,i);
   end;
end;
</syntaxhighlight>
Here's a more typical implementation
that partially vectorizes the algorithm and gets better performance:
<syntaxhighlight lang="matlab">
% Implementation 2
grad = zeros(n+1,1);
for i=1:m,
   grad = grad + (y(i) - sigmoid(theta'*x(:,i))) * x(:,i);
end;
</syntaxhighlight>
However, it's possible to vectorize this even further, eliminating the for loop entirely. In
particular, we can implement the following:
<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>
Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt>.
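
The page does not define this helper; a minimal element-wise implementation consistent with that description (a sketch, not necessarily the exact helper used in the course) might be:

<syntaxhighlight lang="matlab">
% One possible element-wise sigmoid; in Matlab/Octave this would
% typically live in its own file sigmoid.m.
function a = sigmoid(z)
  a = 1 ./ (1 + exp(-z));   % ./ applies the operation component-wise
end
</syntaxhighlight>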
When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations.
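
Putting the pieces together, a complete batch gradient ascent loop built around Implementation 3 could look like the sketch below; the learning rate <tt>alpha</tt> and iteration count <tt>numiters</tt> are illustrative assumptions, not values from the original page:

<syntaxhighlight lang="matlab">
% Sketch of batch gradient ascent using the fully vectorized gradient.
% Assumes x is (n+1) x m with an intercept row of ones, y is 1 x m,
% and sigmoid is element-wise, as described above.
alpha = 0.1;                             % illustrative learning rate
numiters = 100;                          % illustrative iteration count
theta = zeros(size(x,1), 1);
for iter = 1:numiters,
  grad = x * (y - sigmoid(theta'*x))';   % Implementation 3
  theta = theta + alpha * grad;          % ascent: step *up* the log likelihood
end;
</syntaxhighlight>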
