Logistic Regression Vectorization Example

Consider training a logistic regression classifier by batch gradient ascent. Suppose our hypothesis is

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>  
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term.  We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient
ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math>
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.
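In Matlab/Octave, the hypothesis can be computed with an elementwise logistic function. A minimal sketch follows; the helper name <tt>sigmoid</tt> is our assumption (it is not defined in the text) and is reused by the code sketches below:

<syntaxhighlight lang="matlab">
% Elementwise logistic function; works on scalars, vectors, and matrices.
% (Hypothetical helper: the name "sigmoid" is an assumption, not from the text.)
function h = sigmoid(z)
  h = 1 ./ (1 + exp(-z));
end
</syntaxhighlight>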
[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class  
CS229: Machine Learning.  For details, see either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or Lecture Notes #1 of http://cs229.stanford.edu/ .]
We thus need to compute the gradient:

<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.
\end{align}</math>

Suppose that the Matlab/Octave variable <tt>x</tt> is a matrix containing the training inputs, so that <tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>, and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.  (Here we differ from the
OpenClassroom/CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)
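Before vectorizing, it is worth writing the computation with an explicit loop over examples. The following is a sketch, not code from the original page, using <tt>theta</tt>, <tt>x</tt>, <tt>y</tt>, <tt>m</tt> and <tt>n</tt> as defined above together with the assumed <tt>sigmoid</tt> helper:

<syntaxhighlight lang="matlab">
% Unvectorized gradient: accumulate (y(i) - h) * x(:,i) one example at a time.
grad = zeros(n+1,1);
for i=1:m,
  h = sigmoid(theta'*x(:,i));        % scalar hypothesis value for example i
  grad = grad + (y(i) - h) * x(:,i);
end;
</syntaxhighlight>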
Each iteration above adds a multiple of one column to the running total. In general, for a matrix <tt>A</tt><math>\in \Re^{(n+1)\times m}</math> and a column vector <tt>b</tt><math>\in \Re^{m}</math>, a sum of the form <math>\textstyle \sum_i b(i)\,A(:,i)</math> is exactly the matrix-vector product <tt>A*b</tt>:

<syntaxhighlight lang="matlab">
% Slow implementation of matrix-vector multiply
grad = zeros(n+1,1);
for i=1:m,
  grad = grad + b(i) * A(:,i);   % add the i-th column of A, scaled by b(i)
end;

% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>
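Applying this identity with <tt>A = x</tt> and <tt>b = (y - sigmoid(theta'*x))'</tt> collapses the loop into a single line. The following is a sketch under the variable conventions above, reusing the assumed <tt>sigmoid</tt> helper:

<syntaxhighlight lang="matlab">
% Fully vectorized gradient: sigmoid(theta'*x) is a 1-by-m row vector of
% hypothesis values, so (y - sigmoid(theta'*x))' is the m-by-1 error vector b.
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>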
