Logistic Regression Vectorization Example

For logistic regression, our hypothesis is

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following CS229 notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term.  We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and batch gradient
ascent on the log likelihood <math>\textstyle \ell(\theta)</math> requires computing its gradient:

<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.
\end{align}</math>
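
In Matlab/Octave, the logistic function itself is a one-liner; the snippet below is only an illustrative sketch (the names <tt>sigmoid</tt>, <tt>theta</tt>, and <tt>x_i</tt> are ours and are not part of the implementations discussed on this page):

<syntaxhighlight lang="matlab">
% Illustrative sketch: the logistic (sigmoid) function and the hypothesis
% h_theta(x) for a single example.  theta and x_i are assumed to be column
% vectors in R^(n+1); exp() is elementwise, so sigmoid also accepts vectors.
sigmoid = @(z) 1 ./ (1 + exp(-z));
h = sigmoid(theta' * x_i);    % scalar h_theta(x^(i))
</syntaxhighlight>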

Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that
<tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>, and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.  (Here we differ from the
CS229 notation; specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)

Here's a truly horrible, extremely slow implementation of the gradient computation:
<syntaxhighlight lang="matlab">
% Implementation 1
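% What follows is only a sketch of how such a fully loop-based computation
% might look; it assumes theta is a column vector in R^(n+1), m and n hold
% the number of examples and features, and sigmoid() computes the logistic
% function elementwise.  Every example and every parameter is visited one
% at a time, which is exactly what makes this implementation so slow.
grad = zeros(n+1, 1);                          % gradient of ell(theta)
for i = 1:m,
  h = sigmoid(theta' * x(:,i));                % h_theta(x^(i))
  for j = 1:n+1,
    grad(j) = grad(j) + (y(i) - h) * x(j,i);   % accumulate the j-th component
  end;
end;
</syntaxhighlight>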
