Logistic Regression Vectorization Example

【Original】:

Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is

【First draft】:

Imagine solving (training) a logistic regression model using batch gradient ascent; the model is as follows:

【First revision】:

We want to train a logistic regression model using batch gradient ascent; its model is as follows:

<math>h_{\theta}(x) = \dfrac{1}{1+\exp(-\theta^{T}x)}</math>
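
As a quick illustration of the hypothesis itself (a sketch, not part of the original page; theta and x are assumed to be (n+1)-dimensional column vectors with the intercept term stored as x(1) = 1), it can be evaluated directly in Matlab/Octave:

  % Evaluate h_theta(x) for a single example x
  % theta, x : (n+1)-by-1 column vectors, with x(1) = 1 as the intercept entry
  h = 1 / (1 + exp(-theta' * x));  % theta'*x is a scalar here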

【Original】:

where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>x_{0}=1</math>, so that <math>x\in R^{n+1}</math> and <math>\theta \in R^{n+1}</math>, and <math>\theta_{0}</math> is our intercept term. We have a training set <math>\{(x^{(1)},y^{(1)}),\ldots,(x^{(m)},y^{(m)})\}</math> of m examples, and the batch gradient ascent update rule is <math>\theta := \theta + \alpha \nabla_{\theta}l(\theta)</math>, where <math>l(\theta)</math> is the log likelihood and <math>\nabla_{\theta}l(\theta)</math> is its derivative.

【First draft】:

Let <math>x_{0}=1</math> (the notational convention follows the OpenClassroom videos and the CS229 lecture notes), so that <math>x\in R^{n+1}</math>, <math>\theta \in R^{n+1}</math>, and <math>\theta_{0}</math> is the intercept. We now have a data set of m examples <math>\{(x^{(1)},y^{(1)}),\ldots,(x^{(m)},y^{(m)})\}</math>, and the rule of batch gradient ascent is <math>\theta := \theta + \alpha \nabla_{\theta}l(\theta)</math>, where <math>l(\theta)</math> is the log-likelihood function and <math>\nabla_{\theta}l(\theta)</math> is its derivative.

【First revision】:

Let <math>x_{0}=1</math> (the notational convention follows the OpenClassroom videos and the CS229 lecture notes), so that <math>x\in R^{n+1}</math>, <math>\theta \in R^{n+1}</math>, and <math>\theta_{0}</math> is the intercept. We have m training examples <math>\{(x^{(1)},y^{(1)}),\ldots,(x^{(m)},y^{(m)})\}</math>, and the update rule of batch gradient ascent is <math>\theta := \theta + \alpha \nabla_{\theta}l(\theta)</math>, where <math>l(\theta)</math> is the log-likelihood function and <math>\nabla_{\theta}l(\theta)</math> is its derivative.
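
Under these definitions, a single ascent step is one line of Matlab/Octave. This sketch is not part of the original page; alpha is a step size you choose, and grad stands for the gradient <math>\nabla_{\theta}l(\theta)</math> computed below:

  % One batch gradient ascent update (grad is assumed to hold the gradient of l(theta))
  theta = theta + alpha * grad;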

【Original】:

[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class CS229: Machine Learning. For details, see either the OpenClassroom videos or Lecture Notes #1 of http://cs229.stanford.edu/ .]

【First draft】:

[Note: the notation below matches that used in the OpenClassroom videos or in CS229: Machine Learning; for details, see the OpenClassroom videos or Lecture Notes #1 at http://cs229.stanford.edu/ .]

【First revision】:

[Note: the notation below matches that used in the OpenClassroom videos or in CS229: Machine Learning; for details, see the OpenClassroom videos or Lecture Notes #1 at http://cs229.stanford.edu/ .]

【Original】:

We thus need to compute the gradient:

【First draft】:

So we need to compute the gradient:

【First revision】:

So we need to compute the gradient:

<math>\nabla_{\theta}l(\theta) = \sum_{i=1}^{m} \left( y^{(i)} - h_{\theta}(x^{(i)}) \right) x_{j}^{(i)}</math>
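
For reference, this is the standard logistic regression gradient (a derivation sketch, not spelled out on the original page). The log likelihood is

<math>l(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_{\theta}(x^{(i)}) + (1-y^{(i)}) \log\left(1-h_{\theta}(x^{(i)})\right) \right]</math>

and differentiating with respect to <math>\theta_{j}</math>, using <math>\sigma'(z) = \sigma(z)(1-\sigma(z))</math> for the sigmoid <math>\sigma</math>, gives the component-wise form <math>\dfrac{\partial l(\theta)}{\partial \theta_{j}} = \sum_{i=1}^{m} \left( y^{(i)} - h_{\theta}(x^{(i)}) \right) x_{j}^{(i)}</math> quoted above.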

【Original】:

Suppose that the Matlab/Octave variable x is a matrix containing the training inputs, so that x(:,i) is the i-th training example <math>x^{(i)}</math>, and x(j,i) is <math>x_{j}^{(i)}</math>. Further, suppose the Matlab/Octave variable y is a row vector of the labels in the training set, so that the variable y(i) is <math>y^{(i)} \in \{0,1\}</math>. (Here we differ from the OpenClassroom/CS229 notation. Specifically, in the matrix-valued x we stack the training inputs in columns rather than in rows; and <math>y\in R^{1\times m}</math> is a row vector rather than a column vector.)

【First draft】:

Suppose the Matlab/Octave variable x is the matrix of input examples, so that x(:,i) is the i-th example vector <math>x^{(i)}</math> and x(j,i) is <math>x_{j}^{(i)}</math> (translator's note: the j-th element of the i-th example vector). Likewise, suppose the Matlab/Octave variable y is a row vector over this data set, so that y(i) is <math>y^{(i)} \in \{0,1\}</math>. (This differs from the OpenClassroom/CS229 notation: we put the inputs of the matrix x into columns rather than rows, and likewise <math>y\in R^{1\times m}</math> is a row vector rather than a column vector.)

【First revision】:

We use the Matlab/Octave-style variable x for the matrix formed by the input examples, so that x(:,i) is the i-th training example <math>x^{(i)}</math> and x(j,i) is <math>x_{j}^{(i)}</math> (translator's note: the j-th element of the i-th training example). Likewise, the Matlab/Octave-style variable y is the vector formed by all the labels of the training set, so that y(i) is each element of that vector, i.e. <math>y^{(i)} \in \{0,1\}</math> above. (Note that this differs from the OpenClassroom/CS229 notation: the matrix x stores the training examples by column rather than by row, and likewise <math>y\in R^{1\times m}</math> is a row vector rather than a column vector.)
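
As a hypothetical setup matching this layout (X_raw, n, and m are illustrative names, not from the original page):

  % x: (n+1)-by-m, one training example per column, with an intercept row of ones
  X_raw = randn(n, m);           % hypothetical raw inputs, n features by m examples
  x = [ones(1, m); X_raw];
  y = double(rand(1, m) > 0.5);  % 1-by-m row vector of labels in {0,1}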

【Original】:

Here's a truly horrible, extremely slow implementation of the gradient computation:

【First draft】:

Here is code that performs the gradient computation; it is truly horrible and extremely slow:

【First revision】:

Here is code that performs the gradient computation; it is truly horrible and extremely slow:

  % Implementation 1
  grad = zeros(n+1,1);
  for i=1:m,
    h = sigmoid(theta'*x(:,i));
    temp = y(i) - h;
    for j=1:n+1,
      grad(j) = grad(j) + temp * x(j,i);
    end;
  end;

【Original】:

The two nested for-loops make this very slow. Here's a more typical implementation, which partially vectorizes the algorithm and gets better performance:

【First draft】:

The nested for-loops make this code run very slowly. Here is more representative code, which partially vectorizes the algorithm and gives better performance:

【First revision】:

The nested for-loops make this code run very slowly. Here is a more typical implementation, which partially vectorizes the algorithm and gives better performance:

  % Implementation 2
  grad = zeros(n+1,1);
  for i=1:m,
    grad = grad + (y(i) - sigmoid(theta'*x(:,i))) * x(:,i);
  end;

【Original】:

However, it turns out to be possible to vectorize this even further. If we can get rid of the for-loop, we can significantly speed up the implementation. In particular, suppose b is a column vector and A is a matrix. Consider the following ways of computing A * b:

【First draft】:

However, it may be possible to vectorize more thoroughly. If we remove the for-loop, we can improve the code's efficiency significantly. For instance, let b be a column vector and A a matrix, and try computing A*b in the following two ways:

【First revision】:

However, it may be possible to vectorize more thoroughly. If we remove the for-loop, we can improve the code's efficiency significantly. To be more specific, suppose b is a column vector and A is a matrix; we compute A*b in the following two ways:

  % Slow implementation of matrix-vector multiply
  grad = zeros(n+1,1);
  for i=1:m,
    grad = grad + b(i) * A(:,i);  % more commonly written A(:,i)*b(i)
  end;

  % Fast implementation of matrix-vector multiply
  grad = A*b;
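
If you want to see the gap for yourself, here is a rough timing sketch (the matrix sizes are arbitrary assumptions, not from the original page):

  % Compare the explicit loop against the built-in matrix-vector product
  A = randn(500, 20000); b = randn(20000, 1);
  tic;
  g1 = zeros(500, 1);
  for i = 1:20000,
    g1 = g1 + b(i) * A(:, i);
  end;
  toc;
  tic; g2 = A * b; toc;
  disp(max(abs(g1 - g2)));  % the two results agree up to floating-point round-off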

【Original】:

We recognize that Implementation 2 of our gradient descent calculation above is using the slow version with a for-loop, with b(i) playing the role of (y(i) - sigmoid(theta'*x(:,i))), and A playing the role of x. We can derive a fast implementation as follows:

【First draft】:

We see that Implementation 2 uses the slow for-loop version for the gradient computation; treating b(i) as (y(i) - sigmoid(theta'*x(:,i))) and A as x, we can use the following efficient code instead:

【First revision】:

We see that Implementation 2 uses the slow for-loop version for the gradient ascent computation (the original text says "descent"); treating b(i) as (y(i) - sigmoid(theta'*x(:,i))) and A as x, we can use the following efficient code instead:

  % Implementation 3
  grad = x * (y - sigmoid(theta'*x))';  % (y - sigmoid(theta'*x))' is m-by-1, so the product is (n+1)-by-1
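
Putting the pieces together, here is a sketch of a complete training loop built on Implementation 3 (alpha and num_iters are illustrative choices, not part of the original page; sigmoid is the element-wise function described just below):

  % Batch gradient ascent with the fully vectorized gradient
  alpha = 0.1;            % hypothetical step size
  num_iters = 100;        % hypothetical number of iterations
  theta = zeros(n+1, 1);
  for iter = 1:num_iters,
    grad = x * (y - sigmoid(theta'*x))';
    theta = theta + alpha * grad;
  end;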

【Original】:

Here, we assume that the Matlab/Octave sigmoid(z) takes as input a vector z, applies the sigmoid function component-wise to the input, and returns the result. The output of sigmoid(z) is therefore itself also a vector, of the same dimension as the input z.

【First draft】:

Here we assume that in Matlab/Octave, the function sigmoid(z) takes a vector z as input and returns the result of the computation; the output of sigmoid(z) is a vector of the same dimension as z.

【First revision】:

Here we assume that the Matlab/Octave function sigmoid(z) works as follows: it takes a vector z as input, applies the sigmoid function to each element of that vector in turn, and returns the result; the output of sigmoid(z) is therefore also a vector, of the same dimension as the input z.
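
One possible element-wise definition consistent with this assumption (a sketch, not code from the original page):

  % Element-wise sigmoid: ./ and exp both operate component-wise,
  % so sigmoid(z) has the same dimensions as z
  sigmoid = @(z) 1 ./ (1 + exp(-z));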

【Original】:

When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so this is far more efficient than the earlier implementations.

【First draft】:

When the training set is large, the final code (translator's note: Implementation 3) takes full advantage of Matlab/Octave's runtime libraries, which are highly optimized for numerical linear algebra, so the final code (Implementation 3) is far, far more efficient than the earlier versions.

【First revision】:

When the training set is large, the final code (translator's note: Implementation 3) takes full advantage of Matlab/Octave's highly optimized numerical linear algebra libraries, so the final code (Implementation 3) is far more efficient than the earlier versions.

【Original】:

Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought. But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.

【First draft】:

Coming up with vectorized implementations is not easy and usually requires careful thought. But once you are skilled at vectorized operations, you will find that these design patterns (that is, a handful of vectorization tricks) can be applied to many different pieces of code.

【First revision】:

Coming up with vectorized implementations is not easy and usually requires careful thought. But once you are skilled at vectorized operations, you will find that this small number of vectorization tricks really amounts to a set of design patterns that can be applied flexibly to many different code fragments.
