Softmax回归 (Softmax Regression)

Revision as of 06:08, 16 March 2013 (view source)

Kandeng (Talk | contribs)

(→Properties of softmax regression parameterization)

← Older edit

Revision as of 06:13, 16 March 2013 (view source)

Kandeng (Talk | contribs)

(→权重衰减 Weight Decay)

Newer edit →



From Ufldl


@@ Line 138: / Line 138: @@
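 In practice, to keep the implementation simple and clear, it is common to retain all the parameters <math>\textstyle (\theta_1, \theta_2,\ldots, \theta_n)</math> rather than arbitrarily setting one of them to 0. In that case, however, the cost function needs one change: adding weight decay. Weight decay resolves the numerical problems caused by the redundant parameterization of softmax regression.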
-==权重衰减  Weight Decay ==
+==权重衰减==
-'''Original''':
+We modify the cost function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>, which penalizes large parameter values. The cost function now becomes:
-We will modify the cost function by adding a weight decay term
-<math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>
-which penalizes large values of the parameters.  Our cost function is now
 <math>
@@ Line 154: / Line 150: @@
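-'''Translation''':
+With this weight decay term (<math>\textstyle \lambda > 0</math>), the cost function becomes strictly convex, which guarantees a unique solution. The Hessian is now invertible, and because <math>\textstyle J(\theta)</math> is convex, algorithms such as gradient descent and L-BFGS are guaranteed to converge to the global minimum.
-We modify the loss function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>, which penalizes large parameter values. Our loss function now becomes: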
-<math>
+To apply an optimization algorithm, we need the derivative of this new function <math>\textstyle J(\theta)</math>, which is:
-\begin{align}
-J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
-              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
-\end{align}
-</math>
-'''First review''':
-We modify the cost function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>, which penalizes large parameter values. Our cost function now becomes:
-<math>
-\begin{align}
-J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
-              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
-\end{align}
-</math>
-'''Original''':
-With this weight decay term (for any <math>\lambda > 0</math>), the cost function
-<math>J(\theta)</math> is now strictly convex, and is guaranteed to have a
-unique solution.  The Hessian is now invertible, and because <math>J(\theta)</math> is
-convex, algorithms such as gradient descent, L-BFGS, etc. are guaranteed
-to converge to the global minimum.
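-'''Translation''':
-(For any <math>\lambda > 0</math>) With this weight decay term, the loss function becomes strictly convex, so the solution is guaranteed to be unique. The Hessian is now no longer invertible, and because <math>J(\theta)</math> is convex, algorithms such as gradient descent and L-BFGS are guaranteed to converge to the global optimum.
-'''First review''':
-With this weight decay term (for any <math>\lambda > 0</math>), the cost function becomes strictly convex, which guarantees a unique solution. The Hessian becomes invertible, and because <math>J(\theta)</math> is convex, algorithms such as gradient descent and L-BFGS are guaranteed to converge to the global optimum.
-'''Original''':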
-To apply an optimization algorithm, we also need the derivative of this
-new definition of <math>J(\theta)</math>.  One can show that the derivative is:
 <math>
 \begin{align}
@@ Line 202: / Line 161: @@
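-'''Translation''':
+By minimizing <math>\textstyle J(\theta)</math>, we obtain a working softmax regression model.
-To apply an optimization algorithm, we need the derivative form of this new <math>J(\theta)</math> function, which is: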
-<math>
-\begin{align}
-\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
-\end{align}
-</math>
-'''First review''':
-To apply an optimization algorithm, we need the derivative of this newly defined <math>J(\theta)</math>. It can be shown that the derivative is:
-<math>
-\begin{align}
-\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
-\end{align}
-</math>
-'''Original''':
-By minimizing <math>J(\theta)</math> with respect to <math>\theta</math>, we will have a working implementation of softmax regression.
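-'''Translation''':
-By minimizing <math>J(\theta)</math>, we obtain a working softmax regression model.
-'''First review''':
-By minimizing the function <math>J(\theta)</math> with respect to the parameters <math>\theta</math>, we obtain a working implementation of softmax regression.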
 ==Softmax回归与Logistic 回归的关系 Relationship to Logistic Regression ==
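
As an illustration of the weight-decay hunks above, the regularized cost <math>J(\theta)</math> and its gradient <math>\nabla_{\theta_j} J(\theta)</math> can be sketched in plain Python. This is only a minimal sketch, not the tutorial's own implementation. The function names `softmax_probs` and `cost_and_grad` are hypothetical, labels are assumed to run from 0 to k-1 rather than 1 to k, and each example is assumed to already carry its intercept feature.

```python
import math


def softmax_probs(theta, x):
    """Class probabilities p(y = j | x; theta) for a single example x."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)  # subtract the largest score so exp() cannot overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost_and_grad(theta, xs, ys, lam):
    """Weight-decayed softmax cost J(theta) and its gradient.

    theta: k rows of weights, one row per class (assumed layout).
    xs:    m examples, each a list that already includes the intercept.
    ys:    m labels in 0..k-1 (assumption; the text uses 1..k).
    lam:   weight decay strength lambda.
    """
    m = len(xs)
    cost = 0.0
    grad = [[0.0] * len(row) for row in theta]
    for x, y in zip(xs, ys):
        p = softmax_probs(theta, x)
        cost -= math.log(p[y]) / m  # data term: -1/m * log p(y | x)
        for j, row in enumerate(grad):
            indicator = 1.0 if y == j else 0.0
            for d, xd in enumerate(x):
                # -1/m * x * (1{y = j} - p(y = j | x; theta))
                row[d] -= xd * (indicator - p[j]) / m
    for j, theta_j in enumerate(theta):
        for d, t in enumerate(theta_j):
            cost += 0.5 * lam * t * t  # decay term: lambda/2 * theta_ij^2
            grad[j][d] += lam * t      # its derivative: lambda * theta_ij
    return cost, grad
```

Both return values can be passed to a gradient-based optimizer such as gradient descent or L-BFGS. Comparing `grad` against a finite-difference estimate of `cost` is a cheap way to check the derivation before optimizing.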