Sparse Autoencoder Notation Summary
Here is a summary of the symbols used in our derivation of the sparse autoencoder:

{| class="wikitable"
|-
! Symbol
! Meaning
|-
| <math>\textstyle x</math>
| Input features for a training example, <math>\textstyle x \in \Re^{n}</math>.
|-
| <math>\textstyle y</math>
| Output/target values. Here, <math>\textstyle y</math> can be vector valued. In the case of an autoencoder, <math>\textstyle y = x</math>.
|-
| <math>\textstyle (x^{(i)}, y^{(i)})</math>
| The <math>\textstyle i</math>-th training example.
|-
| <math>\textstyle h_{W,b}(x)</math>
| Output of our hypothesis on input <math>\textstyle x</math>, using parameters <math>\textstyle W,b</math>. This should be a vector of the same dimension as the target value <math>\textstyle y</math>.
|-
| <math>\textstyle W^{(l)}_{ij}</math>
| The parameter associated with the connection between unit <math>\textstyle j</math> in layer <math>\textstyle l</math> and unit <math>\textstyle i</math> in layer <math>\textstyle l+1</math>.
|-
| <math>\textstyle b^{(l)}_{i}</math>
| The bias term associated with unit <math>\textstyle i</math> in layer <math>\textstyle l+1</math>. It can also be thought of as the parameter associated with the connection between the bias unit in layer <math>\textstyle l</math> and unit <math>\textstyle i</math> in layer <math>\textstyle l+1</math>.
|-
| <math>\textstyle \theta</math>
| Our parameter vector. It is useful to think of this as the result of taking the parameters <math>\textstyle W,b</math> and "unrolling" them into a long column vector.
|-
| <math>\textstyle a^{(l)}_i</math>
| Activation (output) of unit <math>\textstyle i</math> in layer <math>\textstyle l</math> of the network. In addition, since layer <math>\textstyle L_1</math> is the input layer, we also have <math>\textstyle a^{(1)}_i = x_i</math>.
|-
| <math>\textstyle f(\cdot)</math>
| The activation function. Throughout these notes, we used <math>\textstyle f(z) = \tanh(z)</math>.
|-
| <math>\textstyle z^{(l)}_i</math>
| Total weighted sum of inputs to unit <math>\textstyle i</math> in layer <math>\textstyle l</math>. Thus, <math>\textstyle a^{(l)}_i = f(z^{(l)}_i)</math>.
|-
| <math>\textstyle \alpha</math>
| Learning rate parameter.
|-
| <math>\textstyle s_l</math>
| Number of units in layer <math>\textstyle l</math> (not counting the bias unit).
|-
| <math>\textstyle n_l</math>
| Number of layers in the network. Layer <math>\textstyle L_1</math> is usually the input layer, and layer <math>\textstyle L_{n_l}</math> the output layer.
|-
| <math>\textstyle \lambda</math>
| Weight decay parameter.
|-
| <math>\textstyle \hat{x}</math>
| For an autoencoder, its output; i.e., its reconstruction of the input <math>\textstyle x</math>. Same meaning as <math>\textstyle h_{W,b}(x)</math>.
|-
| <math>\textstyle \rho</math>
| Sparsity parameter, which specifies our desired level of sparsity.
|-
| <math>\textstyle \hat\rho_i</math>
| The average activation of hidden unit <math>\textstyle i</math> (in the sparse autoencoder).
|-
| <math>\textstyle \beta</math>
| Weight of the sparsity penalty term (in the sparse autoencoder objective).
|}
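
To make the notation concrete, here is a minimal NumPy sketch of the forward pass (this code is not part of the original notes; the three-layer architecture, layer sizes, and variable names are illustrative assumptions). It shows how <math>\textstyle W^{(l)}_{ij}</math>, <math>\textstyle b^{(l)}_i</math>, <math>\textstyle z^{(l)}_i</math>, <math>\textstyle a^{(l)}_i</math>, and <math>\textstyle h_{W,b}(x)</math> fit together:

<pre>
import numpy as np

rng = np.random.default_rng(0)

# s_l: number of units in each layer (not counting bias units).
# Assumed sizes: n_l = 3 layers, a 64-dimensional input layer L_1,
# 25 hidden units, and a 64-dimensional output layer.
s = [64, 25, 64]

# W[l-1][i, j] stores W^(l)_{ij}, the weight on the connection from unit j
# in layer l to unit i in layer l+1; b[l-1][i] stores the bias term b^(l)_i.
W = [rng.normal(scale=0.01, size=(s[l + 1], s[l])) for l in range(len(s) - 1)]
b = [np.zeros(s[l + 1]) for l in range(len(s) - 1)]

f = np.tanh  # the activation function f(.)

def h(x):
    """h_{W,b}(x): the network's output; for an autoencoder, its reconstruction of x."""
    a = x  # a^(1) = x, since layer L_1 is the input layer
    for Wl, bl in zip(W, b):
        z = Wl @ a + bl  # z^(l+1)_i: total weighted sum of inputs to unit i
        a = f(z)         # a^(l+1)_i = f(z^(l+1)_i)
    return a

x = rng.normal(size=s[0])  # input features for one training example, x in R^n
x_hat = h(x)               # x-hat: same dimension as the target y = x
</pre>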
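
The objective-related symbols can be sketched the same way. One caveat: the KL-divergence form of the sparsity penalty assumes hidden activations in <math>\textstyle (0,1)</math>, so this hypothetical helper uses a sigmoid hidden layer rather than <math>\textstyle \tanh</math>; the function name and signature are assumptions, not code from the notes:

<pre>
import numpy as np

def sparse_autoencoder_cost(W, b, X, lam, rho, beta):
    """Hypothetical sketch of the sparse autoencoder objective.

    X holds m training examples as columns; lam is the weight decay
    parameter (lambda), rho the sparsity parameter, and beta the weight
    of the sparsity penalty term.
    """
    m = X.shape[1]
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    a2 = sigmoid(W[0] @ X + b[0][:, None])      # hidden activations a^(2)
    x_hat = sigmoid(W[1] @ a2 + b[1][:, None])  # reconstruction h_{W,b}(x)

    # Average squared reconstruction error over the m examples.
    reconstruction = 0.5 * np.sum((x_hat - X) ** 2) / m

    # Weight decay term, weighted by lambda (bias terms are not decayed).
    weight_decay = 0.5 * lam * sum(np.sum(Wl ** 2) for Wl in W)

    # rho_hat_i: average activation of hidden unit i over the training set.
    rho_hat = a2.mean(axis=1)

    # KL(rho || rho_hat_i), summed over hidden units and weighted by beta.
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

    return reconstruction + weight_decay + beta * kl
</pre>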

{{Sparse_Autoencoder}}