Activations¶
Popular (and some not-so-popular) activation functions for use within arbitrary neural networks.
Affine
¶

class numpy_ml.neural_nets.activations.Affine(slope=1, intercept=0)[source]¶ An affine activation function.
Parameters: slope (float) – Activation slope. Default is 1.
intercept (float) – Intercept/offset term. Default is 0.
fn(z)[source]¶ Evaluate the Affine activation on the elements of input z.
\[\text{Affine}(z_i) = \text{slope} \times z_i + \text{intercept}\]
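The formula above maps directly onto vectorized NumPy. A minimal illustrative sketch (the function name is ours, not the library's fn method):

```python
import numpy as np

def affine(z, slope=1.0, intercept=0.0):
    # Elementwise affine transform: slope * z + intercept
    return slope * np.asarray(z, dtype=float) + intercept

print(affine(np.array([-1.0, 0.0, 2.0]), slope=2.0, intercept=0.5))
```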

ELU
¶

class numpy_ml.neural_nets.activations.ELU(alpha=1.0)[source]¶ An exponential linear unit (ELU).
Notes
ELUs are intended to address the fact that ReLUs are strictly nonnegative and thus have an average activation > 0, increasing the chances of internal covariate shift and slowing down learning. ELU units address this by (1) allowing negative values when \(x < 0\), which (2) are bounded below by a value \(-\alpha\). Similar to LeakyReLU, the negative activation values help to push the average unit activation towards 0. Unlike LeakyReLU, however, the boundedness of the negative activation allows for greater robustness in the face of large negative values, allowing the function to avoid conveying the degree of “absence” (negative activation) in the input. [*]
Parameters: alpha (float) – Slope of negative segment. Default is 1.
References
[*] Clevert, D. A., Unterthiner, T., Hochreiter, S. (2016). “Fast and accurate deep network learning by exponential linear units (ELUs)”. 4th International Conference on Learning Representations.
fn(z)[source]¶ Evaluate the ELU activation on the elements of input z.
\[\begin{split}\text{ELU}(z_i) &= z_i \ \ \ \ &&\text{if }z_i > 0 \\ &= \alpha (e^{z_i} - 1) \ \ \ \ &&\text{otherwise}\end{split}\]
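The piecewise definition above translates directly to NumPy. A minimal sketch (illustrative only, not the library's fn implementation):

```python
import numpy as np

def elu(z, alpha=1.0):
    z = np.asarray(z, dtype=float)
    # np.where evaluates both branches; the exp of large positive values is
    # computed but discarded (it may emit an overflow warning)
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```

Note that the negative branch saturates at \(-\alpha\) as z decreases, which is the boundedness property discussed above.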

Exponential
¶
HardSigmoid
¶

class numpy_ml.neural_nets.activations.HardSigmoid[source]¶ A “hard” sigmoid activation function.
Notes
The hard sigmoid is a piecewise linear approximation of the logistic sigmoid that is cheaper to compute.

fn(z)[source]¶ Evaluate the hard sigmoid activation on the elements of input z.
\[\begin{split}\text{HardSigmoid}(z_i) &= 0 \ \ \ \ &&\text{if }z_i < -2.5 \\ &= 0.2 z_i + 0.5 \ \ \ \ &&\text{if }-2.5 \leq z_i \leq 2.5 \\ &= 1 \ \ \ \ &&\text{if }z_i > 2.5\end{split}\]
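The three cases above are equivalent to clipping the linear segment to [0, 1], which gives a very short NumPy sketch (illustrative, not the library's implementation):

```python
import numpy as np

def hard_sigmoid(z):
    # Linear segment 0.2 * z + 0.5, clipped to [0, 1]; at z = -2.5 and
    # z = 2.5 the segment meets the two constant branches exactly
    return np.clip(0.2 * np.asarray(z, dtype=float) + 0.5, 0.0, 1.0)
```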

Identity
¶

class numpy_ml.neural_nets.activations.Identity[source]¶ Identity activation function.
Notes
Identity is just syntactic sugar for Affine with slope = 1 and intercept = 0.
fn(z)[source]¶ Evaluate the Affine activation on the elements of input z.
\[\text{Affine}(z_i) = \text{slope} \times z_i + \text{intercept}\]

LeakyReLU
¶

class numpy_ml.neural_nets.activations.LeakyReLU(alpha=0.3)[source]¶ ‘Leaky’ version of a rectified linear unit (ReLU).
Notes
Leaky ReLUs [†] are designed to address the vanishing gradient problem in ReLUs by allowing a small nonzero gradient when x is negative.
Parameters: alpha (float) – Activation slope when x < 0. Default is 0.3.
References
[†] Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). “Rectifier nonlinearities improve neural network acoustic models”. Proceedings of the 30th International Conference on Machine Learning, 30.
fn(z)[source]¶ Evaluate the leaky ReLU function on the elements of input z.
\[\begin{split}\text{LeakyReLU}(z_i) &= z_i \ \ \ \ &&\text{if } z_i > 0 \\ &= \alpha z_i \ \ \ \ &&\text{otherwise}\end{split}\]
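A minimal NumPy sketch of the formula above (function name is illustrative; not the library's fn method):

```python
import numpy as np

def leaky_relu(z, alpha=0.3):
    z = np.asarray(z, dtype=float)
    # Pass positive values through unchanged; scale negatives by alpha,
    # leaving a small nonzero gradient on the negative side
    return np.where(z > 0, z, alpha * z)
```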

ReLU
¶

class numpy_ml.neural_nets.activations.ReLU[source]¶ A rectified linear activation function.
Notes
“ReLU units can be fragile during training and can ‘die’. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, the ReLU units can irreversibly die during training since they can get knocked off the data manifold.
For example, you may find that as much as 40% of your network can be ‘dead’ (i.e. neurons that never activate across the entire training dataset) if the learning rate is set too high. With a proper setting of the learning rate this is less frequently an issue.” [‡]
References
[‡] Karpathy, A. “CS231n: Convolutional neural networks for visual recognition”. 
fn(z)[source]¶ Evaluate the ReLU function on the elements of input z.
\[\begin{split}\text{ReLU}(z_i) &= z_i \ \ \ \ &&\text{if }z_i > 0 \\ &= 0 \ \ \ \ &&\text{otherwise}\end{split}\]
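The definition above is a one-liner in NumPy; a minimal illustrative sketch:

```python
import numpy as np

def relu(z):
    # Elementwise max(z, 0): zeroes out negative inputs
    return np.maximum(z, 0)
```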

SELU
¶

class numpy_ml.neural_nets.activations.SELU[source]¶ A scaled exponential linear unit (SELU).
Notes
SELU units, when used in conjunction with proper weight initialization and regularization techniques, encourage neuron activations to converge to zero mean and unit variance without explicit use of, e.g., BatchNorm.
For SELU units, the \(\alpha\) and \(\text{scale}\) values are constants chosen so that the mean and variance of the inputs are preserved between consecutive layers. As such, the authors propose initializing the weights using Lecun-normal initialization, \(w_{ij} \sim \mathcal{N}(0, 1 / \text{fan_in})\), and using the dropout variant \(\alpha\)-dropout during regularization. [§]
See the reference for more information (especially the appendix ;) ).
References
[§] Klambauer, G., Unterthiner, T., & Hochreiter, S. (2017). “Self-normalizing neural networks.” Advances in Neural Information Processing Systems, 30.
fn(z)[source]¶ Evaluate the SELU activation on the elements of input z.
\[\text{SELU}(z_i) = \text{scale} \times \text{ELU}(z_i, \alpha)\]
which is simply
\[\begin{split}\text{SELU}(z_i) &= \text{scale} \times z_i \ \ \ \ &&\text{if }z_i > 0 \\ &= \text{scale} \times \alpha (e^{z_i} - 1) \ \ \ \ &&\text{otherwise}\end{split}\]
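A minimal NumPy sketch of the formula above. The constant values are those commonly quoted from Klambauer et al. (2017); the exact digits here are an assumption, and the function name is illustrative:

```python
import numpy as np

# Self-normalizing constants, as commonly published for SELU
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(z):
    # scale * ELU(z, alpha): scaled identity for z > 0, scaled
    # saturating exponential for z <= 0
    z = np.asarray(z, dtype=float)
    return SCALE * np.where(z > 0, z, ALPHA * (np.exp(z) - 1.0))
```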

Sigmoid
¶

class numpy_ml.neural_nets.activations.Sigmoid[source]¶ A logistic sigmoid activation function.

fn(z)[source]¶ Evaluate the logistic sigmoid, \(\sigma\), on the elements of input z.
\[\sigma(x_i) = \frac{1}{1 + e^{-x_i}}\]
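A minimal NumPy sketch of the formula above (illustrative, not the library's fn implementation):

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + exp(-z)), elementwise; maps R to (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))
```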

SoftPlus
¶

class numpy_ml.neural_nets.activations.SoftPlus[source]¶ A softplus activation function.
Notes
In contrast to ReLU, the softplus activation is differentiable everywhere (including 0). It is, however, more expensive to compute.
The derivative of the softplus activation is the logistic sigmoid.

fn(z)[source]¶ Evaluate the softplus activation on the elements of input z.
\[\text{SoftPlus}(z_i) = \log(1 + e^{z_i})\]
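A minimal NumPy sketch of the formula above (illustrative, not the library's fn implementation). Evaluating \(\log(1 + e^{z})\) naively overflows for large z, so this sketch uses the numerically stable `np.logaddexp`:

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)); logaddexp(0, z) computes log(e**0 + e**z)
    # without overflowing for large z
    return np.logaddexp(0.0, np.asarray(z, dtype=float))
```

A quick finite-difference check confirms the Notes: the slope of softplus at any point matches the logistic sigmoid there.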
