As an extension of my previous post on neural networks, I will detail the activation functions used to transform the dot product of our input matrix and weights (parameters) as observations are fed forward through the layers of a network during training.

The activation functions covered here are among the most commonly used.
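
Before getting to the individual functions, here is a minimal sketch of where an activation fits in a forward pass; the names and shapes (X, W, b, z) are purely illustrative and not taken from the earlier post.

import numpy as np

X = np.random.randn(4, 3)   # input matrix: 4 observations, 3 features
W = np.random.randn(3, 2)   # weights (parameters) for a layer with 2 units
b = np.zeros(2)             # bias

z = X @ W + b               # dot product of inputs and weights, plus bias
a = 1 / (1 + np.exp(-z))    # activation applied element-wise (sigmoid here)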

Sigmoid

A non-linear transformation that squashes inputs into the range (0, 1), but the exponential adds computational burden between layers.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
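
A quick check of its behaviour: inputs are squashed into (0, 1), with 0 mapping to 0.5.

sigmoid(np.array([-5.0, 0.0, 5.0]))
# array([0.00669285, 0.5       , 0.99330715])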

Rectified Linear Units (ReLU)

A piecewise-linear transformation with a lower computational burden between layers, since no exponentials are needed.

def relu(x):
    # Element-wise maximum of 0 and x; works for scalars and NumPy arrays
    return np.maximum(0, x)
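
For example, negative inputs are zeroed out while positive inputs pass through unchanged.

relu(np.array([-2.0, -0.5, 0.0, 3.0]))
# array([0., 0., 0., 3.])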

Leaky ReLU

An adjustment to the ReLU function that scales negative inputs by a small epsilon instead of zeroing them, preserving a small gradient for negative values.

def leaky_relu(x, epsilon=0.01):
    # Keep positive values; scale negative values by epsilon
    return np.where(x > 0, x, x * epsilon)
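
With the default epsilon of 0.01, negative inputs are attenuated rather than discarded.

leaky_relu(np.array([-2.0, 0.0, 3.0]))
# array([-0.02,  0.  ,  3.  ])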

Hyperbolic Tangent (Tanh)

This implementation is admittedly trivial, being a thin wrapper for an established NumPy function that calculates the element-wise hyperbolic tangent (don’t @ me).

However, an equivalent expression written out in terms of exponentials does exist.

def tanh(x):
    return np.tanh(x)

def tanh_explicit(x):
    # Same function written out explicitly; renamed to avoid shadowing the wrapper above
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
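
A quick sanity check that the two forms agree to floating-point precision:

x = np.linspace(-3, 3, 7)
np.allclose(tanh(x), tanh_explicit(x))
# True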

Exponential Linear Unit (ELU)

A monotone function: the identity for positive inputs, and for negative inputs a saturating exponential that scales values downward toward -epsilon.

def elu(x, epsilon=1.0):
    # Identity for x > 0; otherwise a saturating exponential bounded below by -epsilon
    return np.where(x > 0, x, epsilon * (np.exp(x) - 1))
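
For increasingly negative inputs the output flattens out near -epsilon (here -1.0), unlike ReLU's hard zero.

elu(np.array([-5.0, -1.0, 0.0, 2.0]))
# array([-0.99326205, -0.63212056,  0.        ,  2.        ])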

Swish

A combined transformation that multiplies the input by its sigmoid.

def swish(x):
    return x * sigmoid(x)
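
Unlike ReLU, swish is smooth and lets small negative values through slightly:

swish(np.array([-1.0, 0.0, 1.0]))
# array([-0.26894142,  0.        ,  0.73105858])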

Softmax

This activation function is critical for multi-class classification models, converting a vector of raw scores into a probability distribution over classes.

def softmax(x):
    # Subtract the max before exponentiating for numerical stability (the result is unchanged)
    exp_x = np.exp(x - np.max(x))
    return exp_x / np.sum(exp_x)
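
The outputs are non-negative and sum to 1, so they can be read as class probabilities.

scores = np.array([2.0, 1.0, 0.1])
softmax(scores)
# array([0.65900114, 0.24243297, 0.09856589])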
