Derivatives of Activations to Find Gradients
Extending my previous post on neural networks, I will detail the derivatives of activation functions. These derivatives supply the gradients needed during backpropagation: as errors are fed backwards through the layers of a network during optimisation, the gradients tell us how to update the weights (parameters).
The activation functions covered here are those most commonly used.
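To make their role concrete, here is a minimal sketch of a single dense layer's backward pass, assuming a squared-error loss and hypothetical names (inputs, targets, outputs, pre_activation, weights, learning_rate); activation_gradient stands for any of the derivative functions below, and the sketch only illustrates the chain rule, not a full training loop.
def backward_step(inputs, targets, outputs, pre_activation,
                  activation_gradient, weights, learning_rate=0.1):
    # Upstream error for a squared-error loss with a 1/2 factor: dL/da = a - t
    error = outputs - targets
    # Chain rule: dL/dz = dL/da * da/dz, where da/dz is the activation's derivative
    delta = error * activation_gradient(pre_activation)
    # dL/dW = X^T (dL/dz), followed by a plain gradient-descent update
    weight_gradient = inputs.T @ delta
    return weights - learning_rate * weight_gradient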
Sigmoid
import numpy as np

def gradient_sigmoid(x):
    # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
    sig = 1 / (1 + np.exp(-x))
    return sig * (1 - sig)
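As a quick sanity check of my own (not part of the original post), the analytic gradient can be compared against a central finite difference:
def finite_difference(f, x, h=1e-5):
    # Central difference approximation of the derivative of f at x
    return (f(x + h) - f(x - h)) / (2 * h)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
print(np.allclose(gradient_sigmoid(x), finite_difference(sigmoid, x)))  # True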
Rectified Linear Units (ReLU)
def gradient_relu(x):
    # Derivative is 1 for positive inputs and 0 otherwise (0 at x = 0 by convention)
    return 1 if x > 0 else 0
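The version above assumes a scalar x (a NumPy array would raise an ambiguous-truth-value error); a vectorised variant of my own, not from the original post, uses np.where and keeps the conventional derivative of 0 at x = 0.
def gradient_relu_vectorised(x):
    # Element-wise version for NumPy arrays
    return np.where(x > 0, 1.0, 0.0)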
Leaky ReLU
An adjustment to the ReLU function that replaces the zero gradient for negative inputs with a small slope epsilon.
def gradient_leaky_relu(x, epsilon=0.01):
    # Derivative is 1 for positive inputs and the small slope epsilon otherwise
    return 1 if x > 0 else epsilon
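A quick illustration on a few sample values of my own choosing: negative inputs keep a small, non-zero gradient, which is the whole point of the leak.
for value in (-2.0, -0.5, 0.5, 2.0):
    print(value, gradient_relu(value), gradient_leaky_relu(value))
# -2.0 0 0.01
# -0.5 0 0.01
# 0.5 1 1
# 2.0 1 1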
Hyperbolic Tangent (Tanh)
This gradient is similarly a thin wrapper, relying on NumPy's established element-wise hyperbolic tangent, since the derivative of tanh(x) is 1 - tanh(x)^2.
However, the same derivative can also be written explicitly in terms of exponentials (named gradient_tanh_explicit below so it does not shadow the first definition).
def gradient_tanh(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1 - np.tanh(x)**2

def gradient_tanh_explicit(x):
    # Same derivative, written directly in terms of exponentials
    return 1 - ((np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x)))**2
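The two forms should agree; a quick np.allclose comparison (my own check) confirms this over a range of inputs.
x = np.linspace(-3, 3, 13)
print(np.allclose(gradient_tanh(x), gradient_tanh_explicit(x)))  # True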
Exponential Linear Unit (ELU)
def gradient_elu(x, epsilon=1.0):
    # Derivative is 1 for positive inputs and epsilon * exp(x) otherwise
    # (epsilon here is the ELU scale parameter, commonly written as alpha)
    return 1 if x > 0 else epsilon * np.exp(x)
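A small practical aside of my own: because ELU(x) = epsilon * (exp(x) - 1) for x <= 0, the gradient can be recovered from the activation's output as ELU(x) + epsilon, saving a second call to np.exp during the backward pass; gradient_elu_from_output below is a hypothetical helper illustrating this.
def gradient_elu_from_output(activated, epsilon=1.0):
    # For x <= 0, ELU(x) = epsilon * (exp(x) - 1), so ELU'(x) = epsilon * exp(x) = ELU(x) + epsilon
    # For x > 0 the activation equals x, which is positive, so the derivative is simply 1
    return 1 if activated > 0 else activated + epsilon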
Swish
def gradient_swish(x):
    # Derivative of swish(x) = x * sigmoid(x), via the quotient rule on x / (1 + exp(-x))
    return (x * np.exp(-x) + 1 + np.exp(-x)) / (1 + np.exp(-x))**2
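The same derivative is often written in terms of the sigmoid, since swish(x) = x * sigmoid(x) gives swish'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x)); the equivalent function and quick check below are my own addition.
def gradient_swish_via_sigmoid(x):
    # swish'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    sig = 1 / (1 + np.exp(-x))
    return sig + x * sig * (1 - sig)

x = np.linspace(-4, 4, 9)
print(np.allclose(gradient_swish(x), gradient_swish_via_sigmoid(x)))  # True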