Understanding the Tanh Activation Function in Python


Hello readers! In the last article, we looked briefly at the sigmoid activation function. In this article, we’ll be looking at the Tanh Activation Function in Python, in the context of Neural Networks.

Let’s get started!

The Tanh Activation Function

In a neural network, an activation function is applied to each neuron’s input and determines how strongly that neuron is “turned on”, expressed as a mathematical function of the input.

Tanh is one such function, which is very popular in Machine Learning literature, since it is a continuous and differentiable function.

The tanh function is defined over all real numbers as:

f(x) = tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)

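This function takes values in the open interval (-1, 1), so the output is always bounded and centered around zero, no matter how large the input is. Because it is also differentiable everywhere, its gradient is easy to compute, which makes tanh a convenient choice for training with backpropagation.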
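As a quick sanity check, here is a minimal sketch that translates the formula above directly into numpy and compares it with numpy's own tanh. The helper name tanh_from_formula and the sample inputs are just illustrative choices, not part of any library.

```python
import numpy as np

def tanh_from_formula(x):
    # Direct translation of f(x) = (e^(2x) - 1) / (e^(2x) + 1).
    # np.exp(2 * x) overflows for very large x, so this is for illustration only.
    e2x = np.exp(2 * x)
    return (e2x - 1) / (e2x + 1)

xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])  # arbitrary sample inputs
manual = tanh_from_formula(xs)

# The hand-written formula should agree with numpy's built-in tanh
print(np.allclose(manual, np.tanh(xs)))

# Every output should lie strictly between -1 and 1
print(np.all(np.abs(manual) < 1))
```

Both checks should print True. In practice you would still call np.tanh directly, since it handles large inputs without overflowing.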

To get a visual understanding, here is the graph of Tanh(x):

Tanh Graph

The graph is very similar to the sigmoid activation function (S-shaped), which is another popular choice.
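The similarity is not a coincidence: tanh is a rescaled and shifted sigmoid, since tanh(x) = 2 * sigmoid(2x) - 1. The short sketch below checks this identity numerically; the sigmoid helper is written out here only for the comparison.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-4, 4, 9)  # 9 evenly spaced sample points

# Stretch sigmoid from (0, 1) to (-1, 1) and double the input scale
rescaled = 2.0 * sigmoid(2.0 * xs) - 1.0

# The rescaled sigmoid should match numpy's tanh
print(np.allclose(rescaled, np.tanh(xs)))
```

The main practical difference is the output range: sigmoid maps to (0, 1), while tanh maps to (-1, 1) and is centered at zero.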

As the graph shows, tanh maps inputs to outputs smoothly and monotonically. Strongly positive inputs are squashed towards 1, while strongly negative inputs are squashed towards -1.

This makes it a reasonable choice for binary classification, where the two classes can be associated with outputs near -1 and 1.
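To make this concrete, here is a small sketch of how tanh outputs could be turned into two class labels by thresholding at 0. The scores are made-up values standing in for a model's raw outputs, and the -1 / +1 label convention is an assumption for this example.

```python
import numpy as np

# Hypothetical raw scores from a model's final layer (made-up values)
scores = np.array([-4.2, -0.3, 0.1, 2.5])

# Squash the scores into the range (-1, 1)
activations = np.tanh(scores)

# Threshold at 0: positive activations become class +1, the rest class -1
labels = np.where(activations > 0, 1, -1)

print(activations)
print(labels)
```

The first two scores are negative and end up in class -1, while the last two are positive and end up in class +1.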

A simple implementation of the Tanh Activation Function in Python

Let’s quickly go through a sample tanh function in Python, using numpy and matplotlib.

import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    # Use numpy's built-in tanh
    return np.tanh(x)

def generate_sample_data(start, end, num):
    # np.linspace returns `num` evenly spaced points between start and end;
    # its third argument is a sample count, not a step size
    return np.linspace(start, end, num)

x = generate_sample_data(-5, 5, 10)
y = tanh(x)

# Plot the curve and display the figure
plt.plot(x, y)
plt.show()


Tanh Activation Function – Plot

As you can see, the curve resembles the original graph closely, even with only 10 sample points!

Limitations of tanh Activation Function

While tanh has a lot of good properties for building classifier networks, one must always be careful when using it.

Like sigmoid, tanh saturates: for inputs with a large magnitude the curve flattens out, which makes it prone to the vanishing gradient problem, particularly in deep networks.

The vanishing gradient problem is a situation where the derivatives become very close to 0 (vanish) in these flat regions, so even a large change in the input produces almost no change in the output.
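To see this in numbers, recall that the derivative of tanh(x) is 1 - tanh(x)^2. The sketch below evaluates it at a few arbitrarily chosen inputs; tanh_grad is just an illustrative helper name.

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

# The gradient is largest at 0 and shrinks quickly as |x| grows
for x in [0.0, 1.0, 2.5, 5.0, 10.0]:
    print(x, tanh_grad(x))
```

The gradient is exactly 1 at x = 0, but by x = 5 it is already well below 0.001, which is the flat part of the curve in the plot.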

This becomes a problem when you’re dealing with a large number of layers in your network, because backpropagation multiplies these small derivatives together layer by layer, so one must always be careful about using these functions.
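The effect of depth can be illustrated with a deliberately simplified toy model: a chain of tanh "layers" with weight 1 and no bias, which is an assumption made purely for this sketch. By the chain rule, the gradient of the final output with respect to the input is the product of the per-layer derivatives.

```python
import numpy as np

def chain_gradient(x, num_layers):
    # Toy network: each "layer" is just tanh with weight 1 and no bias
    grad = 1.0
    for _ in range(num_layers):
        y = np.tanh(x)
        # Chain rule: multiply by this layer's local derivative, 1 - tanh(x)^2
        grad *= 1.0 - y ** 2
        x = y
    return grad

# The gradient reaching the input shrinks as more layers are stacked
for depth in [1, 5, 20]:
    print(depth, chain_gradient(2.0, depth))
```

Each factor is smaller than 1, so the product keeps shrinking as the depth grows. Real networks also have weights in each layer that change the picture, so treat this only as an intuition for why deep stacks of tanh layers can be hard to train.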


In this article, we looked at the tanh activation function in Machine Learning, a simple implementation in Python, and its limitations.

