Hello readers! In the last article, we looked briefly at the sigmoid activation function. In this article, we'll be looking at the tanh activation function in Python, in the context of neural networks.
Let’s get started!
The Tanh Activation Function
We often use activation functions when we want to “turn on” specific layers depending on the input, in terms of a mathematical function.
Tanh is one such function, and it is very popular in the Machine Learning literature, since it is continuous and differentiable everywhere.
The tanh function is of the below form, across the Real Number space:
f(x) = tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
This function outputs values in the open interval (-1, 1), so its output is zero-centered and bounded regardless of the input's magnitude. These properties make tanh a very good choice for networks trained with backpropagation.
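To make the formula concrete, here is a minimal sketch that implements tanh directly from the expression above and checks it against NumPy's built-in np.tanh (the function name tanh_manual is just an illustrative choice):

```python
import numpy as np

def tanh_manual(x):
    # Direct implementation of the formula above:
    # tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
    e2x = np.exp(2 * x)
    return (e2x - 1) / (e2x + 1)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(tanh_manual(x))
# Agrees with NumPy's built-in implementation:
print(np.allclose(tanh_manual(x), np.tanh(x)))  # True
```

Note that every output lies strictly between -1 and 1, no matter how large the input grows.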
To get a visual understanding, here is the graph of Tanh(x):
The graph is very similar to the sigmoid activation function (S-shaped), which is another popular choice.
As you can observe from the graph, tanh correlates inputs to outputs very well: strongly positive inputs are mapped close to 1, while strongly negative inputs are mapped close to -1.
This makes it a very suitable choice for performing binary classification.
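As a quick illustration of that idea, here is a small sketch that turns tanh activations into binary class labels by thresholding at zero. The threshold of 0 is a common convention (since tanh is zero-centered), not the only possible choice, and the score values are made up for the example:

```python
import numpy as np

# Hypothetical pre-activation scores from some model
scores = np.array([-2.5, -0.3, 0.1, 4.0])

# Squash the scores into (-1, 1) with tanh
activations = np.tanh(scores)

# Assign class -1 or 1 by thresholding at zero
labels = np.where(activations > 0, 1, -1)
print(labels)  # [-1 -1  1  1]
```

Because tanh preserves the sign of its input, thresholding the activation at 0 is equivalent to thresholding the raw score at 0.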
A simple implementation of the Tanh Activation Function in Python
import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    # We can use NumPy's built-in tanh
    return np.tanh(x)

def generate_sample_data(start, end, num_points):
    # Generates num_points evenly spaced samples using np.linspace
    # (note: np.linspace's third argument is the number of samples,
    # not the step size)
    return np.linspace(start, end, num_points)

x = generate_sample_data(-5, 5, 10)
y = tanh(x)

# Now plot
plt.xlabel("x")
plt.ylabel("tanh(x)")
plt.plot(x, y)
plt.show()
As you can see, the curve does resemble the original graph closely, even for this small dataset!
Limitations of tanh Activation Function
While tanh has a lot of good properties for building classifier networks, one must always be careful when using it.
Tanh is a saturating activation function, which means it can be prone to the vanishing gradient problem, especially when training for a large number of epochs.
The vanishing gradient problem is a situation where the gradients (derivatives) become vanishingly small, so even large changes in the input produce almost no update to the weights. For tanh, this happens in its saturated regions, where the input has a large magnitude and the curve flattens out.
This becomes a problem when you're dealing with a large number of layers in your network, since the small gradients multiply across layers, so one must always be careful about using these functions in deep architectures.
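The saturation effect is easy to see numerically. The derivative of tanh is 1 - tanh(x)^2, which shrinks rapidly as |x| grows; the sketch below (with illustrative input values) shows how quickly the gradient collapses toward zero:

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

# Gradient at increasingly large inputs: it decays toward zero,
# which is exactly the saturation behind the vanishing gradient problem
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:>4}: gradient = {tanh_grad(x):.2e}")
```

At x = 0 the gradient is exactly 1, but by x = 5 it has already dropped below 0.001, so a unit stuck in this region contributes almost nothing during backpropagation.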
In this article, we learned about the tanh activation function, its properties, and its limitations in Machine Learning.