Hello readers! In the last article, we looked briefly at the sigmoid activation function. In this article, we’ll be looking at the **Tanh Activation Function** in Python, in the context of Neural Networks.

Let’s get started!

## The Tanh Activation Function

We use activation functions to decide, through a mathematical function of the input, how strongly a neuron should “turn on” and pass its signal to the next layer.

Tanh is one such function, which is very popular in Machine Learning literature, since it is continuous and differentiable everywhere.

The tanh function is defined as follows, for all real values of x:

`f(x) = tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)`

This function outputs values in the open interval (-1, 1), so the output is bounded and centered around zero regardless of how large the input gets. Due to these properties, tanh is a very good choice for backpropagation.
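As a quick sanity check, here is a minimal sketch that evaluates the formula above directly and compares it against NumPy’s built-in `np.tanh` (the helper name `tanh_manual` is just for illustration). Note how the outputs stay strictly inside (-1, 1) even for large inputs:

```
import numpy as np

def tanh_manual(x):
    # tanh written out as (e^(2x) - 1) / (e^(2x) + 1), as in the formula above
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(tanh_manual(xs))  # values stay strictly inside (-1, 1)
print(np.tanh(xs))      # matches numpy's built-in tanh
```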

To get a visual understanding, here is the graph of Tanh(x):

The graph is very similar to the sigmoid activation function (S-shaped), which is another popular choice.

As you can observe from the graph, tanh maps inputs to outputs in a smooth, monotonic way: strongly positive inputs are squashed close to 1, strongly negative inputs are squashed close to -1, and inputs near zero stay near zero.

This makes it a very suitable choice for performing **binary classification**.
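As a rough illustration of this idea, here is a tiny sketch of a single tanh “neuron” used as a binary classifier by thresholding its output at 0. The weights `w`, bias `b`, and the `predict` helper below are purely hypothetical values chosen for demonstration, not part of any library:

```
import numpy as np

def predict(x, w, b):
    # A single tanh "neuron": the sign of the activation gives the class
    activation = np.tanh(np.dot(w, x) + b)
    return 1 if activation > 0 else -1  # labels in {-1, +1}

# Hypothetical weights for a 2-feature input
w = np.array([0.8, -0.5])
b = 0.1
print(predict(np.array([2.0, 1.0]), w, b))   # -> 1
print(predict(np.array([-3.0, 2.0]), w, b))  # -> -1
```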

## A simple implementation of the Tanh Activation Function in Python

Let’s quickly go through a sample `tanh` function in Python, using numpy and matplotlib.

```
import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    return np.tanh(x)  # We can use numpy's built-in tanh

def generate_sample_data(start, end, num_points):
    # Generates num_points evenly spaced samples using np.linspace
    return np.linspace(start, end, num_points)

x = generate_sample_data(-5, 5, 10)
y = tanh(x)

# Now plot
plt.xlabel("x")
plt.ylabel("tanh(x)")
plt.plot(x, y)
plt.show()
```

*Output*

As you can see, the curve does resemble the original graph closely, even for this small dataset!
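If you want a smoother curve, simply increase the number of sample points, for example `generate_sample_data(-5, 5, 100)`.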

## Limitations of tanh Activation Function

While tanh has a lot of good properties for building classifier networks, one must always be careful when using it.

Like the sigmoid, tanh is a saturating activation function: for inputs far from zero its slope becomes nearly flat, which makes it prone to the *vanishing gradient problem* when training for a large number of epochs.

The vanishing gradient problem is a situation where the derivatives become vanishingly small (close to 0), so even a large change in the input produces almost no gradient, and the weights stop updating meaningfully.

This becomes a problem when you’re dealing with a large number of layers in your network, since these small gradients get multiplied together layer by layer during backpropagation, so one must always be careful about using these functions.
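To make this concrete, here is a small sketch using the identity tanh'(x) = 1 - tanh(x)^2 to show how quickly the gradient shrinks as the input moves away from zero. When many such tiny factors are multiplied across layers, the overall gradient can effectively vanish:

```
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, tanh_grad(x))
# 0.0  -> 1.0
# 2.0  -> ~0.07
# 5.0  -> ~0.00018
# 10.0 -> ~8.2e-09
```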

## Conclusion

In this article, we took a look at the tanh activation function in Machine Learning, implemented it in Python, and discussed its limitations.

## References

- Wolfram Alpha Page on Tanh function
- JournalDev article on Sigmoid Activation Function