In this tutorial, we will learn about the sigmoid activation function. The sigmoid function always returns an output between** 0 and 1. **

**After this tutorial you will know:**

- What is an activation function?
- How to implement the sigmoid function in python?
- How to plot the sigmoid function in python?
- Where do we use the sigmoid function?
- What are the problems caused by the sigmoid activation function?
- Better alternatives to the sigmoid activation.

Table of Contents

## What is an activation function?

An activation function is a mathematical function that controls the output of a neural network. Activation functions help in determining whether a neuron is to be fired or not.

**Some of the popular activation functions are :**

- Binary Step
- Linear
- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
- Softmax

Activation is responsible for adding **non-linearity** to the output of a neural network model. Without an activation function, a neural network is simply a linear regression.

The mathematical equation for calculating the output of a neural network is:

In this tutorial, we will focus on the** sigmoid activation function.** This function comes from the sigmoid function in maths.

Let’s start by discussing the formula for the function.

## The formula for the sigmoid activation function

Mathematically you can represent the sigmoid activation function as:

You can see that the denominator will always be greater than 1, therefore the output will always be between 0 and 1.

## Implementing the Sigmoid Activation Function in Python

In this section, we will learn how to implement the sigmoid activation function in Python.

**We can define the function in python as:**

```
import numpy as np
def sig(x):
return 1/(1 + np.exp(-x))
```

Let’s try running the function on some inputs.

```
import numpy as np
def sig(x):
return 1/(1 + np.exp(-x))
x = 1.0
print('Applying Sigmoid Activation on (%.1f) gives %.1f' % (x, sig(x)))
x = -10.0
print('Applying Sigmoid Activation on (%.1f) gives %.1f' % (x, sig(x)))
x = 0.0
print('Applying Sigmoid Activation on (%.1f) gives %.1f' % (x, sig(x)))
x = 15.0
print('Applying Sigmoid Activation on (%.1f) gives %.1f' % (x, sig(x)))
x = -2.0
print('Applying Sigmoid Activation on (%.1f) gives %.1f' % (x, sig(x)))
```

Output :

```
Applying Sigmoid Activation on (1.0) gives 0.7
Applying Sigmoid Activation on (-10.0) gives 0.0
Applying Sigmoid Activation on (0.0) gives 0.5
Applying Sigmoid Activation on (15.0) gives 1.0
Applying Sigmoid Activation on (-2.0) gives 0.1
```

## Plotting Sigmoid Activation using Python

To plot sigmoid activation we’ll use the Numpy library:

```
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-10, 10, 50)
p = sig(x)
plt.xlabel("x")
plt.ylabel("Sigmoid(x)")
plt.plot(x, p)
plt.show()
```

Output :

We can see that the output is between 0 and 1.

The sigmoid function is commonly used for predicting probabilities since the probability is always between 0 and 1.

One of the disadvantages of the sigmoid function is that towards the end regions **the Y values respond very less to the change in X values. **

This results in a problem known as the **vanishing gradient problem.**

Vanishing gradient slows down the learning process and hence is undesirable.

Let’s discuss some alternatives that overcome this problem.

## ReLu activation function

A better alternative that solves this problem of vanishing gradient is the ReLu activation function.

The ReLu activation function returns 0 if the input is negative otherwise return the input as it is.

Mathematically it is represented as:

You can implement it in Python as follows:

```
def relu(x):
return max(0.0, x)
```

Let’s see how it works on some inputs.

```
def relu(x):
return max(0.0, x)
x = 1.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -10.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 0.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 15.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -20.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
```

Output:

```
Applying Relu on (1.0) gives 1.0
Applying Relu on (-10.0) gives 0.0
Applying Relu on (0.0) gives 0.0
Applying Relu on (15.0) gives 15.0
Applying Relu on (-20.0) gives 0.0
```

The problem with ReLu is that the gradient for negative inputs comes out to be zero.

This again leads to the problem of vanishing gradient (zero-gradient) for negative inputs.

To solve this problem we have another alternative known as the **Leaky ReLu activation function.**

## Leaky ReLu activation function

The leaky ReLu addresses the problem of zero gradients for negative value, by giving an extremely small linear component of x to negative inputs.

**Mathematically we can define it as:**

```
f(x)= 0.01x, x<0
= x, x>=0
```

**You can implement it in Python using:**

```
def leaky_relu(x):
if x>0 :
return x
else :
return 0.01*x
x = 1.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -10.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 0.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = 15.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
x = -20.0
print('Applying Leaky Relu on (%.1f) gives %.1f' % (x, leaky_relu(x)))
```

Output :

```
Applying Leaky Relu on (1.0) gives 1.0
Applying Leaky Relu on (-10.0) gives -0.1
Applying Leaky Relu on (0.0) gives 0.0
Applying Leaky Relu on (15.0) gives 15.0
Applying Leaky Relu on (-20.0) gives -0.2
```

## Conclusion

This tutorial was about the Sigmoid activation function. We learned how to implement and plot the function in python.