Naive Bayes Algorithm in Python – A Brief Introduction


Hey, folks! In our series of Machine Learning algorithms, today we will be focusing on Naive Bayes Algorithm in Python in detail.

So, let us begin!


What is Naive Bayes Algorithm?

Naive Bayes is a supervised classification machine learning algorithm. It is based on the following:

  • Bayes' Theorem
  • the Maximum A Posteriori (MAP) hypothesis

Let us have a look at the formula below:

P(A | B) = P(B | A) · P(A) / P(B)

This is Bayes' theorem: it determines the probability of hypothesis A given the evidence B (an observed data sample).

Thus, in Naive Bayes, we determine the probability that a particular hypothesis holds true for a particular evidence of the dataset.
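As a quick numeric illustration of the theorem (the probabilities below are made up for the example):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical numbers: 1% of customers default (A),
# 80% of defaulters missed a payment (B given A),
# and 10% of all customers missed a payment (B).
p_a = 0.01          # prior P(A)
p_b_given_a = 0.80  # likelihood P(B|A)
p_b = 0.10          # evidence P(B)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(p_a_given_b)  # 0.08
```

So the evidence of a missed payment raises the probability of default from 1% to 8%.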

Let us now understand the assumptions in the upcoming section.


Assumptions of Naive Bayes

Naive Bayes assumes that the effect of a feature/attribute on a given class is independent of the values of the other features in the dataset.

That is, the features contribute independently to the class probability. This assumption is termed Class Conditional Independence.
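Under this assumption, the class-conditional likelihood factorizes into a product of per-feature likelihoods, P(x | class) = P(x1 | class) × P(x2 | class) × … A minimal sketch with hypothetical per-feature probabilities:

```python
import math

# Hypothetical per-feature likelihoods P(x_i | class) for one sample
feature_likelihoods = [0.5, 0.2, 0.9]
prior = 0.3  # hypothetical class prior P(class)

# Class conditional independence: multiply the per-feature terms
likelihood = math.prod(feature_likelihoods)  # P(x | class)
posterior_unnormalized = prior * likelihood  # proportional to P(class | x)
print(posterior_unnormalized)  # 0.027
```

The class with the largest such product is the one Naive Bayes predicts (the MAP hypothesis).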


Implementing Naive Bayes in Python

First, we load the dataset into the environment using the pandas.read_csv() function. The dataset used in the examples is a bank-loan dataset (bank-loan.csv).

We then split the data into training and testing sets using the train_test_split() function.

Example:

import pandas as pd

# Load the bank-loan dataset
data = pd.read_csv("bank-loan.csv")
loan = data.copy()

# Separate the features (X) from the target column 'default' (Y)
from sklearn.model_selection import train_test_split
X = loan.drop(['default'], axis=1)
Y = loan['default'].astype(str)

# Hold out 20% of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

# Fit a Gaussian Naive Bayes model and predict on the test set
from sklearn.naive_bayes import GaussianNB
Naive = GaussianNB().fit(X_train, Y_train)
target = Naive.predict(X_test)
print(target)

Here, we have applied Gaussian Naive Bayes using GaussianNB() to predict whether a customer is a loan defaulter (1) or not (0).

Output:

array(['0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '1',
       '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0',
       '0', '0', '0', '0', '1', '0', '0', '0', '1', '1', '0', '0', '1',
       '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
       '0', '1', '0', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0',
       '1', '0', '1', '0', '0', '1', '0', '0', '1', '0', '0', '0', '1',
       '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0',
       '1', '0', '1', '0', '0', '0', '0', '1', '0', '0', '0', '1', '0',
       '1', '0', '1', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1',
       '0', '0', '1', '1', '0', '0', '0', '0', '1', '0', '1', '0', '0',
       '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0'], dtype='<U1')
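The snippet above stops at the raw predictions. To gauge how well the model does, we can compare the predictions with the held-out labels using accuracy_score and confusion_matrix. The sketch below uses synthetic two-cluster data as a stand-in for the bank-loan dataset, so it runs on its own:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the bank-loan data: two Gaussian clusters
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(200, 3))  # class 0 (non-defaulters)
X1 = rng.normal(loc=2.0, scale=1.0, size=(200, 3))  # class 1 (defaulters)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
model = GaussianNB().fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```

The same two metric calls work unchanged on the bank-loan predictions (target vs. Y_test).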

Types of Naive Bayes Algorithms

Naive Bayes can be further classified into the following types–

  • Bernoulli Naive Bayes
  • Multinomial Naive Bayes
  • Gaussian Naive Bayes

Let us have a look at each one of them in detail in the below section.


1. Bernoulli Naive Bayes

It is based on Bernoulli distribution of data. It is useful for binary classification i.e. when the outcome depends on only two responses.
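A minimal sketch of BernoulliNB on binary (present/absent) features, using made-up data:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each feature is binary, e.g. whether a word occurs in a message (toy data)
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])  # binary outcome

model = BernoulliNB().fit(X, y)
print(model.predict([[1, 0, 0]]))
```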


2. Multinomial Naive Bayes

It is a discrete classification algorithm and used when the output represents the frequency of occurrences of a term.
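Term frequencies are exactly what a bag-of-words representation produces, so a natural sketch pairs MultinomialNB with CountVectorizer (the documents and labels below are made up):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy documents; MultinomialNB models the word counts per class
docs = ["free prize money", "win money now", "meeting at noon", "lunch at noon"]
labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

vec = CountVectorizer()
counts = vec.fit_transform(docs)  # sparse term-frequency matrix
model = MultinomialNB().fit(counts, labels)

# Classify a new message with the same vocabulary
print(model.predict(vec.transform(["win free money"])))
```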


3. Gaussian Naive Bayes

In Gaussian Naive Bayes, we assume that the continuous variables follow a normal (Gaussian) distribution. The mean and variance of each feature, per class, are estimated using the maximum likelihood approach.
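The per-feature likelihood Gaussian Naive Bayes plugs into the product is simply the normal density evaluated at the feature value, using that class's estimated mean and variance. A small sketch with hypothetical class statistics:

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density used by Gaussian NB for one feature (sketch)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical class statistics for one feature: mean 0, variance 1
print(gaussian_pdf(1.0, mean=0.0, var=1.0))  # ~0.24197
```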


Advantages of Naive Bayes

  • Robust to missing or NULL values.
  • Because it relies on simple probability estimates, it is less prone to overfitting.
  • Performs well for multiclass classification.
  • Fast to train and easy to apply.

Limitations of Naive Bayes

  • Zero-frequency problem: if a feature value appears in the test data but never in the training data for a class, the model assigns it zero probability, which zeroes out the entire product. This can be overcome with smoothing techniques such as Laplace smoothing.
  • The assumption of independent predictor variables rarely holds in real-world datasets.
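In scikit-learn, Laplace smoothing is controlled by the alpha parameter of the discrete Naive Bayes variants (alpha=1.0 is classic Laplace smoothing). A sketch on made-up count data:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data: feature 2 never occurs for class 0, and vice versa
X = np.array([[2, 1, 0],
              [1, 3, 0],
              [0, 1, 4]])
y = np.array([0, 0, 1])

# alpha=1.0 adds one pseudo-count to every (feature, class) pair,
# so features unseen for a class never get probability zero
model = MultinomialNB(alpha=1.0).fit(X, y)
print(np.exp(model.feature_log_prob_))  # smoothed P(feature | class)
```

Without smoothing (alpha close to 0), any test sample containing the unseen feature would get zero probability for that class.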

Application of Naive Bayes

  • Multi-class Prediction of data groups
  • Recommender Systems
  • Text Classification
  • Sentiment Analysis
  • Spam Filtering

Conclusion

By this, we have come to the end of this topic. Feel free to comment below, in case you come across any question.

For more such posts related to Python, stay tuned @ Python with JournalDev and till then, Happy Learning!! 🙂
