Data Visualization with Python Seaborn and Pandas


Hey, folks! Today we will look at a very interesting Python module, Seaborn, and see how it contributes to data visualization.


Need for the Seaborn module

Data visualization is the representation of data values in a pictorial format. Visualizing data makes it easier to understand and helps in drawing well-founded conclusions from it.

The Python Matplotlib library provides the base for many data visualization modules in Python. The Seaborn module is built on top of Matplotlib and provides higher-level functions with statistical plot features built in.

With Seaborn, data can be presented through different kinds of plots, and features such as colour and marker style can be added to enhance the pictorial representation.
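
Because Seaborn draws onto Matplotlib objects, the result of a Seaborn call can be adjusted with ordinary Matplotlib methods. The following is a minimal sketch of that idea, assuming a small hand-typed table in place of a real dataset (the column names and values are illustrative only):

```python
import pandas
import seaborn as sn

# Illustrative values only; not read from the dataset used later on.
cars = pandas.DataFrame({"hp": [93, 105, 110, 175, 245],
                         "mpg": [22.8, 18.1, 21.0, 18.7, 14.3]})

# Seaborn plotting functions of this kind return a Matplotlib Axes.
ax = sn.scatterplot(x=cars["hp"], y=cars["mpg"])

# Plain Matplotlib calls work on the object Seaborn returned.
ax.set_title("Horsepower vs. fuel economy")
ax.set_xlabel("Horsepower")
ax.set_ylabel("Miles per gallon")
```

Calling plt.show() from matplotlib.pyplot afterwards displays the figure, as the examples in this article do.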


Visualizing Data with Python Seaborn

To get started with data visualization in Seaborn, NumPy, Pandas, and Matplotlib need to be installed in the Python environment, since Seaborn builds on them. The examples in this article use Pandas to load the data and Matplotlib to display the plots.

Further, we need to install and load the Python Seaborn module into the environment.

pip install seaborn
import seaborn

Now that we have installed and imported the Seaborn module in our working environment, let us get started with data visualization in Seaborn.


Statistical Data Visualization with Seaborn

The Seaborn module helps us visualize data in statistical terms, i.e. understand the relationship between data values, with the help of the following plots:

  1. Line Plot
  2. Scatter Plot

Let us understand each of them in detail in the upcoming sections.


Seaborn Line Plot

A Seaborn line plot depicts the relationship between two data variables by joining the data points with a line. It helps show how one data variable depends on another.

The seaborn.lineplot() function draws a line through the data points to visualize the dependence of the y variable on the x variable.

Syntax:

seaborn.lineplot(x=x, y=y)

Example 1:

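import seaborn as sn
import matplotlib.pyplot as plt
import pandas
# Load the mtcars dataset; adjust the path to wherever your copy is stored.
data = pandas.read_csv("C:/mtcars.csv")
# Pass x and y by keyword: Seaborn 0.12 and later reject them as positional arguments.
res = sn.lineplot(x=data['hp'], y=data['cyl'])
plt.show()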

Output:

Data Visualization With Seaborn Line Plot

Example 2:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# hue sets the colour and style sets the line style, both grouped by the 'am' column.
res = sn.lineplot(x=data['hp'], y=data['cyl'], hue=data['am'], style=data['am'])
plt.show()

In the above example, the hue and style parameters split the data by the am column, so each group is drawn as a separate line with its own colour and line style.

Output:

Data Visualization With Seaborn Multi Line Plot

Seaborn Scatter Plot

A Seaborn scatter plot also depicts the relationship between data variables, drawing each observation as a point positioned against continuous or categorical values.

Scatter plots are used extensively to detect outliers during data visualization and data cleansing. Outliers are data values that lie far from the range of most other values. A scatter plot shows every data point individually, which makes such outliers stand out.

Syntax:

seaborn.scatterplot()

The seaborn.scatterplot() function plots the data points individually, so that clusters of points reveal the relationship between the data variables. When visualizing a data model, place the dependent (response) variable on the y-axis and the independent variable on the x-axis.

Example 1:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# x and y are passed by keyword, as required by Seaborn 0.12 and later.
res = sn.scatterplot(x=data['hp'], y=data['cyl'])
plt.show()

Output:

Data Visualization With Seaborn Scatter Plot

Example 2:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# Colour and marker shape both follow the 'am' column.
res = sn.scatterplot(x=data['hp'], y=data['cyl'], hue=data['am'], style=data['am'])
plt.show()

With the parameters ‘hue’ and ‘style’, we can visualize multiple data variables with different plotting styles.

Output:

Data Visualization With Seaborn Multi Scatter Plot
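
Since scatter plots are often used to spot outliers, hue can also be driven by a computed flag. The sketch below marks points that lie more than two standard deviations from the mean; that threshold, the column names, and the data are assumptions made for illustration, not something scatterplot() does by itself:

```python
import pandas
import seaborn as sn

# Illustrative values only; the last hp value is deliberately extreme.
df = pandas.DataFrame({"hp":  [93, 105, 110, 110, 123, 150, 175, 335],
                       "mpg": [22.8, 18.1, 21.0, 21.4, 19.2, 15.2, 18.7, 15.0]})

# Assumed rule: more than 2 sample standard deviations away from the mean.
z_score = (df["hp"] - df["hp"].mean()) / df["hp"].std()
df["outlier"] = z_score.abs() > 2

# Flagged points get their own colour and marker.
ax = sn.scatterplot(data=df, x="hp", y="mpg", hue="outlier", style="outlier")
print(df.loc[df["outlier"], "hp"].tolist())
```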

Categorical Data visualization with Seaborn and Pandas

Before getting started with the categorical data distribution, it is necessary for us to understand certain terms related to data analysis and visualization.

  • Continuous variable: a data variable that holds numeric values on a continuous scale. For example, age is a continuous variable whose value can lie anywhere between 1 and 100.
  • Categorical variable: a data variable containing discrete values, i.e. groups or categories. For example, gender can be categorized into three groups: ‘Male’, ‘Female’ and ‘Others’.
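
In Pandas, this distinction shows up in the column dtype. A numeric code such as a cylinder count is stored as an integer even though it behaves like a category; converting it makes the intent explicit. A small sketch with illustrative values:

```python
import pandas

# Illustrative values only.
df = pandas.DataFrame({"mpg": [21.0, 22.8, 18.7, 14.3],
                       "cyl": [6, 4, 8, 8]})

# mpg is a continuous measurement, so it stays numeric.
print(pandas.api.types.is_numeric_dtype(df["mpg"]))

# Treat the cylinder count as a set of groups rather than a quantity.
df["cyl"] = df["cyl"].astype("category")
print(df["cyl"].cat.categories.tolist())
```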

Having understood the basic terminologies, let us dive into the visualization of categorical data variables.


Box Plot

A Seaborn box plot is used to visualize the distribution of a numeric data variable, optionally split by a categorical one, and is used extensively to detect outliers in the data cleansing process.

The seaborn.boxplot() method is used to create a box plot for a particular data variable. The box spans the interquartile range, from the first quartile to the third, with a line inside it at the median.

Syntax:

seaborn.boxplot()

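The two lines extending from the box (the whiskers) represent the lower and the upper range; by default they reach the most extreme data points that lie within 1.5 times the interquartile range of the box. Any data point that lies below the lower range or above the upper range is considered an outlier.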

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# Passing the column as x draws a single horizontal box.
res = sn.boxplot(x=data['mpg'])
plt.show()

Output:

Data Visualization With Seaborn BoxPlot

In the above box plot, the data point lying above the upper range is drawn as an individual marker and is considered an outlier in the dataset.
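
The limits used to decide what counts as an outlier can be reproduced by hand. Assuming Seaborn's default of 1.5 times the interquartile range (the whis parameter), the sketch below computes the quartiles and the two fences for an illustrative series and lists the values that would be drawn as individual outlier points:

```python
import pandas

# Illustrative values only; 40 is deliberately far from the rest.
values = pandas.Series([10, 12, 13, 14, 15, 16, 18, 40])

q1 = values.quantile(0.25)   # first quartile (lower edge of the box)
q3 = values.quantile(0.75)   # third quartile (upper edge of the box)
iqr = q3 - q1                # interquartile range (height of the box)

# Points beyond these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers.tolist())
```

Note that the whiskers themselves stop at the last data point inside each fence, not at the fence value.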


Boxen Plot

A Seaborn boxen plot resembles the box plot but presents more detail about the distribution.

The seaborn.boxenplot() function draws a series of progressively narrower boxes, each covering a further quantile range beyond the quartiles, which gives a more detailed representation of the data values, especially in the tails.

Syntax:

seaborn.boxenplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# The column is passed by keyword, as with boxplot().
res = sn.boxenplot(x=data['hp'])
plt.show()

Output:

Data Visualization With Seaborn BoxenPlot

Violin Plot

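A Seaborn violin plot is used to represent the underlying distribution of a data variable. It combines the summary of a box plot with a kernel density estimate, so the width of the violin shows how densely the values are concentrated.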

Syntax:

seaborn.violinplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# The width of the violin follows the estimated density of 'hp'.
res = sn.violinplot(x=data['hp'])
plt.show()

Output:

Data Visualization With Seaborn ViolinPlot

Swarm Plot

A Seaborn swarm plot gives a clearer picture of how individual observations are spread across the levels of a categorical data variable.

The seaborn.swarmplot() function draws one point per observation and shifts the points sideways within each category so that they do not overlap, producing a swarm of points that shows the relationship between the two data variables/columns.

Syntax:

seaborn.swarmplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# 'am' defines the groups on the x-axis; 'cyl' is plotted on the y-axis.
res = sn.swarmplot(x=data['am'], y=data['cyl'])
plt.show()

Output:

Data Visualization With Seaborn SwarmPlot
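
Swarm plots tend to be most informative when one axis is categorical and the other holds a measured quantity, because every observation then gets its own visible point. A sketch with illustrative values, assuming cyl as the grouping column and mpg as the measured one:

```python
import pandas
import seaborn as sn

# Illustrative values only.
df = pandas.DataFrame({
    "cyl": [4, 4, 4, 6, 6, 6, 8, 8, 8, 8],
    "mpg": [22.8, 24.4, 30.4, 21.0, 21.4, 18.1, 18.7, 14.3, 16.4, 15.2],
})

ax = sn.swarmplot(x=df["cyl"], y=df["mpg"])

# One marker per row: count the points across all drawn collections.
n_points = sum(len(c.get_offsets()) for c in ax.collections)
print(n_points, len(df))
```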

Estimation of categorical data using Seaborn

In data analysis and visualization, we often need plots that help us estimate a summary value, such as an average or a count, for each group in the data, for example from surveys or research studies. The following plots serve this purpose:

  1. Barplot
  2. Pointplot
  3. Countplot

1. Barplot

A Seaborn bar plot represents an estimate of central tendency (the mean, by default) of a numeric variable for each category as the height of a bar, with an error bar indicating the uncertainty around that estimate.

Syntax:

seaborn.barplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# One bar per 'cyl' value; the height is the mean 'carb' in that group.
res = sn.barplot(x=data['cyl'], y=data['carb'])
plt.show()

Output:

Data Visualization With Seaborn Barplot
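
By default, each bar's height is the mean of the y values in that category. The sketch below, using illustrative values, compares the bar heights on the plot with a Pandas group-by to make that concrete:

```python
import pandas
import seaborn as sn

# Illustrative values only: two rows per cylinder count.
df = pandas.DataFrame({"cyl":  [4, 4, 6, 6, 8, 8],
                       "carb": [1, 2, 4, 1, 2, 4]})

ax = sn.barplot(x=df["cyl"], y=df["carb"])

# Heights of the drawn bars, left to right.
bar_heights = [bar.get_height() for bar in ax.patches]

# The same per-group means computed with Pandas.
group_means = df.groupby("cyl")["carb"].mean().tolist()
print(bar_heights)
print(group_means)
```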

2. Pointplot

A Seaborn point plot is a combination of the statistical Seaborn line and scatter plots. The seaborn.pointplot() function shows the estimate for each category as a point and joins the points with lines.

Syntax:

seaborn.pointplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# One point per 'carb' value, placed at the mean 'cyl' of that group.
res = sn.pointplot(x=data['carb'], y=data['cyl'])
plt.show()

Output:

Data Visualization With Seaborn Pointplot
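
The estimator parameter controls which summary each point shows. The sketch below, using illustrative values, asks for the median of each category instead of the default mean and prints the medians Pandas computes for comparison:

```python
import numpy as np
import pandas
import seaborn as sn

# Illustrative values only: three rows per carb value.
df = pandas.DataFrame({"carb": [1, 1, 1, 2, 2, 2, 4, 4, 4],
                       "cyl":  [4, 4, 6, 4, 8, 8, 6, 8, 8]})

# One point per carb value, placed at the median cyl of that group.
ax = sn.pointplot(x=df["carb"], y=df["cyl"], estimator=np.median)

group_medians = df.groupby("carb")["cyl"].median().tolist()
print(group_medians)
```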

3. Countplot

A Seaborn count plot represents the count, or frequency, of each value of the data variable passed to it. It can thus be considered a univariate data distribution plot.

Syntax:

seaborn.countplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# One bar per distinct 'carb' value; the height is the number of rows with that value.
res = sn.countplot(x=data['carb'])
plt.show()

Output:

Data Visualization With Seaborn CountPlot
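
The bar heights of a count plot correspond to what value_counts() returns in Pandas. A short sketch with illustrative values that puts the two side by side:

```python
import pandas
import seaborn as sn

# Illustrative values only.
carb = pandas.Series([1, 1, 2, 2, 2, 4, 4, 4, 4], name="carb")

ax = sn.countplot(x=carb)

# Heights of the drawn bars, left to right.
bar_heights = [int(bar.get_height()) for bar in ax.patches]

# The same counts from Pandas, ordered by value.
counts = carb.value_counts().sort_index().tolist()
print(bar_heights)
print(counts)
```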

Univariate distribution using Seaborn Distplot

The Seaborn distplot is used for univariate data distribution and visualization, i.e. visualizing the data values of a single data variable.

The seaborn.distplot() function depicts the distribution of a continuous variable as a histogram with a density curve drawn over it. Note that distplot() has been deprecated since Seaborn 0.11, which introduced histplot() and displot() as its replacements.

Syntax:

seaborn.distplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# Deprecated since Seaborn 0.11; it may emit a warning or be unavailable in newer releases.
res = sn.distplot(data['mpg'])
plt.show()

Output:

Data Visualization With Seaborn Distplot

Bivariate distribution using Seaborn Kdeplot

The Seaborn kdeplot draws a kernel density estimate, a smoothed representation of the probability distribution of continuous variables. When two variables are passed, it depicts their joint (bivariate) distribution.

Syntax:

seaborn.kdeplot()

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# Passing both x and y produces a bivariate density drawn as contours.
res = sn.kdeplot(x=data['mpg'], y=data['qsec'])
plt.show()

Output:

Data Visualization With Seaborn Kdeplot

Setting different backgrounds using Seaborn

The seaborn.set() function can be used to set a different background style for the plots, such as ‘dark’, ‘whitegrid’, ‘darkgrid’, etc.

Syntax:

seaborn.set(style)

Example:

import seaborn as sn
import matplotlib.pyplot as plt
import pandas
data = pandas.read_csv("C:/mtcars.csv")
# Apply the style before plotting so that it affects the new figure.
sn.set(style='darkgrid')
res = sn.lineplot(x=data['mpg'], y=data['qsec'])
plt.show()

Output:

Data Visualization With Different Seaborn Themes

Conclusion

Thus, the Seaborn module helps in visualizing data with different plots, chosen according to the purpose of the visualization.

