BoxPlots in R: Plotting 4 Types of BoxPlots in R


Hello everyone! In this article, we’ll learn to plot 4 different types of boxplots in R.

So, let us begin!


The Necessity of BoxPlots in R Programming

Before diving into the concept of creating BoxPlots, let us first focus on the need for Boxplots in the analysis and processing of data.

In data analysis and prediction tasks, the data usually comes from a variety of sources such as surveys, web scraping, etc. Each variable in the collected data is expected to follow some distribution.

On inspection, we often find a few data values that do not fit that distribution, i.e. values that lie far outside the range covered by the other data values. Such data points are called outliers.

Because outliers do not follow the rest of the data, leaving them untreated can distort the summary statistics of the data, most notably the mean and, to a lesser extent, the quartile ranges and the median.

Now, how do we detect these outliers in a dataset? This is where boxplots come into the picture. A boxplot lets us visualize the outlier data points in a data column or variable: points that lie beyond the whiskers are drawn individually.
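
To make the idea concrete, here is a minimal sketch of how outliers can be flagged numerically. It assumes a small made-up vector x (not taken from the article's dataset) and uses base R's boxplot.stats(), which by default applies the same rule as boxplot(): points lying more than 1.5 times the box length beyond the box are reported as outliers.

```r
# Hypothetical sample values; 95 is deliberately far from the rest
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 21, 95)

# boxplot.stats() returns the five-number summary and the flagged points
s <- boxplot.stats(x)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # values lying beyond the whiskers (the outliers)

# The same points are drawn individually when the data is plotted
boxplot(x, main = "Sample data with one outlier")
```

The variable names x and s are illustrative only; any numeric vector can be passed in the same way.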

There are various techniques to create boxplots in R.

Today, we will cover the following techniques:

  • boxplot() function
  • Notch boxplots
  • Violin boxplots
  • Bagplots

Let us have a look at them one by one!


1. Standard Boxplot to detect outliers

In this example, we use the boxplot() function to create boxplots and detect the presence of outliers.

You can find the dataset here!

First, we load the dataset into the R environment using the read.csv() function. Next, we store the names of the numeric columns in a separate character vector, so that only those columns are passed to the boxplot() function, which works only on numeric data.

Syntax:

boxplot(data variables)

Example:

# Clear the workspace
rm(list = ls())
# Set the working directory
setwd("D:/Edwisor_Project:Loan_Defaulter")
getwd()
# Load the dataset
dta <- read.csv("bank-loan.csv", header = TRUE)
# Names of the numeric columns to plot
numeric_col <- c("age", "employ", "address", "income", "debtinc", "creddebt", "othdebt")
# Draw one boxplot per numeric column
boxplot(dta[, numeric_col])

Output:

Simple BoxPlot
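
The column names above are typed by hand. As a rough sketch of an alternative, the numeric columns can be picked programmatically with is.numeric(), and the outliers can be read back from the value that boxplot() returns. Because the bank-loan file is not reproduced here, the example assumes R's built-in iris dataset as a stand-in; the names num_cols, bp and outliers are illustrative only.

```r
# Built-in dataset used as a stand-in for the article's CSV file
data(iris)

# Keep only the names of the columns that hold numeric data
num_cols <- names(iris)[sapply(iris, is.numeric)]

# boxplot() invisibly returns the statistics it used for drawing
bp <- boxplot(iris[, num_cols], main = "Numeric columns of iris")

# Pair each flagged outlier value with the column it belongs to
outliers <- data.frame(column = bp$names[bp$group], value = bp$out)
outliers
```

The same pattern should carry over to dta, since read.csv() also returns a data frame.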

2. Notch Boxplots in R

We can customize the boxplot() function by passing a few additional parameters, such as:

  • main: Title of the plot to be displayed.
  • varwidth: If set to TRUE, the width of each box is drawn proportional to the square root of the number of observations in that group.
  • notch: If set to TRUE, a notch is drawn around the median of each box. If the notches of two boxes do not overlap, it suggests that their medians differ.

Example:

In this example, we set notch to TRUE. Further, we assign a different color to the boxplot of every column.

boxplot(dta[, numeric_col], notch = TRUE,
        col = c("green", "red", "blue", "yellow", "pink", "black", "orange"))

Output:

Notch BoxPlots
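
As a further illustration of these parameters, the sketch below combines main, varwidth and notch on grouped data. It assumes the built-in mtcars dataset rather than the loan data, and recomputes the notch limits by hand using the formula given in the documentation of boxplot.stats(): median ± 1.58 × IQR / sqrt(n).

```r
# Built-in dataset: fuel economy (mpg) grouped by number of cylinders
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "Miles per gallon by cylinder count",
              notch = TRUE,      # draw a notch around each median
              varwidth = TRUE,   # box width grows with sqrt(group size)
              col = c("green", "red", "blue"))

# Group sizes, which drive the box widths when varwidth = TRUE
bp$n

# Notch limits recomputed by hand: median -/+ 1.58 * IQR / sqrt(n)
iqr   <- bp$stats[4, ] - bp$stats[2, ]
lower <- bp$stats[3, ] - 1.58 * iqr / sqrt(bp$n)
upper <- bp$stats[3, ] + 1.58 * iqr / sqrt(bp$n)
rbind(lower, upper)   # should match bp$conf
```

With small groups, R may warn that some notches went outside the hinges; the plot is still drawn in that case.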

3. Violin BoxPlots in R

The contributed vioplot package, which can be installed with install.packages("vioplot"), provides the vioplot() function for creating violin boxplots.

A violin plot combines a boxplot with a mirrored density curve, so it shows the shape of the distribution along with the median and quartiles.

Example:

library(vioplot)
vioplot(dta[, numeric_col],
        col = "red")

Output:

Violin Boxplot
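
The outline of a violin is a kernel density estimate of the data, mirrored about the axis. As a minimal sketch of that idea, the example below computes such an estimate with base R's density() and then, only if the vioplot package happens to be installed, draws two violins. It assumes the built-in iris dataset instead of the loan data. Note that vioplot() computes its own estimate internally; density() is used here only to show the idea.

```r
# Kernel density estimate of the kind that gives a violin its shape
d <- density(iris$Sepal.Length)
plot(d, main = "Density of Sepal.Length")

# vioplot is a contributed package; skip the plot if it is not installed
if (requireNamespace("vioplot", quietly = TRUE)) {
  vioplot::vioplot(iris$Sepal.Length, iris$Sepal.Width,
                   names = c("Sepal.Length", "Sepal.Width"),
                   col = "red")
}
```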

4. Bagplots

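Apart from the standard boxplots, we can also create bagplots. A bagplot is a two-dimensional extension of the boxplot that shows the relationship between two data variables along with their outliers. The bagplot() function is provided by the aplpack package.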

Example:

library(aplpack)
bagplot(dta$age,dta$employ)

Output:

BagPlot

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to R programming, stay tuned with us.

Till then, happy learning! 🙂
