Hello everyone! In this article, we’ll learn to plot three different types of boxplots in R.
So, let us begin!
The Necessity of BoxPlots in R Programming
Before diving into how to create boxplots, let us first look at why we need them when analyzing and processing data.
In data analysis and prediction tasks, we usually gather data from various sources such as surveys, web scraping, etc. Data from different sources can follow quite different distributions.
On inspection, we often find a few data values that lie far away from the bulk of the distribution, i.e. values that fall outside the range covered by the other data values. Such data points are called outliers.
Because outliers do not follow the rest of the distribution, leaving them untreated can distort the overall statistics of the data. This affects the mean most strongly, and robust measures such as the median and the quartiles to a lesser degree.
So how do we detect these outliers in a dataset? This is where boxplots come into the picture: they let us visualize the outlier data points in a column or variable at a glance.
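By default, R's boxplot() flags a point as an outlier when it lies more than 1.5 times the hinge spread beyond either hinge. The hinges are approximately the first and third quartiles, and the 1.5 factor is controlled by the range argument. The same calculation is exposed through boxplot.stats() from the grDevices package, which is loaded by default. A small sketch on a made-up vector shows the rule before any real data is involved:

```r
# A small hypothetical sample with one obviously extreme value
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)

s <- boxplot.stats(x, coef = 1.5)  # coef = 1.5 is the default whisker rule
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker: 2 3 5 6 7
s$out    # values beyond 1.5 * (upper hinge - lower hinge) from the box: 40

# The same value 40 is drawn as a separate point beyond the whisker
boxplot(x, main = "A single outlier")
```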
There are various techniques to create boxplots in R.
In this article, we will cover the following techniques:
- boxplot() method
- notch plots
- Violin Boxplots
Let us have a look at them one by one!
1. Standard Boxplot to detect outliers
In this example, we use the boxplot() function to create boxplots and detect the presence of outliers.
You can find the dataset here!
First, we load the dataset into the R environment using the read.csv() function. Next, we store the names of the numeric columns in a separate character vector, numeric_col. We do this so that only those columns are passed to the boxplot() function, since a boxplot is only meaningful for numeric (continuous) values.
rm(list = ls())  # clear the workspace

# Set the working directory (replace with the folder that contains bank-loan.csv)
setwd("D:/Edwisor_Project:Loan_Defaulter")
getwd()

# Load the dataset
dta = read.csv("bank-loan.csv", header = TRUE)

# Keep only the numeric columns for plotting
numeric_col = c("age", "employ", "address", "income", "debtinc", "creddebt", "othdebt")
boxplot(dta[, numeric_col])
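R does not ship with bank-loan.csv, so here is a self-contained sketch of the same idea that uses the built-in mtcars data frame as a stand-in. The choice of columns is arbitrary. The sketch also relies on the fact that boxplot() invisibly returns the statistics it drew, which means the flagged outliers can be listed as well as seen:

```r
# mtcars ships with R; it stands in for the bank loan data here
cols <- c("mpg", "hp", "wt")  # numeric columns only, as with numeric_col above

b <- boxplot(mtcars[, cols], main = "Outlier check on mtcars")

# b$out holds every point drawn as an outlier; b$group is the index of the box it belongs to
outliers <- data.frame(column = b$names[b$group], value = b$out)
print(outliers)
```

The hp column, for example, contains one car with 335 horsepower, and that value appears both as a separate point in the plot and as a row in outliers.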
2. Notch Boxplots in R
We can customize the boxplot() function by passing additional parameters to it, such as:
- main: Title of the boxplot to be displayed.
- varwidth: If set to TRUE, the width of each box is drawn proportional to the square root of the number of observations in its group.
- notch: If set to TRUE, a notch is drawn on each side of the box. If the notches of two boxes do not overlap, this is strong evidence that their medians differ.
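According to R's documentation for boxplot.stats(), each notch extends to the median +/- 1.58 * IQR / sqrt(n), where IQR is the distance between the hinges. A small sketch on a made-up vector compares that formula with the conf component that boxplot.stats() returns. These are the limits that notch = TRUE draws:

```r
# Hypothetical sample; any numeric vector works
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)

s <- boxplot.stats(x)
hinge_spread <- s$stats[4] - s$stats[2]  # upper hinge minus lower hinge
by_hand <- s$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(length(x))

s$conf   # notch limits computed by R: 3.42 6.58
by_hand  # the same limits from the documented formula
```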
In this example, we set notch to TRUE and pass a vector of colors through col so that each column's box gets a different color.
boxplot(dta[,numeric_col],notch = TRUE, col = c("green", "red", "blue","yellow","pink","black","orange"))
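The call above does not use main or varwidth. Here is a minimal sketch of both. It uses the built-in mtcars data and the formula interface, grouping mpg by number of cylinders. The groups have unequal sizes, so the effect of varwidth is visible:

```r
# mtcars is built into R; the 4-, 6- and 8-cylinder groups have 11, 7 and 14 cars
b <- boxplot(mpg ~ cyl, data = mtcars,
             main = "Fuel efficiency by number of cylinders",  # plot title
             xlab = "Cylinders", ylab = "Miles per gallon",
             varwidth = TRUE,                  # box width proportional to sqrt(group size)
             col = c("green", "red", "blue"))  # one color per box

b$n  # number of observations behind each box: 11 7 14
```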
3. Violin BoxPlots in R
The vioplot package (available on CRAN) provides the vioplot() function for creating violin plots. A violin plot combines a boxplot with a mirrored density curve of the data, so it shows the shape of the distribution along with the median and quartiles.
library(vioplot)  # install.packages("vioplot") if it is not installed yet
vioplot(dta[, numeric_col], col = "red")
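The sketch below shows what the violin outline represents. It is a rough base-R approximation, not how vioplot is implemented internally. A kernel density estimate from density() is mirrored around a vertical axis with polygon(), and an ordinary narrow boxplot is drawn on top. The bimodal sample is simulated purely for illustration, and the 0.4 half-width is an arbitrary choice:

```r
set.seed(1)
# Simulated bimodal sample (hypothetical data): two clusters that a boxplot alone would hide
x <- c(rnorm(100, mean = 0), rnorm(50, mean = 4))

d <- density(x)                     # Gaussian kernel density estimate on 512 points
half_width <- 0.4 * d$y / max(d$y)  # scale so the widest part of the violin spans 0.8 units

plot(NULL, xlim = c(0.5, 1.5), ylim = range(d$x),
     xaxt = "n", xlab = "", ylab = "value", main = "Violin sketch in base R")
# Mirror the density left and right of x = 1 to form the violin outline
polygon(c(1 - half_width, rev(1 + half_width)), c(d$x, rev(d$x)),
        col = "red", border = "black")
# Overlay a narrow boxplot at the same horizontal position
boxplot(x, at = 1, add = TRUE, boxwex = 0.1, col = "white")
```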
Apart from standard boxplots, we can also create bagplots, a two-dimensional extension of the boxplot that shows the relationship between two variables (for example, with the bagplot() function from the aplpack package).
With this, we have come to the end of this topic. Feel free to comment below if you have any questions.
For more such posts related to R programming, stay tuned.
Till then, Happy learning! 🙂