Kurtosis in R [A Practical Guide]


Hello, readers! In this article, we will focus on kurtosis in R programming in detail.

So, let us begin!! 🙂


What is Kurtosis?

Before diving deep into the concept of kurtosis testing, let us first understand why it is needed.

In the domain of data science and machine learning, data preprocessing plays a crucial role in determining the accuracy of the end product.

Feature scaling is used to bring the variables of a dataset onto a comparable scale prior to modeling. By scaling the features, we make the data scale-free, which makes the modeling step easier.

Kurtosis is one such statistical measure that helps us identify the data that may require feature scaling.

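In kurtosis testing, we compare the distribution of the data against a normal distribution. Kurtosis measures the "tailedness" of the distribution, that is, how much of the data lies far out in the tails. It is calculated directly from the data values, and its effect is visible in the shape of the distribution graph.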
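For reference, one common sample definition is Pearson's coefficient of kurtosis: the fourth central moment m_4 divided by the square of the second central moment m_2. It equals 3 for a normal distribution, and subtracting 3 gives the excess kurtosis that some R packages report instead. Other estimators add small-sample corrections, so the exact numbers can differ slightly from package to package.

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k,
\qquad
\mathrm{Kurt} = \frac{m_4}{m_2^{2}},
\qquad
\text{excess kurtosis} = \frac{m_4}{m_2^{2}} - 3
```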

In kurtosis testing, we come across three cases:

  1. Heavy-tailed data: The distribution has more extreme values (outliers) in its tails than a normal distribution.
  2. Light-tailed data: The distribution has fewer extreme values (outliers) than a normal distribution.
  3. Normal distribution: The data is bell-shaped and has an excess kurtosis of zero (equivalently, a kurtosis of 3).

Normally distributed data has a kurtosis value equal to 3 (an excess kurtosis of 0).
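To make the three cases concrete, the sketch below simulates a normal sample, a uniform sample (light tails, theoretical kurtosis 1.8) and a Laplace sample (heavy tails, theoretical kurtosis 6), then computes Pearson's coefficient of kurtosis for each. The helper pearson_kurtosis() is a hypothetical name written for this illustration, and the simulated data is not from the article; the estimates should land close to 3, 1.8 and 6 respectively.

```r
# Hypothetical helper: Pearson's coefficient of kurtosis, m4 / m2^2
# (about 3 for normally distributed data)
pearson_kurtosis <- function(x) {
  dev <- x - mean(x)
  mean(dev^4) / mean(dev^2)^2
}

set.seed(42)   # make the simulated samples reproducible
n <- 100000

samples <- list(
  normal  = rnorm(n),           # bell-shaped reference, kurtosis 3
  uniform = runif(n),           # light tails, kurtosis 1.8
  laplace = rexp(n) - rexp(n)   # difference of two exponentials: heavy tails, kurtosis 6
)

# Apply the helper to each sample; the result is a named numeric vector
kurt_values <- sapply(samples, pearson_kurtosis)
print(round(kurt_values, 2))
```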

Thus, in a nutshell, kurtosis describes how heavy the tails of a distribution are compared with a normal distribution, which is often accompanied by a sharper or flatter peak on the graph.


Variants of Kurtosis

Based on the value of the coefficient of kurtosis, a distribution falls into one of three categories:

  1. Platykurtic: If the distribution of data has a coefficient of kurtosis less than 3, it is said to be platykurtic in nature. Such a distribution has lighter tails and usually a flatter peak than a normal distribution.
  2. Leptokurtic: The coefficient of kurtosis is greater than 3. Such a distribution has heavier tails and usually a sharper peak on the graph than a normal distribution.
  3. Mesokurtic: The coefficient of kurtosis is equal to 3, as it is for the bell-shaped normal distribution.
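As a small illustration of the three categories, the hypothetical helper below labels Pearson kurtosis values. Because a sample estimate is almost never exactly 3, it uses a tolerance band tol around 3; the default of 0.5 is an arbitrary assumption, not a standard cut-off. For excess kurtosis values, the comparison point would be 0 instead of 3.

```r
# Hypothetical helper: label Pearson kurtosis values (normal = 3).
# `tol` is an assumed tolerance band around 3, since sample
# estimates are almost never exactly 3.
classify_kurtosis <- function(k, tol = 0.5) {
  ifelse(k < 3 - tol, "platykurtic",
         ifelse(k > 3 + tol, "leptokurtic", "mesokurtic"))
}

# Works element-wise on a vector of kurtosis values
labels <- classify_kurtosis(c(1.54, 3.1, 6))
print(labels)   # platykurtic, mesokurtic, leptokurtic
```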

Implementing Kurtosis in R with the kurtosis() function

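To compute kurtosis in R, we need to load the moments package into the environment. Its kurtosis() function calculates the coefficient of kurtosis on the scale where a normal distribution has a value of 3. Note that the e1071 package also provides a kurtosis() function, but by default it reports excess kurtosis (kurtosis minus 3), so the two functions give different numbers for the same data.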

Example:

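# Remove all existing objects from the workspace
rm(list = ls())
# The moments package provides the kurtosis() function (install it once)
install.packages('moments')
library(moments)

# Define the data vector
data <- c(10, 12, 13, 30, 40, 50)

# The moments:: prefix ensures that a kurtosis() from another attached
# package (for example e1071) cannot mask the one from moments
print(moments::kurtosis(data))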

Output:

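As the coefficient of kurtosis (about 1.54) is less than 3, the data is platykurtic in nature: it has lighter tails than a normal distribution. Following the reasoning above, this marks it as a candidate for feature scaling.

[1] 1.536446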
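To see where this number comes from, the sketch below recomputes it with base R only. It also shows why another package can print a very different, even negative, value for the same vector: e1071's kurtosis() defaults to its "type 3" estimator, m4 / s^4 - 3, where s is the sample standard deviation. That estimator gives -1.933024 for this data.

```r
# Reproduce the kurtosis() result by hand with base R
data <- c(10, 12, 13, 30, 40, 50)
n <- length(data)
dev <- data - mean(data)

# Pearson's coefficient of kurtosis, m4 / m2^2 (normal = 3);
# this is the quantity the moments package reports
pearson <- n * sum(dev^4) / sum(dev^2)^2
print(pearson)        # [1] 1.536446

# Excess kurtosis: subtract 3 so that a normal distribution scores 0
print(pearson - 3)    # [1] -1.463554

# The "type 3" estimator documented as the default of e1071::kurtosis():
# m4 / s^4 - 3, with s the sample standard deviation (n - 1 denominator)
type3 <- mean(dev^4) / sd(data)^4 - 3
print(type3)          # [1] -1.933024
```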

Now, let us apply kurtosis testing to a real dataset. Here, we use the data from the Bike Rental Count Prediction problem (the day.csv file).

Example:

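First, we load the dataset into the R environment using the read.csv() function. Then, we compute the kurtosis of every numeric variable to judge whether it needs feature scaling. Here we call the kurtosis() function of the e1071 package, which by default reports excess kurtosis (kurtosis minus 3), so a normal distribution corresponds to 0 rather than 3.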

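# Remove all existing objects from the workspace
rm(list = ls())
# Set the working directory to the folder that contains day.csv
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data <- read.csv("day.csv", header = TRUE)

# e1071's kurtosis() reports excess kurtosis (normal = 0); the e1071::
# prefix keeps moments::kurtosis() from masking it if moments is attached
library(e1071)
numeric_col_updated <- c('temp', 'hum', 'windspeed')
for (x in numeric_col_updated) {
  print(x)
  kurtosis_test <- e1071::kurtosis(bike_data[, x])
  print(kurtosis_test)
}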

Output:

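The values below are excess kurtosis values (kurtosis minus 3), so a normal distribution corresponds to 0. Against that reference, temp (-1.12) is clearly platykurtic, hum (-0.08) is very close to mesokurtic, and windspeed (0.39) is mildly leptokurtic.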

[1] "temp"
[1] -1.124564
[1] "hum"
[1] -0.080291
[1] "windspeed"
[1] 0.3906245
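If you prefer to avoid the explicit loop, sapply() can compute the statistic for several columns in one call. The sketch below is hedged in two ways: it uses R's built-in mtcars data as a stand-in, since day.csv does not ship with R, and excess_kurtosis() is a hypothetical base-R helper that mirrors the default "type 3" formula documented for e1071::kurtosis(), so it needs no extra package.

```r
# Hypothetical helper mirroring the default ("type 3") estimator that
# e1071::kurtosis() documents: m4 / s^4 - 3 (excess kurtosis, normal = 0)
excess_kurtosis <- function(x) {
  dev <- x - mean(x)
  mean(dev^4) / sd(x)^4 - 3
}

# mtcars ships with R and stands in for bike_data here; with the bike
# data you would pass bike_data[, numeric_col_updated] instead
numeric_cols <- c("mpg", "hp", "wt")
kurt_by_col <- sapply(mtcars[, numeric_cols], excess_kurtosis)
print(round(kurt_by_col, 3))
```

The result is a named numeric vector with one entry per column, which is easier to pass on to later preprocessing steps than printed output.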

Conclusion

That brings us to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to R, stay tuned.

Till then, Happy Learning!! 🙂
