Skewness Test in R – All you need to know!


Hello, readers! In this article, we will look in detail at an important check related to Feature Scaling: the skewness test in R.

So, let us begin!


What is Skewness in R?

Feature Scaling is an important aspect of data pre-processing. Before applying a model to a dataset, it is often necessary to make the data scale free, i.e. to make the variables independent of each other’s scales (units). This process is known as Feature Scaling, and it is usually performed in one of the following ways:

  • Normalization
  • Standardization
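
To make the two approaches concrete, here is a minimal sketch in base R. The vector x is a hypothetical example (it is not taken from any dataset used in this article), and the formulas shown are the common min-max and z-score definitions.

```r
# Hypothetical example vector, used only for illustration
x <- c(10, 11, 12, 13, 20)

# Normalization (min-max): rescales the values to the range [0, 1]
normalized <- (x - min(x)) / (max(x) - min(x))

# Standardization (z-score): rescales to mean 0 and standard deviation 1
standardized <- (x - mean(x)) / sd(x)

print(normalized)
print(standardized)
```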

This is where the skewness test comes into the picture! Before applying scaling to the data, it is necessary to detect which variables actually need scaling and which can be processed as they are.

Skewness is a statistical measure of the asymmetry of a variable's distribution. It describes where the majority of the data elements lie relative to the mean value of that variable.
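
For readers who want to see what such a coefficient looks like, the sketch below computes the moment-based coefficient of skewness (third central moment divided by the second central moment raised to the power 3/2) in base R. The helper name manual_skewness is hypothetical and introduced only for illustration; it is assumed here to be the same quantity that the skewness() function used later in this article reports.

```r
# Moment-based coefficient of skewness: m3 / m2^(3/2),
# where m2 and m3 are the second and third central moments.
# manual_skewness is a hypothetical helper name, not a library function.
manual_skewness <- function(x) {
  x <- x[!is.na(x)]      # drop missing values
  d <- x - mean(x)       # deviations from the mean
  m2 <- mean(d^2)        # second central moment
  m3 <- mean(d^3)        # third central moment
  m3 / m2^(3/2)
}

# Roughly 1.21 for this vector
print(manual_skewness(c(10, 11, 12, 13, 20)))
```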

As a rule of thumb, a variable whose coefficient of skewness falls outside the range below is considered skewed, i.e. it needs scaling:

-0.5 < skewness(data_column) < 0.5

Types of skewness:

  1. Positive skewness
  2. Negative skewness
  3. Symmetric/zero-skewness
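
The three cases can be sketched as a small helper that labels a coefficient using the ±0.5 rule of thumb quoted above. The function name classify_skewness and its threshold argument are hypothetical, and the cut-off is a convention rather than a universal standard.

```r
# classify_skewness is a hypothetical helper; the 0.5 cut-off is the
# rule of thumb from this article, not a universal standard.
classify_skewness <- function(coef, threshold = 0.5) {
  if (coef > threshold) {
    "positive skewness"
  } else if (coef < -threshold) {
    "negative skewness"
  } else {
    "approximately symmetric"
  }
}

print(classify_skewness(1.21))
print(classify_skewness(-0.70))
print(classify_skewness(0.05))
```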

Let us now focus on these variants of skewness in the upcoming section!


1. Positive skewness in R

A data variable is said to be positively skewed when the coefficient of skewness is greater than 0. In this case, the majority of the data elements are clustered on the left side of the graph, and the distribution has a longer tail on the right.

Moreover, in a positively skewed variable, most of the data elements are typically less than the mean value.

In order to use the skewness test in R, we need to load the ‘moments’ package into the R environment. Its skewness() function calculates the coefficient of skewness for a variable. We’ll start by creating a data vector in R.

Example:

# Remove all existing objects
rm(list = ls())
# Required for the skewness() function (install once, if not already installed)
install.packages('moments')
library(moments)

# Define the data vector
data <- c(10, 11, 12, 13, 20)

print(skewness(data))

hist(data)

As the coefficient of skewness is greater than 0, the variable ‘data’ is said to be positively skewed.

Output:

1.2099
Figure: Positive Skewness
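
As a quick sanity check on the claim that most elements lie below the mean, the sketch below uses only base R on the same vector as the example above.

```r
data <- c(10, 11, 12, 13, 20)

# The single large value (20) pulls the mean above the median
print(mean(data))     # 13.2
print(median(data))   # 12

# Number of elements below the mean: 4 of the 5 values
below_mean <- sum(data < mean(data))
print(below_mean)
```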

2. Negative skewness in R

In negative skewness, the coefficient of skewness is less than 0, i.e. a negative value. Hence, the majority of the data elements are clustered on the right side of the graph, and the distribution has a longer tail on the left.

In addition, the majority of the elements typically have values greater than the mean of the data variable.

Example:

# Remove all existing objects
rm(list = ls())
# Required for the skewness() function (install once, if not already installed)
install.packages('moments')
library(moments)

# Define the data vector
data <- c(1, 2, 30, 31, 32, 33)

print(skewness(data))

hist(data)


Output:

As seen below, the coefficient of skewness is less than 0, so the variable ‘data’ is said to have negative skewness.

-0.6952504
Figure: Negative Skewness
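
The mirror-image check for the negatively skewed vector can be sketched in base R as follows, using the same values as the example above.

```r
data <- c(1, 2, 30, 31, 32, 33)

# The two small values (1 and 2) pull the mean below the median
print(mean(data))     # 21.5
print(median(data))   # 30.5

# Number of elements above the mean: 4 of the 6 values
above_mean <- sum(data > mean(data))
print(above_mean)
```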

3. Symmetric data in R

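If the coefficient of skewness is close to 0, i.e. between -0.5 and +0.5, then the data is said to be approximately symmetric. Such a value is consistent with a Normal Distribution, although it does not prove one on its own, and by the rule of thumb above the variable doesn’t require scaling.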
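
Before turning to a real dataset, a tiny base R sketch shows the zero-skewness case. The vector x is a hypothetical example, and the coefficient is computed by hand from the central moments rather than through a package.

```r
# Hypothetical, perfectly symmetric vector
x <- c(1, 2, 3, 4, 5)

# Moment-based coefficient: third central moment / (second central moment)^(3/2)
d <- x - mean(x)
coef_sym <- mean(d^3) / mean(d^2)^(3/2)

print(coef_sym)   # 0: the deviations on either side of the mean cancel out
```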

Example:

You can find the dataset here!

# Remove all existing objects
rm(list = ls())
# Set the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data <- read.csv("day.csv", header = TRUE)

# skewness() is taken from 'moments'; 'e1071' also defines skewness(),
# so loading both lets the package loaded last mask the other
library(moments)
numeric_col_updated <- c('temp', 'hum', 'windspeed')
for (x in numeric_col_updated) {
  print(x)
  skew_test <- skewness(bike_data[, x])
  print(skew_test)
}

Output:

[1] "temp"
[1] -0.05429742
[1] "hum"
[1] -0.05949731
[1] "windspeed"
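[1] 0.6745682

Here, temp and hum fall within the -0.5 to +0.5 range and can be treated as approximately symmetric, while windspeed (0.6745682) lies outside it and is moderately positively skewed.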
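
The same column-by-column check can also be written without an explicit loop. The sketch below is only an illustration: the data frame df and its values are made up and stand in for bike_data, and skew_coef is a hypothetical base R helper used so that the block runs without extra packages.

```r
# Hypothetical helper: moment-based coefficient of skewness
skew_coef <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}

# Made-up data frame standing in for bike_data
df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(10, 11, 12, 13, 20),
  x3 = c(1, 2, 30, 31, 32)
)

# One coefficient per column
coefs <- sapply(df, skew_coef)
print(coefs)

# Columns whose coefficient lies outside the -0.5 to +0.5 range
flagged <- names(coefs)[abs(coefs) > 0.5]
print(flagged)
```

With real data, the column names and the 0.5 cut-off would be adapted to the dataset at hand.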

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to R programming, stay tuned!!

Till then, Happy Learning!! 🙂
