Hello, readers! In this article, we will focus on an important check related to feature scaling: the **Skewness test in R**.

So, let us begin!

## What is Skewness in R?

Feature scaling is an important part of data pre-processing. Before fitting a model to a dataset, it is often necessary to bring the variables onto a comparable scale, so that the result does not depend on the units each variable is measured in. Feature scaling is commonly performed in one of the following ways:

- **Normalization**
- **Standardization**
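As a rough illustration of the two approaches, here is a minimal base R sketch. The vector `x` is made up for the example, and min-max rescaling is assumed as the form of normalization, which is one common choice rather than the only one.

```r
# Made-up sample values, for illustration only
x <- c(10, 11, 12, 13, 20)

# Min-max normalization: rescales the values into the [0, 1] range
x_norm <- (x - min(x)) / (max(x) - min(x))

# Standardization (z-score): shifts to mean 0 and scales to standard deviation 1
x_std <- (x - mean(x)) / sd(x)

print(x_norm)
print(x_std)
```

Both are linear rescalings: they change the units of the variable, but not the shape of its distribution.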

This is where the `Skewness Test` comes into the picture! Before applying scaling to the data, it helps to detect which variables actually need treatment and which ones can be processed as they are.

Skewness is a statistical measure of how asymmetric the distribution of a variable is. It describes where the bulk of the values lie relative to the mean of that variable.
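To make the definition concrete, the sketch below computes a moment-based coefficient of skewness by hand: the third central moment divided by the second central moment raised to the power 1.5. The helper name `skew_coef` is hypothetical, and the sample vector is only for illustration. This is assumed to be the same quantity that the `moments` package reports, but treat it as a sketch rather than a replacement for the package function.

```r
# Hypothetical helper: moment-based coefficient of skewness
skew_coef <- function(x) {
  d  <- x - mean(x)   # deviations from the mean
  m2 <- mean(d^2)     # second central moment (population variance)
  m3 <- mean(d^3)     # third central moment
  m3 / m2^1.5
}

# Illustrative vector with one large value on the right
x <- c(10, 11, 12, 13, 20)
print(skew_coef(x))
```

A positive result means the long tail is on the right, a negative result means it is on the left.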

As a common rule of thumb, a variable whose skewness coefficient falls outside the range below is considered skewed, i.e. it needs treatment before modelling:

**-0.5 < skewness(data_column) < 0.5**

Types of skewness:

- **Positive skewness**
- **Negative skewness**
- **Symmetric/zero-skewness**
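The three types, combined with the ±0.5 rule of thumb above, can be sketched as a small helper. The function name `classify_skew` and its `threshold` argument are hypothetical. The sketch treats any coefficient inside the range as approximately symmetric, even when it is not exactly 0.

```r
# Hypothetical helper: label a skewness coefficient using a cut-off
classify_skew <- function(skew_value, threshold = 0.5) {
  if (skew_value > threshold) {
    "positive skew"
  } else if (skew_value < -threshold) {
    "negative skew"
  } else {
    "approximately symmetric"
  }
}

print(classify_skew(1.2099))
print(classify_skew(-0.6952504))
print(classify_skew(-0.05429742))
```

The cut-off is a convention, so it is kept as an argument that can be changed rather than a fixed constant.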

Let us now look at each of these types of skewness in the following sections!

### 1. Positive skewness in R

A data variable is said to be positively skewed when the coefficient of skewness is greater than 0. In this case, the majority of the data elements are clustered on the left side of the graph, with a longer tail extending to the right.

Moreover, in a positively skewed variable, most of the data elements are typically less than the mean value.

To compute skewness in R, we need to load the `moments` package into the R environment. Its `skewness()` function calculates the coefficient of skewness for a variable. We'll start by creating a data vector in R.

**Example:**

```r
# Remove all the existing objects
rm(list = ls())

# Required for the skewness() function
install.packages('moments')
library(moments)

# Defining the data vector
data <- c(10, 11, 12, 13, 20)

print(skewness(data))
hist(data)
```

As the coefficient of skewness is greater than 0, the variable ‘data’ is said to be positively skewed.

**Output:**

```
[1] 1.2099
```
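The statement that most elements lie below the mean can be checked directly on the same vector using base R alone. This is only a quick sanity check on this one small example, not a general proof.

```r
# Same vector as in the example above
data <- c(10, 11, 12, 13, 20)

print(mean(data))              # 13.2
print(median(data))            # 12
print(sum(data < mean(data)))  # 4 of the 5 values lie below the mean
```

The single large value (20) pulls the mean above the median, which is the usual signature of positive skew.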

### 2. Negative skewness in R

In negative skewness, the coefficient of skewness is less than 0, i.e. a negative value. Here, the majority of the data elements are clustered on the right side of the graph, with a longer tail extending to the left.

In addition, most of the elements typically have values greater than the mean of the data variable.

**Example:**

```r
# Remove all the existing objects
rm(list = ls())

# Required for the skewness() function
install.packages('moments')
library(moments)

# Defining the data vector
data <- c(1, 2, 30, 31, 32, 33)

print(skewness(data))
hist(data)
```

**Output:**

As seen below, the coefficient of skewness is less than 0, so the variable is said to be negatively skewed.

```
[1] -0.6952504
```
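As with the positive case, the relationship between the values and the mean can be verified on this vector with base R. Again, this is a check on one small example only.

```r
# Same vector as in the example above
data <- c(1, 2, 30, 31, 32, 33)

print(mean(data))              # 21.5
print(median(data))            # 30.5
print(sum(data > mean(data)))  # 4 of the 6 values lie above the mean
```

Here the two small values (1 and 2) pull the mean below the median, the mirror image of the positive case.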

### 3. Symmetric data in R

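If the coefficient of skewness is close to 0, i.e. between -0.5 and +0.5, the data is considered approximately symmetric. A symmetric shape is consistent with a normal distribution, although low skewness alone does not prove normality. Such a variable usually doesn't require further treatment.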

**Example:**

The example below uses a bike rental dataset loaded from `day.csv`.

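```r
# Remove all the existing objects
rm(list = ls())

# Set the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data <- read.csv("day.csv", header = TRUE)

# moments provides skewness(); e1071 also exports a skewness() function,
# so only one of the two is loaded here to avoid masking
library(moments)

numeric_col_updated <- c('temp', 'hum', 'windspeed')

# Print the coefficient of skewness for each numeric column
for (x in numeric_col_updated) {
  print(x)
  skew_test <- skewness(bike_data[, x])
  print(skew_test)
}
```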

**Output:**

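```
[1] "temp"
[1] -0.05429742
[1] "hum"
[1] -0.05949731
[1] "windspeed"
[1] 0.6745682
```

Here, `temp` and `hum` fall within the -0.5 to +0.5 range and can be treated as symmetric, while `windspeed` (about 0.67) lies outside it and is positively skewed.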
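One point worth checking before treating a flagged column such as `windspeed`: the coefficient of skewness does not change under linear rescaling, so standardization on its own leaves it as it was. A transformation that changes the shape, such as a log transform for positive values, is one commonly used option. The sketch below shows this on the small positively skewed vector from earlier, not on the bike data. `skew_coef` is a hypothetical hand-written helper that uses the moment-based formula.

```r
# Hypothetical helper: moment-based coefficient of skewness
skew_coef <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Small positively skewed vector (illustrative, not the bike data)
x <- c(10, 11, 12, 13, 20)

x_std <- (x - mean(x)) / sd(x)  # standardization: a linear rescaling
x_log <- log(x)                 # log transform: changes the shape

print(skew_coef(x))      # original skewness
print(skew_coef(x_std))  # same as the original, up to rounding
print(skew_coef(x_log))  # smaller than the original
```

On this tiny vector the log transform only reduces the skewness rather than bringing it inside the ±0.5 range, so the result of any transformation is worth re-checking instead of assuming.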

## Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to R programming, stay tuned!!

Till then, Happy Learning!! 🙂