Descriptive Analysis in R programming


Hello, readers! In this article, we will look at descriptive analysis in R in detail.

So, let us begin!


What is Descriptive Analysis?

In data science and machine learning, it is important to analyze the data and its variables before building a predictive model. This helps us understand the trends in the data values and correct any anomalies where necessary.

Descriptive analysis of the data is therefore the first step in building any machine learning model.

In descriptive analysis, we describe and understand the data points through various forms of representation, such as statistical summaries, data visualization, and tabular exports to CSV or Excel files.

Descriptive analysis can be described as a way of gathering and presenting insights about the data in a meaningful manner.

Broadly, there are two kinds of descriptive measures:

  1. Measure of Variability
  2. Measure of Central Tendency

Let us now look at them one by one!


1. Measure of Variability

In descriptive analysis, measures of variability tell us how widely the data values are spread, that is, how far they extend and how tightly they cluster.

In this article, we measure variability using the range and the standard deviation. Quantiles and the interquartile range (IQR), covered further below, also describe spread.

Let us now try to implement the measures one by one.


Range as a measure of variability

With the range, we understand the limits of the data, i.e. its smallest and largest values. R provides the range() function, which returns the minimum and the maximum of the data values.

Example:

# Sample data: a small numeric vector
data <- c(1, 2, 3, 4, 5, 6)

# range() returns a vector of length two: the minimum and the maximum
r <- range(data)
print(r)

Output:

> print(r) 
[1] 1 6
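
Note that range() returns the two endpoints rather than a single number. If the spread is wanted as one value, one option is to take the difference between the endpoints. The following is a minimal sketch on the same sample vector; the variable name spread is only illustrative.

```r
data <- c(1, 2, 3, 4, 5, 6)

# range() gives c(min, max); diff() subtracts the first element from the second
r <- range(data)
spread <- diff(r)
print(spread)

# The same value computed directly from max() and min()
print(max(data) - min(data))
```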

Standard deviation as a measure of variability

Standard deviation tells us how far the data values typically lie from the mean: a small value means the points cluster close to the mean, while a large value means they are widely scattered.

R provides the sd() function, which computes the sample standard deviation (with n - 1 in the denominator), as shown below:

Example:

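# Sample data: a small numeric vector
data <- c(1, 2, 3, 4, 5, 6)

# sd() returns the sample standard deviation;
# the result is named std_dev because var is also the name of R's variance function
std_dev <- sd(data)
print(std_dev)

Output:

[1] 1.870829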
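
To see what sd() is doing, the same value can be worked out by hand. The sketch below assumes the usual sample formula (sum of squared deviations from the mean, divided by n - 1) and compares the result with sd() and var(). The names manual_var and manual_sd are illustrative.

```r
data <- c(1, 2, 3, 4, 5, 6)

n <- length(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((data - mean(data))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)
print(manual_sd)

# all.equal() compares with a small numeric tolerance
print(all.equal(manual_sd, sd(data)))
print(all.equal(manual_var, var(data)))
```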

2. Measure of Central Tendency

Measures of central tendency are statistical measures that give a single value representing the entire data set or data variable. This helps us summarize where the data values are centered.

Usually, we make use of the following central measures in descriptive analysis:

  1. mean
  2. mode
  3. median

In the example below, we compute the mean of the sample vector using the mean() function.

# Sample data: a small numeric vector
data <- c(1, 2, 3, 4, 5, 6)

# mean() returns the arithmetic average of the values
mn <- mean(data)
print(mn)

Output:

[1] 3.5
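
The median and the mode from the list above can be computed in a similar way. R has a built-in median() function, but base R has no function for the statistical mode; mode() reports an object's storage type instead. The sketch below therefore defines stat_mode(), a hypothetical helper that is not part of R, and uses a slightly different sample vector with a repeated value so that a mode exists.

```r
# Hypothetical sample with a repeated value, so that a mode exists
data <- c(1, 2, 2, 3, 4, 6)

# median() returns the middle value
# (the average of the two middle values when the length is even)
med <- median(data)
print(med)

# stat_mode() is an illustrative helper, not a base R function:
# tabulate the values and keep the most frequent one(s)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
print(stat_mode(data))
```

If several values tie for the highest count, this helper returns all of them.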

Quantile as a measure for Descriptive Analysis

Quartiles divide the sorted data points into four groups, each holding roughly a quarter of the values.

In this way, they summarize how the data points are distributed across the dataset. R provides the quantile() function, which by default returns the minimum, the three quartiles, and the maximum.

Example:

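# Sample data: a small numeric vector
data <- c(1, 2, 3, 4, 5, 6)

# With no extra arguments, quantile() returns the 0%, 25%, 50%, 75% and 100% points
q <- quantile(data)
print(q)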

Output:

  0%  25%  50%  75% 100% 
1.00 2.25 3.50 4.75 6.00 
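
quantile() is not limited to quartiles. As a small sketch, the probs argument accepts any probabilities between 0 and 1, and names = FALSE drops the percentage labels. The particular percentiles chosen here are only an example.

```r
data <- c(1, 2, 3, 4, 5, 6)

# probs takes probabilities in [0, 1]; here the 10th, 50th and 90th percentiles
p <- quantile(data, probs = c(0.1, 0.5, 0.9))
print(p)

# names = FALSE returns a plain numeric vector without the "25%" style labels
quartiles <- quantile(data, probs = c(0.25, 0.5, 0.75), names = FALSE)
print(quartiles)
```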

IQR as the measure for descriptive analysis

Apart from quartile values, the interquartile range (IQR) can also be used as a statistical measure to understand the distribution of the data points. It describes the spread of the central 50% of the values, and it is the difference between the upper quartile (75%) and the lower quartile (25%) of the data points.

R provides the IQR() function to calculate the interquartile range.

# Sample data: a small numeric vector
data <- c(1, 2, 3, 4, 5, 6)

# IQR() returns the 75% quantile minus the 25% quantile
iqr_value <- IQR(data)
print(iqr_value)

Output:

[1] 2.5
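
Since the IQR is defined as the upper quartile minus the lower quartile, it can be checked against quantile() directly. The sketch below does that and then calls summary(), which reports the minimum, quartiles, median, mean and maximum in a single call. The name manual_iqr is illustrative.

```r
data <- c(1, 2, 3, 4, 5, 6)

# Lower (25%) and upper (75%) quartiles as a plain numeric vector
q <- quantile(data, probs = c(0.25, 0.75), names = FALSE)

# IQR by hand: upper quartile minus lower quartile
manual_iqr <- q[2] - q[1]
print(manual_iqr)
print(all.equal(manual_iqr, IQR(data)))

# summary() gives several descriptive measures at once
print(summary(data))
```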

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to R programming, stay tuned!

Till then, Happy Learning!!
