Hello, readers! In this article, we will be focusing on Descriptive Analysis in R programming, in detail.
So, let us begin!
What is Descriptive Analysis?
In Data Science and Machine Learning, it is important to analyze the data and its variables before building a predictive model. This helps us understand the trends in the data values and correct any anomalies where necessary.
Descriptive analysis of the data is therefore the first step in building any machine learning model.
In descriptive analysis, we describe and understand the data points through various modes of representation, such as statistical summaries, data visualization, and tabular exports to CSV or Excel files.
Descriptive analysis can be thought of as a way to gather and present insights about the data in a meaningful manner.
Broadly, descriptive analysis relies on two kinds of measures:
- Measures of Variability
- Measures of Central Tendency
Let us now have a look at them one by one!
1. Measure of Variability
In descriptive analysis, measures of variability tell us how widely the data values are spread, that is, how far the values lie from one another and from the center of the data.
We mostly make use of the following measures of variability:
- Range
- Standard Deviation
Let us now try to implement these measures one by one.
Range as a measure of variability
With range, we understand the limits of the data, i.e. the smallest and largest values it contains. R provides us with the range() function to calculate these boundaries.
rm(list = ls())              # clear the workspace
getwd()                      # show the current working directory
data <- c(1, 2, 3, 4, 5, 6)
r <- range(data)             # returns c(minimum, maximum)
print(r)
> print(r)
[1] 1 6
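As a small sketch building on the example above: since range() returns the minimum and maximum as a two-element vector, the width of that interval can be obtained with diff(). The variable names `limits` and `spread` are illustrative only and not part of the original example.

```r
# Same illustrative values as in the example above
data <- c(1, 2, 3, 4, 5, 6)

# range() returns a numeric vector of length 2: c(min, max)
limits <- range(data)

# The width of the interval is max - min
spread <- diff(limits)

print(limits)
print(spread)
```

Here `spread` is simply the largest value minus the smallest, which is often what people mean by "the range" as a single number.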
Standard deviation as a measure of variability
Standard deviation tells us how far the data values typically lie from their mean. A small value means the data points are clustered close to the mean, while a large value means they are widely spread.
R provides the sd() function to find the sample standard deviation (computed with n - 1 in the denominator), as shown below:
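rm(list = ls())              # clear the workspace
getwd()                      # show the current working directory
data <- c(1, 2, 3, 4, 5, 6)
std_dev <- sd(data)          # named std_dev to avoid shadowing the built-in var()
print(std_dev)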
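To make clear what sd() is computing, here is a minimal sketch that derives the sample standard deviation by hand and compares it with the built-in function. The names `sample_var` and `manual_sd` are made up for this illustration.

```r
# Same illustrative values as in the example above
data <- c(1, 2, 3, 4, 5, 6)

n <- length(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_var <- sum((data - mean(data))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(sample_var)

print(manual_sd)
print(sd(data))

# Check that the hand-computed value agrees with sd()
print(isTRUE(all.equal(manual_sd, sd(data))))
```

Note the n - 1 divisor: sd() gives the sample standard deviation, not the population version that divides by n.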
2. Measure of Central Tendency
Measures of central tendency give us a single value that represents the entire data set or variable. This helps us summarize where the center of the data values lies.
Below, we look at the mean, followed by quantiles and the inter-quartile range, which describe how the values are positioned around that center.
Mean as a measure of central tendency
In the example below, we compute the mean of a numeric vector using the mean() function.
rm(list = ls())              # clear the workspace
getwd()                      # show the current working directory
data <- c(1, 2, 3, 4, 5, 6)
mn <- mean(data)             # arithmetic mean of the values
print(mn)
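As a brief sketch of what mean() does, the snippet below computes the arithmetic mean by hand and also shows how missing values affect the result. The vector `with_missing` is a hypothetical example added here for illustration; it does not come from the original data.

```r
# Same illustrative values as in the example above
data <- c(1, 2, 3, 4, 5, 6)

# The arithmetic mean is the sum of the values divided by their count
manual_mean <- sum(data) / length(data)
print(manual_mean)
print(mean(data))

# Hypothetical vector containing a missing value
with_missing <- c(1, 2, NA, 4)

# By default, any NA makes the result NA
print(mean(with_missing))

# na.rm = TRUE drops missing values before averaging
clean_mean <- mean(with_missing, na.rm = TRUE)
print(clean_mean)
```

In real data sets missing values are common, so it is worth deciding explicitly whether to pass `na.rm = TRUE` rather than being surprised by an `NA` result.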
Quantile as a measure for Descriptive Analysis
Quartiles divide the sorted data points into four equal-sized groups.
In this way, they summarize how the data points are distributed across the dataset. R provides us with the quantile() function, which by default returns the minimum (0%), the three quartiles (25%, 50%, 75%) and the maximum (100%).
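rm(list = ls())              # clear the workspace
getwd()                      # show the current working directory
data <- c(1, 2, 3, 4, 5, 6)
q <- quantile(data)          # default probabilities: 0, 0.25, 0.5, 0.75, 1
print(q)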
  0%  25%  50%  75% 100%
1.00 2.25 3.50 4.75 6.00
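The quantile() function is not limited to quartiles. As a minimal sketch, the `probs` argument can request any set of probabilities, and `names = FALSE` returns plain numbers without the percentage labels. The choice of 10%, 50% and 90% below is arbitrary and only for illustration.

```r
# Same illustrative values as in the example above
data <- c(1, 2, 3, 4, 5, 6)

# Request specific quantiles via the probs argument
selected <- quantile(data, probs = c(0.1, 0.5, 0.9))
print(selected)

# The 50% quantile is the median; names = FALSE drops the "50%" label
med <- quantile(data, probs = 0.5, names = FALSE)
print(med)
print(median(data))
```

These results use R's default interpolation method (type 7); other `type` values can give slightly different numbers on small samples.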
IQR as the measure for descriptive analysis
Apart from the quartile values, the inter-quartile range (IQR) can also be used as a statistical measure to understand the distribution of the data points. It is the difference between the upper quartile (75%) and the lower quartile (25%), so it describes the spread of the central 50% of the values.
R provides us with the IQR() function to calculate the inter-quartile range.
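rm(list = ls())              # clear the workspace
getwd()                      # show the current working directory
data <- c(1, 2, 3, 4, 5, 6)
qr <- IQR(data)              # upper quartile minus lower quartile
print(qr)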
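To connect IQR() back to the quartiles, here is a short sketch that computes the same value from quantile() directly. The names `quartiles` and `manual_iqr` are illustrative.

```r
# Same illustrative values as in the example above
data <- c(1, 2, 3, 4, 5, 6)

# Lower (25%) and upper (75%) quartiles, without name labels
quartiles <- quantile(data, probs = c(0.25, 0.75), names = FALSE)

# IQR is the upper quartile minus the lower quartile
manual_iqr <- quartiles[2] - quartiles[1]

print(manual_iqr)
print(IQR(data))
```

For this data the quartiles are 2.25 and 4.75, so both lines print 2.5.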
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.
For more such posts related to R programming, stay tuned with us!
Till then, Happy Learning!! 🙂