In this tutorial, let’s learn how to find the median in R. The median is a measure of the central tendency of the data. In simpler terms, you may call it the ‘middle’ value.
To find it, you sort the values and then pick the middle one. If there are two middle values (which happens when the number of values is even), the median is their average, or ‘mean’.
Median – Merits and Demerits
- The median is very easy to calculate. In some simple cases, you can find it just by looking at the values.
- The median is useful for open-ended data distributions, because it depends on the position of the values rather than on their magnitude.
- One of the major advantages of the median is that it is not affected by outliers in the data.
Outliers: Outliers are extreme values that differ markedly from the rest of the values in the data.
Ex: The retirement ages are (52, 53, 54, 54, 55, 56, 57, 58, 79).
Here, 79 is an extreme value that is very different from the rest of the data. It pulls the mean upward considerably, but the median is not affected, because it depends on position rather than value: it is still the 5th of the 9 sorted values, 55.
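A quick check in R makes this concrete. The sketch below compares the retirement-age data with a copy in which the outlier 79 is replaced by 59. The value 59 is a hypothetical substitute chosen only for illustration.

```r
# Retirement ages from the example above, including the outlier 79
ages <- c(52, 53, 54, 54, 55, 56, 57, 58, 79)

# Hypothetical copy with the outlier replaced by a more typical value
ages_no_outlier <- c(52, 53, 54, 54, 55, 56, 57, 58, 59)

# The outlier pulls the mean upward...
mean(ages)             # about 57.56
mean(ages_no_outlier)  # about 55.33

# ...but the median (the 5th of 9 sorted values) is 55 in both cases
median(ages)
median(ages_no_outlier)
```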
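- On the downside, the median does not use all of the data. It depends only on the middle value(s), so the information in the remaining values is ignored.
- The median is not well suited to further algebraic treatment. For example, unlike the mean, the median of a combined data set cannot be computed from the medians of its parts.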
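The second drawback shows up with two pairs of small, made-up groups. Each pair has the same group medians (2 and 20), yet the medians of the combined data differ. This means the combined median cannot be computed from the group medians alone. Means, by contrast, combine exactly when weighted by group size. All values here are hypothetical.

```r
# Two pairs of hypothetical groups with identical group medians (2 and 20)
a1 <- c(1, 2, 3);  b1 <- c(10, 20, 30)
a2 <- c(2, 2, 2);  b2 <- c(20, 20, 20)

c(median(a1), median(b1))  # 2 and 20
c(median(a2), median(b2))  # 2 and 20 again

# Yet the medians of the combined data are different
median(c(a1, b1))  # (3 + 10) / 2 = 6.5
median(c(a2, b2))  # (2 + 20) / 2 = 11

# Means, by contrast, combine exactly when weighted by group size
n1 <- length(a1)
n2 <- length(b1)
(n1 * mean(a1) + n2 * mean(b1)) / (n1 + n2)  # 11, equal to mean(c(a1, b1))
```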
Finding the median of the given values
In this section, we will create a vector of values and find their median.
```r
# creates a numeric vector
x <- c(45, 76, 56, 87, 65, 45, 34, 56, 78, 98, 87, 65, 34, 48, 76)

# displays the values
show(x)
#> [1] 45 76 56 87 65 45 34 56 78 98 87 65 34 48 76

# calculates the median of the values in 'x'
median(x)
#> [1] 65
```
You may wonder how 65 can be the middle value, since it is not in the middle of x as written. The median() function first sorts the values in ascending order (34 34 45 45 48 56 56 65 65 76 76 78 87 87 98) and then takes the central value. With 15 values, that is the 8th one, 65.
Note: If the number of values is even, there are two central values, and their average is taken as the median.
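The sort-then-take-the-middle procedure can be written out by hand to show what median() is doing. The helper my_median() below is a hypothetical name used only for illustration, and in practice R's built-in median() should be preferred. The helper assumes a non-empty numeric vector with no missing values.

```r
# Hypothetical helper illustrating how a median is computed
# (assumes a non-empty numeric vector with no NA values)
my_median <- function(v) {
  s <- sort(v)            # order the values in ascending order
  n <- length(s)
  mid <- (n + 1) %/% 2    # index of the (lower) middle value
  if (n %% 2 == 1) {
    s[mid]                       # odd count: the single middle value
  } else {
    (s[mid] + s[mid + 1]) / 2    # even count: average of the two middle values
  }
}

x <- c(45, 76, 56, 87, 65, 45, 34, 56, 78, 98, 87, 65, 34, 48, 76)
my_median(x)         # 65, the same as median(x)
my_median(c(x, 10))  # 16 values: (56 + 65) / 2 = 60.5
```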
Finding the median of the ‘Electricity consumption data of the countries’
In this section, we import a CSV file containing electricity/energy consumption data for four countries (India, Romania, USA, and Jamaica) in the year 2019.
Execute the code below to find the median of the ‘Voltage’ values for these countries in 2019.
Note: View or download the ‘Energy consumption’ dataset here.
```r
# reads the values present in the file
df <- read.csv("energydata.csv")

# displays the values
df

# calculates the median of the 'Voltage' values
median(df$Voltage)
```
Output: 220
Note: For this data set, the median is 220, i.e. the central tendency of the voltage data is 220 volts.
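Real data files often contain missing entries. If the Voltage column had any NA values, median() would return NA unless na.rm = TRUE is set. The short vector below uses hypothetical readings, not the tutorial's dataset, to show the difference.

```r
# Hypothetical voltage readings with one missing value (not the real dataset)
voltage <- c(220, 230, NA, 210, 220)

median(voltage)                # NA, because of the missing value
median(voltage, na.rm = TRUE)  # 220, the median of 210, 220, 220, 230
```

For the CSV data, the equivalent call would be median(df$Voltage, na.rm = TRUE).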
Visualizing the median of the data with a box plot
In R, you can create a box plot to see how the data are distributed around the median, as shown in the plot below.
boxplot: Box plots are used in R to understand the distribution of data. R offers the boxplot() function to draw them. The thick line inside the box represents the median.
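A minimal sketch of drawing such a box plot follows. It uses the vector x from the earlier example, and df$Voltage could be passed to boxplot() the same way. boxplot.stats() returns the five numbers the box is drawn from, and the third of them is the median. The colour and labels are arbitrary choices.

```r
x <- c(45, 76, 56, 87, 65, 45, 34, 56, 78, 98, 87, 65, 34, 48, 76)

# draws the box plot; the thick line inside the box is the median
boxplot(x, col = "orange", ylab = "value", main = "Distribution of x")

# the five statistics behind the box; the 3rd one is the median
box_stats <- boxplot.stats(x)$stats
box_stats[3]  # 65, the same as median(x)
```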
Using a histogram to understand the median of the ‘Voltage’
In this section, we plot the voltage distribution as a histogram in RStudio.
Execute the code below to plot a histogram that shows the voltage distribution along with its median.
```r
# reads the values present in the file
df <- read.csv("energydata.csv")

# displays the values
df

# calculates the median of the 'Voltage' values
median(df$Voltage)

# plots the histogram
hist(df$Voltage, col = "orange", xlab = "voltage", ylab = "frequency",
     main = "Voltage distribution")

# adds a vertical line at the median (lwd is numeric, not a string)
abline(v = median(df$Voltage), col = "black", lwd = 3)

# adds the legend
legend(x = "topright", legend = "median", col = "black", lwd = 3)
```
In the plot above, the black vertical line marks the median. Histograms can also be used to show the mean and a density curve alongside the median.
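As a sketch of that idea, the code below draws a histogram of the vector x from the earlier example on the density scale (freq = FALSE). This lets a kernel density curve from density() share the same axis. The code then marks both the mean and the median. The y-axis limit is computed so that neither the bars nor the curve are clipped, and the colours and line types are arbitrary choices.

```r
x <- c(45, 76, 56, 87, 65, 45, 34, 56, 78, 98, 87, 65, 34, 48, 76)

d <- density(x)                     # kernel density estimate
h <- hist(x, plot = FALSE)          # histogram bins, not drawn yet
ylim <- c(0, max(h$density, d$y))   # room for both the bars and the curve

# draws the histogram on the density scale and overlays the density curve
plot(h, freq = FALSE, ylim = ylim, col = "orange",
     xlab = "value", main = "Distribution of x")
lines(d, lwd = 2)

# marks the mean (dashed blue) and the median (solid black)
abline(v = mean(x), col = "blue", lwd = 3, lty = 2)
abline(v = median(x), col = "black", lwd = 3)

legend("topright", legend = c("mean", "median"),
       col = c("blue", "black"), lwd = 3, lty = c(2, 1))
```

Here the mean (about 63.3) sits slightly to the left of the median (65).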
With the help of the median() function, we can understand the central tendency of the data. In some simple cases, you can even tell the median just by inspecting the values.
R also offers good visualization functions for uncovering patterns in data. As shown above, histograms and box plots make it easy to see where the median lies.
That’s all for now. Connect with us for more R tutorials, and don’t hesitate to comment below if you have any queries. Happy learning!