Create dummy variables in R


Hello, readers! In this article, we will look in detail at how to create dummy variables in R.

So, let us begin!


Why do we need dummy variables in R?

Let us first understand the concept of dummy variables. Consider a dataset in which some columns hold categorical values, such as labels or group codes, rather than measurements.

Most machine learning models expect numeric input, so they cannot work with such categories and groups directly. Thus arises the need to encode categorical variables and their levels as numbers.

This is where dummy variables come into the picture.

A dummy variable is a numeric representation of a category or level of a factor variable. That is, every group or level of the categorical variable gets its own numeric column, which holds 1 for the rows that belong to that level and 0 for all other rows.

For example, consider a data set that contains a variable ‘Poll’ with the values ‘Yes’ and ‘No’. To represent these two groups as numeric entries, we can create a dummy variable for each of them.

The transformed dataset would then have two additional columns: ‘Poll.1’, which represents the ‘Yes’ values (it is 1 for every row whose level is ‘Yes’ and 0 otherwise), and ‘Poll.2’, which does the same for the ‘No’ values.
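The ‘Poll’ example can be sketched in base R without any package. This is only an illustration: the data values are made up, and the column names ‘Poll.1’ and ‘Poll.2’ simply follow the naming used in the text.

```r
# Toy data for the 'Poll' example (values are invented for illustration).
df <- data.frame(Poll = c("Yes", "No", "No", "Yes", "Yes"))

# One 0/1 indicator column per level of 'Poll'.
df$Poll.1 <- as.integer(df$Poll == "Yes")  # 1 where Poll is "Yes", else 0
df$Poll.2 <- as.integer(df$Poll == "No")   # 1 where Poll is "No", else 0

print(df)
```

Every row has a 1 in exactly one of the two new columns, which is the defining property of a set of dummy variables.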


1. R fastDummies library to create dummy variables

R provides us with the fastDummies library, which contains the dummy_cols() function for creating dummy variables with ease.

With the dummy_cols() function, one can select the variables for which the dummies need to be created.

Syntax:

dummy_cols(data, select_columns = 'columns')

Example:

In this example, we have made use of the Bank Loan Defaulter dataset. You can find the dataset here.

Further, we have used the dummy_cols() function to create dummy variables for the column ‘ed’.

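# Clear the workspace (optional)
rm(list = ls())

# install.packages('fastDummies')  # run once if the package is not installed
library(fastDummies)

# Load the data and check its dimensions
dta <- read.csv("bank-loan.csv", header = TRUE)
dim(dta)

# Add one 0/1 column per level of 'ed' and check the dimensions again
dum <- dummy_cols(dta, select_columns = 'ed')
dim(dum)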

Output:

As seen below, the data set initially has 9 columns. After the dummy variables are created, it has 14 columns.

Each of the 5 levels of the ed variable has been given a separate column. In each of these columns, only the rows that belong to that level are set to 1; all other rows are set to 0.

> dim(dta)
[1] 850   9

> dim(dum)
[1] 850  14
Creation Of Dummies Using fastDummies Library
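To make the column count above concrete, here is a minimal base R sketch of the same idea on a small made-up data frame. It uses model.matrix() rather than dummy_cols(), so it is not how fastDummies works internally, and the ‘ed_’ prefix on the new column names is just a choice made for this sketch.

```r
# Toy data frame with a five-level 'ed' column (values are invented).
toy <- data.frame(age = c(41, 27, 40, 41, 24, 36),
                  ed  = c(3, 1, 1, 2, 5, 4))

# Dropping the intercept (- 1) yields one indicator column per level.
ind <- model.matrix(~ factor(ed) - 1, data = toy)
colnames(ind) <- paste0("ed_", levels(factor(toy$ed)))

# Keep the original columns and append the indicators.
out <- cbind(toy, ind)

dim(toy)       # 2 columns before
dim(out)       # 2 + 5 columns after
rowSums(ind)   # each row belongs to exactly one level
```

The number of columns grows by the number of distinct levels, which matches the jump from 9 to 14 columns for the five levels of ‘ed’.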

What if we need to create dummies for multiple variables at once?

In that case, we can combine the names of all the variables for which we need dummies into a character vector using the c() function and pass that vector to select_columns.

Example:

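# Clear the workspace (optional)
rm(list = ls())

# install.packages('fastDummies')  # run once if the package is not installed
library(fastDummies)

# Load the data and check its dimensions
dta <- read.csv("bank-loan.csv", header = TRUE)
dim(dta)

# Create dummies for both 'ed' and 'default' in one call
dum <- dummy_cols(dta, select_columns = c('ed', 'default'))
dim(dum)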

Output:

Here, we have created dummies for both the ‘ed’ and ‘default’ columns, which takes the column count from 9 to 17.

> dim(dta)
[1] 850   9
> dum <- dummy_cols(dta, select_columns = c('ed','default'))
> dim(dum)
[1] 850  17
Creation Of Dummies For Multiple Columns
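The multi-column case can also be sketched in base R. The helper add_dummies() below is hypothetical (it is not part of fastDummies or any other package) and the data are made up; it only illustrates looping over a vector of column names and adding one indicator column per level.

```r
# Hypothetical helper: append one 0/1 column per level for each named column.
# Note: missing values (NA) do not get a column of their own in this sketch.
add_dummies <- function(data, cols) {
  for (col in cols) {
    f <- factor(data[[col]])
    for (lev in levels(f)) {
      data[[paste(col, lev, sep = "_")]] <- as.integer(f == lev)
    }
  }
  data
}

# Toy data (values are invented for illustration).
toy <- data.frame(ed      = c(1, 2, 3, 1),
                  default = c(0, 1, 0, 0),
                  income  = c(176, 31, 55, 120))

out <- add_dummies(toy, c("ed", "default"))
dim(out)     # 3 original columns + 3 for 'ed' + 2 for 'default'
names(out)
```

Passing a different character vector as the second argument selects a different set of columns, in the same way that select_columns does for dummy_cols().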

2. R dummies library to create dummy variables

The R dummies library can also be used to create dummy variables for categorical data columns with ease.

For this, we can make use of the dummy() function, which creates dummy entries for a selected column.

Example:

In the example below, we have created dummy variables for the column ‘ed’ using the dummy() function.

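# Clear the workspace (optional)
rm(list = ls())

library(dummies)

# Load the data and check its dimensions
dta <- read.csv("bank-loan.csv", header = TRUE)
dim(dta)

# Build the 0/1 indicator columns for the levels of 'ed'
dum <- dummy(dta$ed)
dim(dum)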

Output:

As seen below, each level has been given its own column.

Also, only the data rows that match a particular level are set to 1 in that level's column; all other rows are set to 0.

For example, if a row belongs to the level represented by ‘ed1’, that column is set to 1 for the row; otherwise it is set to 0.

Creation Of Dummies Using dummies Library
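Because dummy() is called on a single column here, its result holds only the indicator columns and not the rest of the data. The base R sketch below mimics that shape on made-up values; it is an assumption-based stand-in, not the implementation used by the dummies package.

```r
# Toy 'ed' values (invented for illustration).
ed <- c(3, 1, 1, 2, 5, 4)

# Compare every value against every level to get a 0/1 matrix.
lev <- sort(unique(ed))
dum <- outer(ed, lev, "==") * 1L
colnames(dum) <- paste0("ed", lev)

dim(dum)   # one row per observation, one column per level
dum
```

To keep the original columns next to the indicators, the matrix can be attached to the data frame it came from with cbind().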

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

Do let us know about your experience with dummy variables in the comment box!

For more such posts related to R programming, stay tuned with us.

Till then, Happy Learning!!
