The R language offers a data structure designed specifically to store categorical data. Yes, I am talking about the factor. You can think of a factor as a special type of vector used to hold categorical values, whether nominal (unordered) or ordinal (ordered).
You may ask, why can’t we use the character vectors?
You can, and nothing will break.
But the advantage of a factor is that R does not need to store each category label over and over.
For example, if your data has the categories MALE and FEMALE, a character vector repeats the full labels: MALE, FEMALE, FEMALE, MALE…
A factor instead stores each label once as a level and keeps the values as integer codes such as 2, 1, 1, 2 (levels are sorted alphabetically by default, so FEMALE is 1 and MALE is 2), which can reduce the memory needed to hold that information. More importantly, many statistical and machine learning methods, such as regression, cannot work on text labels directly; a factor tells R's modelling functions to treat the variable as categorical and encode it numerically.
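To make the regression point concrete, here is a minimal sketch of how a factor ends up as numbers in a model. The data is a made-up four-element vector, and model.matrix() is used only to show the numeric design matrix that a regression would be fitted on, assuming R's default treatment contrasts.

```r
# Hypothetical example data, not taken from a real dataset
gender <- factor(c("MALE", "FEMALE", "FEMALE", "MALE"))

# model.matrix() builds the numeric design matrix a regression would use.
# With the default contrasts, the factor becomes a 0/1 indicator column
# for the non-reference level (MALE); FEMALE is the reference level.
design <- model.matrix(~ gender)

colnames(design)
design[, "genderMALE"]
```

The indicator column is 1 for the MALE rows and 0 for the FEMALE rows, which is the kind of numeric input a regression can work with.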
Note:
You should not use a factor for a character vector that is not truly categorical. If a character vector stores mostly unique values, such as names, it’s better to keep it as a character vector.
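One rough way to judge whether a character vector is truly categorical is to look at how many of its values are distinct. The sketch below assumes a hypothetical helper, distinct_ratio(), and a made-up vector of names; neither is part of base R or of this article's data.

```r
# Hypothetical helper: share of distinct values in a vector
distinct_ratio <- function(x) {
  length(unique(x)) / length(x)
}

# Hypothetical example vectors
student_names <- c("Asha", "Ben", "Carla", "Dev", "Elena", "Farid")
class_gender  <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE")

distinct_ratio(student_names)   # every value is unique
distinct_ratio(class_gender)    # a few values repeat many times

# Turning the names into a factor gives one level per element,
# so nothing is gained by the conversion
nlevels(factor(student_names))
```

A ratio close to 1 suggests the values are identifiers rather than categories. The cut-off is a judgment call, not a fixed rule.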
Simple Implementation of Factor in R
Now that you know what a factor in R is, and why it’s superior to the character vector, let’s move on to the implementation. Let’s create a simple factor.
#creates a character vector
class_gender <- c('MALE','FEMALE','MALE','FEMALE','FEMALE','MALE')
#returns the factors
factor(class_gender)

MALE FEMALE MALE FEMALE FEMALE MALE
Levels: FEMALE MALE
Well, as you can see here, the factor() function returned the values along with the levels in the vector, i.e. FEMALE and MALE. Levels are sorted alphabetically by default, so R stores FEMALE as 1 and MALE as 2.
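If you want to look behind the printed output, the following sketch inspects the same class_gender vector using base R functions. The variable names gender_factor and gender_male_first are introduced here only for illustration.

```r
class_gender <- c('MALE','FEMALE','MALE','FEMALE','FEMALE','MALE')
gender_factor <- factor(class_gender)

levels(gender_factor)       # distinct categories, sorted alphabetically
nlevels(gender_factor)      # number of levels
as.integer(gender_factor)   # integer code stored for each value
table(gender_factor)        # how often each level occurs

# Setting the levels explicitly changes which category gets which code
gender_male_first <- factor(class_gender, levels = c('MALE', 'FEMALE'))
as.integer(gender_male_first)
```

With the default ordering the codes are 2 1 2 1 1 2. With MALE listed first they become 1 2 1 2 2 1, while the values themselves are unchanged.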
Create Additional Levels Using a Factor
Can you get any meaning out of this heading? If not, don’t worry! As the headline says, you can create an additional level that is not present in the data.
For example, in our previous case, we have 2 levels MALE and FEMALE which are present in the data.
But, you can also add levels such as OTHERS which is not present in the data.
To illustrate this, I will use blood groups as the categories. Let’s see how it works.
#creates a character vector
blood_groups <- c('O','AB','A','A','AB','O')
#Returns the levels
factor(blood_groups)

O AB A A AB O
Levels: A AB O
You can observe that the factor function has returned the levels in the data.
Now, we are going to add a level that is not present in the input data or character vector.
#returns additional level which is not in data
factor(blood_groups, levels = c('A','AB','O','B'))

O AB A A AB O
Levels: A AB O B
Fantastic! We have successfully added an extra level, B, which does not appear anywhere in the data.
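To see what an unused level does in practice, here is a short sketch built on the same blood_groups vector. The names blood_factor, counts, kept and with_unknown are illustrative, and the value 'X' is a made-up entry used only to show what happens to a value outside the declared levels.

```r
blood_groups <- c('O','AB','A','A','AB','O')
blood_factor <- factor(blood_groups, levels = c('A','AB','O','B'))

# The unused level B is still counted, with a frequency of zero
counts <- table(blood_factor)
counts

# droplevels() removes levels that never occur in the data
kept <- levels(droplevels(blood_factor))
kept

# A value that is not among the declared levels becomes NA
with_unknown <- factor(c('A', 'B', 'X'), levels = c('A','AB','O','B'))
with_unknown
```

Declaring every valid level up front is handy when a category is possible but happens to be absent from the sample, since summaries will still report it.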
Wrapping Up
The factor is one of the most useful data structures in R. It helps in regression analysis by letting R convert nominal data to a numeric encoding with ease.
Factors give you a dedicated structure for dealing with nominal data, and they are helpful for encodings as well.
I hope this gave you a better understanding of the factor() function and its use in the R language. That’s all for now! Happy R!
Further reading: R documentation