Factor in R: A Practical Reference


The R language offers a data structure designed specifically for categorical data: the factor. A factor is a special kind of vector that stores categorical values, whether nominal (unordered, such as blood group) or ordinal (ordered, such as low, medium, high).

You may ask, why can't we just use character vectors?

You can, and for many tasks that works fine.

But the advantage of a factor is that R does not need to store each category label over and over.

For example, if your data has the categories MALE and FEMALE, a character vector repeats the label for every observation, i.e. MALE, FEMALE, FEMALE, MALE…

A factor instead keeps each distinct label once, as a level, and stores the observations as integer codes. With the default alphabetical level order (FEMALE = 1, MALE = 2), the values above become 2, 1, 1, 2, which can reduce the memory needed for long vectors. More importantly, many statistical and machine learning methods, such as regression, cannot work on raw text labels; a factor tells R's modeling functions to treat the variable as categorical and encode it numerically.
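To make the integer encoding concrete, here is a minimal sketch. The variable names are only for illustration, and the memory comparison at the end is a rough check whose numbers depend on your platform and R build.

```r
# Character vector with repeated category labels
gender <- c("MALE", "FEMALE", "FEMALE", "MALE")
gender_factor <- factor(gender)

# Distinct labels, sorted alphabetically by default
levels(gender_factor)

# Integer codes that point into the levels (FEMALE = 1, MALE = 2)
codes <- as.integer(gender_factor)
codes

# The original labels can be rebuilt from the codes and the levels
restored <- levels(gender_factor)[codes]
identical(restored, gender)

# Rough memory comparison on a longer vector (sizes vary by platform)
big_chr <- rep(gender, times = 25000)
big_fct <- factor(big_chr)
print(object.size(big_chr))
print(object.size(big_fct))
```

The codes come out as 2 1 1 2 rather than 1 2 2 1 because R sorts the levels alphabetically unless you tell it otherwise.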

Note:

Avoid using a factor for a character vector that is not truly categorical. If a character vector stores mostly unique values, such as names, it is better to keep it as a character vector.
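One quick way to judge whether a vector is really categorical is to look at the share of distinct values. The helper `distinct_share()` and the sample names below are hypothetical, written just for this sketch; treat the result as a rule of thumb, not a hard threshold.

```r
student_names <- c("Asha", "Bruno", "Chen", "Dalia")  # hypothetical, all unique
blood_groups  <- c("O", "AB", "A", "A", "AB", "O")

# Share of distinct values: close to 1 means almost every value is unique
distinct_share <- function(x) length(unique(x)) / length(x)

distinct_share(student_names)  # every value is unique, a factor adds nothing
distinct_share(blood_groups)   # a few labels repeat, so a factor fits
```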


Simple Implementation of Factor in R

Now that you know what a factor in R is, and when it is a better fit than a character vector, let's move on to the implementation. Let's create a simple factor.

# creates a character vector
class_gender <- c('MALE','FEMALE','MALE','FEMALE','FEMALE','MALE')

# converts it to a factor and prints the result
factor(class_gender)
[1] MALE   FEMALE MALE   FEMALE FEMALE MALE
Levels: FEMALE MALE

As you can see here, the factor() function printed the values along with the levels found in the vector, i.e. FEMALE and MALE. Internally, R stores these levels as the integer codes 1 and 2 respectively.
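If you want to inspect a factor beyond what printing shows, a few base R helpers are handy. This is a small sketch that reuses the same vector; the name `gender_factor` is just an illustrative choice.

```r
class_gender <- c('MALE','FEMALE','MALE','FEMALE','FEMALE','MALE')
gender_factor <- factor(class_gender)

# Confirms the object is a factor, not a character vector
class(gender_factor)

# Number of distinct categories
nlevels(gender_factor)

# Frequency of each level
gender_counts <- table(gender_factor)
gender_counts
```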

Create Additional Levels Using a Factor

The heading may sound odd at first, but don't worry! It simply means that a factor can carry a level that is not present in the data.

For example, in our previous case, we had 2 levels, MALE and FEMALE, both of which are present in the data.

But you can also declare a level such as OTHERS, which is not present in the data.

To illustrate this, I will use blood groups as the categories. Let's see how it works.

# creates a character vector
blood_groups <- c('O','AB','A','A','AB','O')

# prints the factor and its levels
factor(blood_groups)
[1] O  AB A  A  AB O
Levels: A AB O

You can observe that the factor function has returned the levels in the data.

Now, we are going to add a level that is not present in the input data or character vector.

# declares an additional level which is not in the data
factor(blood_groups, levels = c('A','AB','O','B'))
[1] O  AB A  A  AB O
Levels: A AB O B

Fantastic! We have successfully added an extra level, B, which does not occur anywhere in the data.
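Why bother with a level that has no data? One practical reason is that summaries then report it with a count of zero instead of leaving it out. The sketch below also shows a caveat worth knowing: the levels argument works in both directions, so values that are missing from it become NA. The names `blood_factor` and `narrow_factor` are illustrative.

```r
blood_groups <- c('O','AB','A','A','AB','O')
blood_factor <- factor(blood_groups, levels = c('A','AB','O','B'))

# The unused level B still appears in the counts, with a count of 0
group_counts <- table(blood_factor)
group_counts

# Caveat: values that are not listed in `levels` are turned into NA
narrow_factor <- factor(blood_groups, levels = c('A','AB'))
sum(is.na(narrow_factor))  # counts the 'O' entries that were dropped
```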

Wrapping Up

The factor is one of the most useful data structures in R. In regression analysis, it lets R convert nominal data to a numeric encoding with ease.

Factors give you a dedicated way to handle categorical data, and they are helpful for encodings as well.
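As a rough illustration of that encoding, base R's model.matrix() shows the numeric design matrix a regression would be fitted on. This is only a sketch using the gender vector from earlier, with the default treatment contrasts assumed; it is not a full modeling workflow.

```r
class_gender <- c('MALE','FEMALE','MALE','FEMALE','FEMALE','MALE')
gender <- factor(class_gender)

# Builds the design matrix for a model with gender as the only predictor
design <- model.matrix(~ gender)

# The first level (FEMALE) is the baseline; MALE gets its own 0/1 column
colnames(design)
design[, "genderMALE"]
```

The factor with two levels turns into a single indicator column, which is the numeric form that functions such as lm() work with.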

I hope this gave you a better understanding of the factor() function and its use in the R language. That's all for now! Happy R!

Further reading: R documentation
