# Factor Analysis in R programming

Hello, readers! In this article, we will be focusing on Factor Analysis in R programming, in detail.

So, let us begin!! 🙂

## What is Factor Analysis in R?

Before diving deep into the concept of Factor Analysis in R, let us first understand the role of Factors.

In this article, factors are the data variables (features) on which we base our predictions and analysis. With these variables in place, we apply various analysis techniques to understand the raw data and then judge how useful each variable in the dataset actually is. In statistics, the term refers more strictly to the unobserved (latent) variables that factor analysis estimates from the observed ones.

Clearly, we cannot use the raw variables as they are without understanding whether each one is needed for the prediction problem at hand. For this, we need techniques that help us analyze the variables and then estimate the parameters required to go ahead with the predictions.

In this article, we focus on one such technique: Factor Analysis in R programming.

## Factor Analysis in R – Overview!

Factor Analysis helps us identify the factors that matter for our analysis. With it, we can decide whether a given variable or feature is important enough to keep.

The first objective of Factor Analysis is to check the data against the facts and domain knowledge we already have. Beyond that, it gives us an estimate of the number of important factors needed for prediction.

In this technique, we try to uncover latent relationships among the variables of the dataset. We then narrow the variables down to a small number of factors that summarize the features of all the other variables.
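To make the idea of latent factors concrete, here is a minimal sketch on simulated data. It is not part of the loan example: the data frame `toy`, the hidden variables `f1` and `f2`, and the coefficients are all made up for illustration, and it uses base R's `factanal()` rather than the `psych` package used later in this article.

```r
set.seed(42)  # make the simulation reproducible
n <- 500

# Two hypothetical latent factors (in real data these are never observed)
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six observed variables: x1-x3 are driven by f1, x4-x6 by f2, plus noise
toy <- data.frame(
  x1 = 0.9 * f1 + rnorm(n, sd = 0.4),
  x2 = 0.8 * f1 + rnorm(n, sd = 0.5),
  x3 = 0.7 * f1 + rnorm(n, sd = 0.6),
  x4 = 0.9 * f2 + rnorm(n, sd = 0.4),
  x5 = 0.8 * f2 + rnorm(n, sd = 0.5),
  x6 = 0.7 * f2 + rnorm(n, sd = 0.6)
)

# Fit a two-factor model (maximum likelihood, varimax rotation)
fit <- factanal(toy, factors = 2, rotation = "varimax")

# Show the loadings, hiding small values to make the grouping visible
print(fit$loadings, cutoff = 0.3)
```

If the model recovers the structure, x1 to x3 should load mainly on one factor and x4 to x6 on the other, so six variables are summarized by two factors.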

Let us now have a look at the implementation of the Factor Analysis in R in the upcoming section.

## Implementation of Factor Analysis on Loan Defaulter Dataset

In this example, we will be using the Loan Defaulter dataset to walk through the Factor Analysis process. You can find the dataset here!

1. First, we load the dataset into the R environment using the read.csv() function.
2. Next, to apply Factor Analysis, we need to install and load the 'psych' and 'GPArotation' libraries.
3. After that, we examine the relationships between the numeric variables through correlation analysis, using the corrgram() function from the 'corrgram' package.
4. The result of the correlation analysis (the correlation matrix) is stored in a variable.
5. Then, we call the fa() function with the number of factors we want; it fits the factor model and returns the loading of each variable on each factor.
6. Finally, we call fa.parallel() on the same correlation matrix to run a parallel analysis, which suggests how many factors to keep and plots the eigenvalues.

Example:

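```
rm(list = ls())
# Setting the working directory
setwd("D:/Edwisor_Project:Loan_Defaulter")
getwd()

# Load the dataset; the file name is a placeholder, use your own CSV file
dta <- read.csv("loan_data.csv", header = TRUE)
# Numeric columns to analyze (names taken from the output shown below)
numeric_col <- c("age", "employ", "address", "income",
                 "debtinc", "creddebt", "othdebt")

install.packages("psych")
library(psych)
# Correlation analysis: corrgram() draws the plot and returns the correlation matrix
install.packages("corrgram")
library(corrgram)
cor = corrgram(dta[, numeric_col], order = FALSE, upper.panel = panel.pie,
               text.panel = panel.txt,
               main = "Correlation Analysis Plot of the Numeric Variables")
install.packages("GPArotation")
library(GPArotation)
# Fit a 6-factor model on the correlation matrix (default method: minres)
fa(r = cor, nfactors = 6)
# Parallel analysis to suggest how many factors to keep
f_data <- fa.parallel(cor, fa = "fa", fm = "minres")
f_data
```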
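If you want to try the column selection and correlation steps without the loan file, the following sketch uses the built-in `mtcars` dataset as a stand-in (an assumption for illustration, not the data used in this article). It also uses base R's `cor()` instead of `corrgram()`, so it gives the correlation matrix without the plot.

```r
# Stand-in data: the built-in mtcars dataset, NOT the loan defaulter data
dta_demo <- mtcars

# Logical index marking which columns are numeric
numeric_idx <- sapply(dta_demo, is.numeric)

# Pearson correlation matrix of the numeric columns
cor_mat <- cor(dta_demo[, numeric_idx])

# Peek at the top-left corner of the matrix
round(cor_mat[1:3, 1:3], 2)

# A valid correlation matrix is symmetric with ones on the diagonal
isSymmetric(cor_mat)
```

A matrix like `cor_mat` is the kind of object that is passed to fa() and fa.parallel() through the `r` argument in the example above.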

Output:

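• fa() prints the matrix of loadings between the variables and the six estimated factors, followed by the correlations among the factors.
• The six-factor fit has negative degrees of freedom (-6), because six factors are too many for seven variables, so this model is over-parameterized.
• Further, the parallel analysis suggests that 3 factors are enough to represent the variables.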
```
Factor Analysis using method =  minres
Call: fa(r = cor, nfactors = 6)
          MR3   MR2   MR1   MR6   MR5   MR4   h2    u2 com
age      0.76  0.09  0.11 -0.06  0.18 -0.11 0.80 0.199 1.2
employ   0.20 -0.22  0.05  0.23  0.47  0.08 0.57 0.426 2.5
address  0.77 -0.07 -0.05  0.06 -0.12  0.10 0.53 0.472 1.1
income   0.02 -0.06  0.92  0.05 -0.01  0.01 0.92 0.081 1.0
debtinc  0.01  0.89 -0.08  0.11 -0.04  0.01 0.91 0.086 1.0
creddebt 0.00  0.43  0.30  0.01  0.17  0.34 0.74 0.262 3.2
othdebt  0.01  0.16  0.13  0.71  0.07  0.01 0.87 0.133 1.2

                       MR3  MR2  MR1  MR6  MR5  MR4
Proportion Var        0.19 0.17 0.17 0.12 0.07 0.04
Cumulative Var        0.19 0.36 0.54 0.65 0.73 0.76
Proportion Explained  0.25 0.22 0.23 0.16 0.10 0.05
Cumulative Proportion 0.25 0.47 0.70 0.86 0.95 1.00

With factor correlations of
      MR3   MR2  MR1  MR6  MR5  MR4
MR3  1.00 -0.02 0.45 0.26 0.50 0.05
MR2 -0.02  1.00 0.02 0.60 0.11 0.34
MR1  0.45  0.02 1.00 0.63 0.68 0.47
MR6  0.26  0.60 0.63 1.00 0.39 0.26
MR5  0.50  0.11 0.68 0.39 1.00 0.18
MR4  0.05  0.34 0.47 0.26 0.18 1.00

Mean item complexity =  1.6
Test of the hypothesis that 6 factors are sufficient.

The degrees of freedom for the null model are  21  and the objective function was  3.53
The degrees of freedom for the model are -6  and the objective function was  0

The root mean square of the residuals (RMSR) is  0
The df corrected root mean square of the residuals is  NA

Fit based upon off diagonal values = 1
                                                   MR3  MR2  MR1  MR6  MR5  MR4
Correlation of (regression) scores with factors   0.91 0.96 0.97 0.93 0.79 0.74
Multiple R square of scores with factors          0.82 0.93 0.94 0.87 0.63 0.54
Minimum correlation of possible factor scores     0.65 0.85 0.87 0.74 0.26 0.09

Parallel analysis suggests that the number of factors =  3  and the number of components =  NA
```
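The `h2` and `u2` columns in the fa() output are the communality (variance explained by the factors) and the uniqueness (variance left unexplained) of each variable. As a quick sanity check, the sketch below copies those two columns from the output above and confirms how they relate. The vector names are just for illustration.

```r
# Communality (h2) and uniqueness (u2), copied from the fa() output above
h2 <- c(age = 0.80, employ = 0.57, address = 0.53, income = 0.92,
        debtinc = 0.91, creddebt = 0.74, othdebt = 0.87)
u2 <- c(age = 0.199, employ = 0.426, address = 0.472, income = 0.081,
        debtinc = 0.086, creddebt = 0.262, othdebt = 0.133)

# Each variable's communality and uniqueness should add up to about 1
round(h2 + u2, 2)

# The average communality is the share of total variance the factors explain
round(mean(h2), 2)
```

The average communality comes out at 0.76, which matches the last value in the `Cumulative Var` row. Variables with a low `h2`, such as `address` and `employ`, are the ones the factors describe least well.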

The graph produced by fa.parallel() shows the eigenvalues of the actual and simulated data. The difference between the actual and simulated eigenvalues is largest towards the left of the graph, that is, for the first few factors. Parallel analysis itself suggests 3 factors, and we can say that any number of factors between 3 and 5 is acceptable for this dataset.
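The comparison of actual and simulated eigenvalues can be sketched in base R. This is a simplified version of the idea, not the exact procedure inside fa.parallel(): it compares eigenvalues of the plain correlation matrix, whereas fa.parallel() with `fa = "fa"` works with factor-analysis eigenvalues. The toy data, the single hidden factor `f`, and the 50 replications are assumptions made for illustration.

```r
set.seed(1)  # make the simulation reproducible
n <- 300     # number of observations
p <- 6       # number of observed variables

# Toy data: six variables that all depend on one hypothetical factor f
f <- rnorm(n)
X <- sapply(1:p, function(j) 0.8 * f + rnorm(n, sd = 0.6))

# Eigenvalues of the correlation matrix of the actual data
actual <- eigen(cor(X))$values

# Eigenvalues for pure-noise data of the same shape, averaged over 50 runs
sim <- replicate(50, eigen(cor(matrix(rnorm(n * p), n, p)))$values)
simulated <- rowMeans(sim)

# Keep the leading eigenvalues that are larger than their simulated counterpart
n_keep <- sum(cumprod(actual > simulated))
n_keep

# Side-by-side view, similar in spirit to the plot from fa.parallel()
round(cbind(actual, simulated), 2)
```

Since the toy data is built from a single factor, only the first actual eigenvalue should clearly exceed the simulated one. On real data, the point where the actual eigenvalues drop below the simulated ones is what suggests the number of factors.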

## Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For more such posts related to R programming, stay tuned with us!

Till then, Happy Learning!! 🙂
