Hello, readers! In this article, we will take a detailed look at Factor Analysis in R programming.
So, let us begin!! 🙂
What is Factor Analysis in R?
Before diving deep into the concept of Factor Analysis in R, let us first understand what we mean by factors.
Every dataset comes with observed variables (features) on which we base our analysis and predictions. Factors, in the sense of Factor Analysis, are unobserved (latent) variables that explain why groups of observed variables move together. Note that this is different from R's factor data type, which stores categorical values.
We cannot simply feed every raw variable into a model without understanding whether it is needed for the prediction problem at hand. For this, we need techniques that help us analyze the variables, summarize them, and estimate the parameters needed to go ahead with the predictions.
In this article, we focus on one such technique: Factor Analysis in R programming.
Factor Analysis in R – Overview!
Factor Analysis helps us identify the important underlying factors in a dataset. With this, we can judge which variables (features) carry shared information and which contribute little to our analysis.
One objective of Factor Analysis is to check the data against what we already know about it, for example, whether variables that should belong together actually do. It also gives us an estimate of how many factors are needed to describe the data.
In this technique, we look for latent relationships amongst the variables of the dataset. We then narrow the variables down to a small number of factors that summarize the information in all the original variables.
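To make the idea of latent factors concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the two hidden factors `f1` and `f2`, the six variables `x1` to `x6`, and the use of base R's `factanal()` instead of the `psych` package used later in this article.

```r
# Simulate six observed variables that are driven by two hidden (latent) factors.
set.seed(42)
n  <- 500
f1 <- rnorm(n)   # first latent factor (hypothetical)
f2 <- rnorm(n)   # second latent factor (hypothetical)

sim <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.5),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  x3 = 0.9 * f1 + rnorm(n, sd = 0.5),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.5),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.5),
  x6 = 0.9 * f2 + rnorm(n, sd = 0.5)
)

# Variables that share a factor are strongly correlated with each other.
round(cor(sim), 2)

# Fit a two-factor model with base R's factanal() (maximum likelihood).
fit <- factanal(sim, factors = 2, rotation = "varimax")

# Show only loadings above 0.3 so the block structure is easy to see.
print(fit$loadings, cutoff = 0.3)
```

In the printed loadings, x1 to x3 should load mainly on one factor and x4 to x6 on the other, which is exactly the kind of summary Factor Analysis is meant to give.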
Let us now have a look at the implementation of the Factor Analysis in R in the upcoming section.
Implementation of Factor Analysis on Loan Defaulter Dataset
In this example, we will be using the Loan Defaulter dataset (bank-loan.csv) to walk through the Factor Analysis process. You can find the dataset here!
- At first, we load the dataset into the R environment using the read.csv() function.
- Further, to apply Factor Analysis, we need to install and load the 'psych' and 'GPArotation' libraries.
- After which, we examine the relationships between the numeric variables through Correlation Analysis, using the corrgram() function from the 'corrgram' library.
- The correlation matrix returned by corrgram() is stored in a variable.
- Further, we apply the fa() function and provide the number of factors we want. It returns the factor loadings for that many factors, along with fit statistics.
- At last, we use the fa.parallel() function to run a parallel analysis on the same correlation matrix. It plots the eigenvalues and suggests how many factors to keep.
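rm(list = ls())
#Setting the working directory
setwd("D:/Edwisor_Project:Loan_Defaulter")
getwd()
#Load the dataset
dta = read.csv("bank-loan.csv", header = TRUE)
#Numeric columns to analyze (assumed from the variables that appear in the output)
numeric_col = c("age", "employ", "address", "income", "debtinc", "creddebt", "othdebt")
install.packages("psych")
library(psych)
##Correlation analysis: corrgram() draws the plot and returns the correlation matrix
library(corrgram)
cor = corrgram(dta[, numeric_col], order = FALSE, upper.panel = panel.pie, text.panel = panel.txt, main = "Correlation Analysis Plot of the Numeric Variables")
install.packages('GPArotation')
library('GPArotation')
#Factor analysis with 6 factors (fa() uses the minres method by default)
fa(r = cor, nfactors = 6)
#Parallel analysis to estimate the number of factors
f_data <- fa.parallel(cor, fa = 'fa', fm = 'minres')
#Getting the parallel analysis results
f_data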
- We get the factor loadings, i.e. a matrix describing how strongly each variable relates to each estimated factor, followed by the correlations between the factors.
- Further, the parallel analysis suggests that 3 factors are enough to represent the variables.
Factor Analysis using method = minres
Call: fa(r = cor, nfactors = 6)
Standardized loadings (pattern matrix) based upon correlation matrix
           MR3   MR2   MR1   MR6   MR5   MR4   h2    u2 com
age       0.76  0.09  0.11 -0.06  0.18 -0.11 0.80 0.199 1.2
employ    0.20 -0.22  0.05  0.23  0.47  0.08 0.57 0.426 2.5
address   0.77 -0.07 -0.05  0.06 -0.12  0.10 0.53 0.472 1.1
income    0.02 -0.06  0.92  0.05 -0.01  0.01 0.92 0.081 1.0
debtinc   0.01  0.89 -0.08  0.11 -0.04  0.01 0.91 0.086 1.0
creddebt  0.00  0.43  0.30  0.01  0.17  0.34 0.74 0.262 3.2
othdebt   0.01  0.16  0.13  0.71  0.07  0.01 0.87 0.133 1.2

                       MR3  MR2  MR1  MR6  MR5  MR4
SS loadings           1.34 1.19 1.22 0.83 0.51 0.25
Proportion Var        0.19 0.17 0.17 0.12 0.07 0.04
Cumulative Var        0.19 0.36 0.54 0.65 0.73 0.76
Proportion Explained  0.25 0.22 0.23 0.16 0.10 0.05
Cumulative Proportion 0.25 0.47 0.70 0.86 0.95 1.00

With factor correlations of
      MR3   MR2  MR1  MR6  MR5  MR4
MR3  1.00 -0.02 0.45 0.26 0.50 0.05
MR2 -0.02  1.00 0.02 0.60 0.11 0.34
MR1  0.45  0.02 1.00 0.63 0.68 0.47
MR6  0.26  0.60 0.63 1.00 0.39 0.26
MR5  0.50  0.11 0.68 0.39 1.00 0.18
MR4  0.05  0.34 0.47 0.26 0.18 1.00

Mean item complexity = 1.6
Test of the hypothesis that 6 factors are sufficient.

The degrees of freedom for the null model are 21 and the objective function was 3.53
The degrees of freedom for the model are -6 and the objective function was 0

The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is NA

Fit based upon off diagonal values = 1
Measures of factor score adequacy
                                                 MR3  MR2  MR1  MR6  MR5  MR4
Correlation of (regression) scores with factors 0.91 0.96 0.97 0.93 0.79 0.74
Multiple R square of scores with factors        0.82 0.93 0.94 0.87 0.63 0.54
Minimum correlation of possible factor scores   0.65 0.85 0.87 0.74 0.26 0.09

Parallel analysis suggests that the number of factors = 3 and the number of components = NA
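A few columns of the fa() output above can be checked by hand, which helps when reading it. The short sketch below copies the h2 and SS loadings values from that output. The variable names `h2`, `ss_loadings`, `u2` and `prop_var` are ours, chosen only for illustration.

```r
# Values copied from the fa() output above.
h2 <- c(age = 0.80, employ = 0.57, address = 0.53, income = 0.92,
        debtinc = 0.91, creddebt = 0.74, othdebt = 0.87)
ss_loadings <- c(MR3 = 1.34, MR2 = 1.19, MR1 = 1.22,
                 MR6 = 0.83, MR5 = 0.51, MR4 = 0.25)

# h2 (communality) is the share of a variable's variance explained by the factors.
# u2 (uniqueness) is the remaining share, so u2 = 1 - h2.
u2 <- 1 - h2
round(u2, 2)

# Proportion Var = SS loadings / number of observed variables (7 here).
prop_var <- ss_loadings / length(h2)
round(prop_var, 2)

# Cumulative Var is the running total of Proportion Var.
round(cumsum(prop_var), 2)
```

The results agree with the u2, Proportion Var and Cumulative Var values in the output up to rounding. For instance, income has a high communality (0.92), so the factors explain it almost entirely, while address (0.53) is explained far less well.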
The above graph represents the eigenvalues of the actual and simulated data. As seen above, the difference between the actual and simulated data is largest towards the left of the graph, i.e. for the first few factors. The parallel analysis itself suggests 3 factors, so we can say that a number of factors between 3 and 5 is a reasonable choice for this dataset.
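To see what a comparison of actual and simulated eigenvalues looks like in code, here is a simplified sketch in base R. It is not the exact procedure inside fa.parallel(): it assumes made-up data with two hidden factors (since bank-loan.csv may not be at hand) and compares plain correlation-matrix eigenvalues with those of random data of the same shape. All names in it (`sim`, `actual_eig`, `mean_sim_eig`, `suggested`) are hypothetical.

```r
# Made-up data: six variables driven by two hidden factors (an assumption).
set.seed(1)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- cbind(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.5),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  x3 = 0.9 * f1 + rnorm(n, sd = 0.5),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.5),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.5),
  x6 = 0.9 * f2 + rnorm(n, sd = 0.5)
)
p <- ncol(sim)

# Eigenvalues of the correlation matrix of the actual data.
actual_eig <- eigen(cor(sim), only.values = TRUE)$values

# Eigenvalues of random, uncorrelated data of the same size,
# averaged over many simulations.
n_sims  <- 100
sim_eig <- replicate(n_sims, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), only.values = TRUE)$values
})
mean_sim_eig <- rowMeans(sim_eig)

# Count the leading eigenvalues that are larger than their simulated counterpart.
suggested <- sum(cumprod(actual_eig > mean_sim_eig))
suggested

# Plot both curves, similar in spirit to the fa.parallel() graph.
plot(actual_eig, type = "b", xlab = "Factor number", ylab = "Eigenvalue",
     main = "Actual vs simulated eigenvalues")
lines(mean_sim_eig, type = "b", lty = 2)
legend("topright", legend = c("Actual data", "Simulated data"), lty = c(1, 2))
```

The gap between the two curves is wide for the first factors and closes once the actual eigenvalues drop to the level of random noise. That crossing point is what parallel analysis reports as the suggested number of factors.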
By this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to R programming, stay tuned with us!
Till then, Happy Learning!! 🙂