Set of Assumptions for Factor and Principal Component Analysis
Description: Tests for Kaiser-Meyer-Olkin (KMO) and communalities in a dataset. It provides a final sample by removing variables iteratively while keeping track of the variables removed at each step.
Factor Analysis and Principal Component Analysis (PCA) come with precautions and assumptions that must be checked (@hair2018).
The first one is the KMO (Kaiser-Meyer-Olkin) measure, the proportion of variance among the variables that can be attributed to common variance, also called systematic variance. KMO ranges from 0 to 1. Low values (close to 0) indicate that the partial correlations are large relative to the sum of the correlations, that is, the correlation pattern among the variables is problematic for factor/principal component analysis. @hair2018 suggest that variables with an individual KMO smaller than 0.5 be removed from the analysis. Once every remaining variable has an individual KMO above 0.5, the overall KMO of the remaining variables is also greater than 0.5.
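To make the definition concrete, here is a minimal base R sketch of the standard KMO computation from zero-order and partial correlations. It is an illustration under stated assumptions, not the code used by `kmo_optimal_solution`: the function name `kmo_sketch` is hypothetical, and the built-in `mtcars` data stands in for a real dataset.

```r
# Minimal KMO sketch in base R (illustrative; not the FactorAssumptions implementation).
kmo_sketch <- function(x) {
  r <- cor(x)                        # zero-order correlations
  inv <- solve(r)                    # inverse of the correlation matrix
  # Partial correlation of each pair, controlling for all other variables
  p <- -inv / sqrt(outer(diag(inv), diag(inv)))
  diag(r) <- 0                       # exclude the diagonal from both sums
  diag(p) <- 0
  r2 <- r^2
  p2 <- p^2
  list(
    overall = sum(r2) / (sum(r2) + sum(p2)),
    individual = colSums(r2) / (colSums(r2) + colSums(p2))
  )
}

# Stand-in data: five numeric columns of the built-in mtcars dataset
kmo_mtcars <- kmo_sketch(mtcars[, 1:5])
kmo_mtcars$overall
round(kmo_mtcars$individual, 2)
```

A variable whose squared partial correlations outweigh its squared correlations gets an individual value below 0.5, which is the removal criterion described above.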
The second assumption of a valid factor or principal component analysis concerns the communalities of the rotated variables. The communalities indicate the variance each variable shares with the factors/components. A greater communality indicates that a greater amount of the variance in the variable was extracted by the factor/principal component solution. For a better measurement, communalities should be 0.5 or greater (@hair2018).
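As a worked definition, the communality of variable $j$ is usually written as the sum of its squared loadings across the $m$ retained factors/components. The symbols $h_j^2$, $\lambda_{jk}$ and $m$ are notation introduced here for illustration and do not come from the package documentation.

$$
h_j^2 = \sum_{k=1}^{m} \lambda_{jk}^2
$$

For example, a variable with loadings 0.6 and 0.5 on two components has $h_j^2 = 0.36 + 0.25 = 0.61$, which meets the 0.5 threshold. An orthogonal rotation such as varimax redistributes the loadings across components but leaves each $h_j^2$ unchanged.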
First we will load the example dataset `bfi` from `psych` and load the package `FactorAssumptions`:
library(FactorAssumptions, quietly = TRUE, verbose = FALSE)
bfi_data <- bfi
# Remove rows with missing values and keep only complete cases
bfi_data <- bfi_data[complete.cases(bfi_data), ]
head(bfi_data)
Next we check the $KMO > 0.5$ assumption for every individual variable in the dataset with the `kmo_optimal_solution` function:
kmo_bfi <- kmo_optimal_solution(bfi_data, squared = FALSE)
Note that `kmo_optimal_solution` outputs a list with the following elements:

- `df`
- `removed`
- `AIS`
- `AIR`
In our case, none of the variables were removed due to low individual KMO values:
kmo_bfi$removed
Parallel analysis of the `bfi` data suggests seven factors, so we next check the assumption that all individual communalities are $> 0.5$, with the argument `nfactors` set to 7.
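The source does not show how the parallel analysis was run. The `psych` package provides `fa.parallel()` for this purpose. As a rough illustration of the idea only, the base R sketch below keeps the leading components whose eigenvalues exceed the mean eigenvalues obtained from random normal data of the same shape. The function name `parallel_sketch`, its arguments, and the use of `mtcars` as stand-in data are all assumptions made for this example.

```r
# Minimal parallel-analysis sketch for principal components (base R only).
# Illustrative stand-in; use psych::fa.parallel() for real analyses.
parallel_sketch <- function(x, n_sim = 100, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  # Eigenvalues of correlation matrices built from random data of the same shape
  simulated <- replicate(n_sim, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
  })
  threshold <- rowMeans(simulated)   # mean random eigenvalue at each position
  # Count the leading components whose eigenvalue exceeds the random baseline
  exceeds <- observed > threshold
  if (all(exceeds)) p else which(!exceeds)[1] - 1
}

# Stand-in data: the built-in mtcars dataset
n_components <- parallel_sketch(mtcars)
n_components
```

The count returned here plays the role of the value passed to `nfactors`.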
The `type` argument accepts either `principal` or `fa`, the corresponding functions from the `psych` package:

- `principal` will perform a Principal Component Analysis (PCA)
- `fa` will perform a Factor Analysis

Note: we are using the `df` generated by the `kmo_optimal_solution` function.

Note 2: the default rotation employed by `communalities_optimal_solution` is `varimax`. You can change it if you want.
comm_bfi <- communalities_optimal_solution(kmo_bfi$df, type = "principal", nfactors = 7, squared = FALSE)
Note that `communalities_optimal_solution` outputs a list with the following elements:

- `df`
- `removed`
- `loadings`
- `results`: the results of the `principal` or `fa` function from the `psych` package
In our case, three variables were removed iteratively due to low individual communalities. They are listed in order of removal, starting with the variable that had the lowest communality, until an optimal solution was reached:
comm_bfi$removed
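The iterative removal can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it uses unrotated principal component loadings from `eigen()` rather than `psych`, and the function name `communality_sketch`, the `cutoff` argument, and the stand-in data with an added `noise` column are all assumptions made for the example. Because communalities are unaffected by orthogonal rotation, the unrotated loadings are enough for this check.

```r
# Sketch of iterative removal by communality using unrotated PCA loadings (base R).
# Not the package's code: communalities_optimal_solution() relies on psych.
communality_sketch <- function(x, nfactors, cutoff = 0.5) {
  removed <- character(0)
  repeat {
    e <- eigen(cor(x), symmetric = TRUE)
    k <- seq_len(nfactors)
    # Loadings: eigenvectors scaled by the square root of their eigenvalues
    loadings <- e$vectors[, k, drop = FALSE] %*% diag(sqrt(e$values[k]), nfactors)
    # Communality: sum of squared loadings for each variable
    h2 <- rowSums(loadings^2)
    names(h2) <- colnames(x)
    # Stop when every variable passes, or when too few variables remain
    if (min(h2) >= cutoff || ncol(x) <= nfactors + 1) break
    worst <- names(h2)[which.min(h2)]
    removed <- c(removed, worst)     # record variables in order of removal
    x <- x[, setdiff(colnames(x), worst), drop = FALSE]
  }
  list(df = x, removed = removed, communalities = h2)
}

# Stand-in data: six mtcars columns plus one pure-noise column
set.seed(42)
demo <- cbind(mtcars[, 1:6], noise = rnorm(nrow(mtcars)))
res <- communality_sketch(demo, nfactors = 1)
res$removed
round(res$communalities, 2)
```

Only the single worst variable is dropped per pass, and the solution is refitted before the next check, since removing one variable changes the communalities of the others.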
Finally, we arrive at our final rotated principal components matrix. You can export it as a CSV with `write.csv` or `write.csv2`:
comm_bfi$results
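As a hedged sketch of the export step, the example below writes a rotated loadings matrix to a CSV file and reads it back. The matrix is built with base R (`eigen()` and `stats::varimax()`) on `mtcars` as a stand-in, because the exact structure of `comm_bfi$results` is not shown here. The object names, the column labels `RC1` and `RC2`, and the file name are assumptions made for illustration.

```r
# Build a small varimax-rotated loadings matrix as a stand-in (base R only)
e <- eigen(cor(mtcars[, 1:6]), symmetric = TRUE)
unrotated <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
rownames(unrotated) <- colnames(mtcars)[1:6]
rotated <- unclass(varimax(unrotated)$loadings)  # plain numeric matrix
colnames(rotated) <- c("RC1", "RC2")             # hypothetical column labels

# Write to a temporary file; write.csv2() would use ";" and a decimal comma instead
out_file <- file.path(tempdir(), "rotated_matrix.csv")
write.csv(rotated, out_file)

# Read it back, using the first column as row names
read_back <- read.csv(out_file, row.names = 1)
```

`write.csv2` is the variant to prefer in locales where the comma is the decimal separator.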