knitr::opts_chunk$set(echo = TRUE)
Set of Assumptions for Factor and Principal Component Analysis
Description:Tests for Kaiser-Meyer-Olkin (KMO) and communalities in a dataset. It provides a final sample by removing variables in a iterable manner while keeping account of the variables that were removed in each step.
Factor Analysis and Principal Components Analysis (PCA) have some precautions and assumptions to be observed (@hair2018).
The first one is the KMO (Kaiser-Meyer-Olkin) measure, which measures the proportion of variance among the variables that can be derived from the common variance, also called systematic variance. KMO is computed between 0 and 1. Low values (close to 0) indicate that there are large partial correlations in comparison to the sum of the correlations, that is, there is a predominance of correlations of the variables that are problematic for the factorial/principal component analysis. @hair2018 suggest that individual KMOs smaller than 0.5 be removed from the factorial/principal component analysis. Consequently, this removal causes the overall KMO of the remaining variables of the factor/principal component analysis to be greater than 0.5.
The second assumption of a valid factor or PCA analysis is the communality of the rotated variables. The commonalities indicate the common variance shared by factors/components with certain variables. Greater communality indicated that a greater amount of variance in the variable was extracted by the factorial/principal component solution. For a better measurement of factorial/principal component analysis, communalities should be 0.5 or greater (@hair2018).
First we will load an example dataset bfi
from psych
and load the package FactorAssumptions
library(FactorAssumptions, quietly = T, verbose = F) bfi_data <- bfi #Remove rows with missing values and keep only complete cases bfi_data <- bfi_data[complete.cases(bfi_data),] head(bfi_data)
First we will perform the $KMO > 0.5 assumption$ for all individuals variables in the dataset with the kmo_optimal_solution
function
kmo_bfi <- kmo_optimal_solution(bfi_data, squared = FALSE)
Note that the kmo_optimal_solution
outputs a list:
df
removed
AIS
AIR
In our case none of the variables were removed due to low individual KMO values
kmo_bfi$removed
The parallel analysis of bfi
data suggests seven factors we will then perform the assumptions for all $individual communalities > 0.5$ with the argument nfactors
set to 7.
We can use either the values principal
or fa
functions from psych
package for argument type
as desired:
principal
will perform a Principal Component Analysis (PCA)fa
will perform a Factor AnalysisNote: we are using the df
generated from the kmo_optimal_solution
function
Note 2: the default of rotation employed by the communalities_optimal_solution
is varimax
. You can change if you want.
comm_bfi <- communalities_optimal_solution(kmo_bfi$df, type = "principal", nfactors = 7, squared = FALSE)
Note that the communalities_optimal_solution
outputs a list:
df
removed
loadings
principal
or fa
functions from psych
package as results
In our case 3 variables were removed in an iterable fashion due to low individual communality values. And they are listed from the lowest communality that were removed until rendered an optimal solution.
comm_bfi$removed
And finally we arrive at our final principal components analysis rotated matrix. You can export it as a CSV with write.csv
or write.csv2
comm_bfi$results
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.