knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# TODO: Add citations and references. Elaborate on or refer to the choice between fa and pca.
Conducting a factor analysis (fa) or principal component analysis (pca) can be challenging. It starts with the choice between fa and pca, and continues with the question of how many factors or components to retain.
In this vignette we focus on the second question: how to determine the number of factors or components to retain with parallel analysis (Horn, 1965), and what this package provides to help with the process. In particular, we look at how to add bootstrapped confidence intervals around the observed eigenvalues.
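To make the idea behind parallel analysis concrete, here is a minimal base-R sketch for the principal component case. It is not the psych implementation: the simulated data, the variable names, and the use of the mean simulated eigenvalue as the reference are illustrative assumptions.

```r
# Minimal base-R sketch of Horn's parallel analysis for principal components.
# All names and the simulated data are illustrative assumptions.
set.seed(1)

n <- 300  # number of observations
p <- 6    # number of variables

# Simulate data with one common factor so that one eigenvalue stands out
common <- rnorm(n)
x <- sapply(seq_len(p), function(j) 0.7 * common + rnorm(n))

# Observed eigenvalues of the correlation matrix
observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from uncorrelated random normal data of the same dimensions
n_sim <- 100
simulated <- replicate(n_sim, {
  random_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(random_data), symmetric = TRUE, only.values = TRUE)$values
})

# Reference value per component: mean simulated eigenvalue
reference <- rowMeans(simulated)

# Retain components until the first observed eigenvalue
# falls at or below its reference value
exceeds <- observed > reference
n_retain <- if (all(exceeds)) p else which(!exceeds)[1] - 1

data.frame(component = seq_len(p), observed = observed, reference = reference)
n_retain
```

Comparing against a high quantile of the simulated eigenvalues instead of their mean is a common, more conservative variant; swapping `rowMeans(simulated)` for `apply(simulated, 1, quantile, probs = 0.95)` would sketch that.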
[To be added: helpful sources, e.g., on the question of how to determine the number of factors or components, with article citations.]
[To be added: picture of the result.]
Below you find a step-by-step tutorial, in just 4 easy steps, on how to create bootstrapped confidence intervals (CIs) around the observed eigenvalues in parallel analysis.
This first section sets up the necessary packages and provides a first glance at the example survey data (on personality) that we need for the tutorial.
We start by loading all required packages. For this vignette we use the packages listed in the call below. I personally like the pacman package manager for loading libraries in R, but that is up to your preference; you might as well just use library() instead.
pacman::p_load(datscience, psych, dplyr, flextable)
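For readers who prefer not to use pacman, the following is one possible base-R equivalent. The package names are taken from the p_load() call above; the check-then-attach logic is an illustrative assumption, not something the package prescribes.

```r
# Packages used in this vignette (taken from the p_load() call)
pkgs <- c("datscience", "psych", "dplyr", "flextable")

# Check which packages are installed without attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) == 0) {
  # Equivalent to calling library() once per package
  invisible(lapply(pkgs, library, character.only = TRUE))
} else {
  message("Please install first: ", paste(missing_pkgs, collapse = ", "))
}
```

One practical difference: by default p_load() also tries to install packages that are missing, whereas library() only attaches what is already installed.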
We use the bfi data that comes with the psych package as an example here. The data set contains $k = 25$ items from a Big Five personality questionnaire and an additional $k = 3$ sociodemographic items from $n = 2800$ participants. More details can be obtained with ?bfi.
For this vignette we use only the 10 items on agreeableness and extraversion, and randomly sample 300 participants to improve the run time.
Below you can get first insights into the data.frame. For the overview we used the datscience::format_flextable function to style the appearance of the table.
set.seed(42) # necessary for reproducible results
data(bfi)
# Select only agreeableness and extraversion items and sample participants
# Regex rundown:
#   ^    : starts with
#   (|)  : logical or
#   E\\d : E followed by any digit, e.g. E3 matches E\\d
#   A\\d : A followed by any digit, e.g. A5 matches A\\d
df <- bfi |>
  select(matches("^(E\\d|A\\d)")) |>
  sample_n(300)
dim(df)
str(df)
class(df)
df |>
  head(5) |>
  flextable() |>
  datscience::format_flextable(table_caption = c("Table 1", "Glimpse at the First 5 Rows of Shortened bfi Data"))
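To see what the regular expression selects without loading any data, the pattern can be tried on a few column names with base R. The vector below is a hand-typed subset of the bfi column names, used purely for illustration.

```r
# A hand-typed subset of the column names in psych::bfi
cols <- c("A1", "A5", "C1", "E3", "N2", "O4", "gender", "education", "age")

# Same pattern as in the select() call above;
# ignore.case = TRUE mirrors the default behaviour of matches()
keep <- grepl("^(E\\d|A\\d)", cols, ignore.case = TRUE)

cols[keep]
#> [1] "A1" "A5" "E3"
```

Note that "education" and "age" are not selected even though matching is case-insensitive, because the letter must be followed directly by a digit.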
The psych package makes it pretty convenient to conduct a parallel analysis. The fa parameter allows you to specify whether you want to run a principal component analysis (fa = "pc"), a factor analysis (fa = "fa"), or both (fa = "both"). As we want to showcase CIs for the observed eigenvalues for both methods, we choose "both" here.
# Run a parallel analysis (Horn, 1965) with the psych package and save the parallel object
parallel_analysis <- psych::fa.parallel(df, fa = "both")
To get bootstrapped eigenvalues we just use two functions from the datscience package. First we create a boot object with booted_eigenvalues(), supplying the data.frame and the method (either pc or fa). Afterwards we extract the confidence intervals (CIs) with the function getCIs() by passing the boot object.
PCA_bootObj <- booted_eigenvalues(df, iterations = 500, fa = "pc")
PCA_bootObj
PCA_CIs <- getCIs(PCA_bootObj)
PCA_CIs
EFA_bootObj <- booted_eigenvalues(df, iterations = 500, fa = "fa")
EFA_bootObj
EFA_CIs <- getCIs(EFA_bootObj)
EFA_CIs
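For intuition about what such a bootstrap does, here is a base-R sketch of percentile intervals around the eigenvalues of a correlation matrix. It is not the datscience implementation: the simulated data, the helper `eigenvalues()`, and the choice of a percentile interval are assumptions made for illustration, and the package may compute its intervals differently.

```r
# Percentile-bootstrap sketch for eigenvalue CIs in base R.
# Not the datscience implementation; data and names are illustrative assumptions.
set.seed(2)

n <- 300
p <- 5
common <- rnorm(n)
x <- sapply(seq_len(p), function(j) 0.6 * common + rnorm(n))

# Hypothetical helper: eigenvalues of the correlation matrix (pc case)
eigenvalues <- function(data) {
  eigen(cor(data), symmetric = TRUE, only.values = TRUE)$values
}

observed <- eigenvalues(x)

# Resample rows with replacement and recompute the eigenvalues each time
iterations <- 500
boot_values <- replicate(iterations, {
  rows <- sample.int(n, size = n, replace = TRUE)
  eigenvalues(x[rows, , drop = FALSE])
})

# 95% percentile interval per eigenvalue
# (each row of boot_values holds the draws for one eigenvalue)
cis <- t(apply(boot_values, 1, quantile, probs = c(0.025, 0.975)))
colnames(cis) <- c("lower", "upper")

cbind(observed = observed, cis)
```

The number of iterations trades run time against the stability of the interval limits, which is why the tutorial also subsamples the participants.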
The output of psych::fa.parallel can easily be beautified with the pretty_scree function (below we showcase the pretty scree plot for pca as an example).
PCA_Plot <- pretty_scree(parallel_analysis, fa = "pc")
EFA_Plot <- pretty_scree(parallel_analysis, fa = "fa")
PCA_Plot
Adding the obtained CIs to the plot is also pretty easy with the function add_ci_2plot(). You can choose to display the CIs either as bands (with type = "band") or as error bars (with type = "errorbars").
EFA_Plot <- add_ci_2plot(EFA_Plot, EFA_CIs, type = "band")
PCA_Plot <- add_ci_2plot(PCA_Plot, PCA_CIs, type = "errorbars")
EFA_Plot
PCA_Plot
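If you want to see the difference between the two display styles without running the full pipeline, the following base-graphics sketch draws both. It is independent of how add_ci_2plot() draws them; the function `plot_ci()` is hypothetical and the eigenvalues and limits are made-up numbers for illustration only, not results from the bfi data.

```r
# Base-graphics sketch of the two CI display styles.
# The eigenvalues and limits are made-up illustrative numbers, not results.
ev    <- c(2.8, 1.4, 0.9, 0.6, 0.3)
lower <- ev - 0.2
upper <- ev + 0.2
idx   <- seq_along(ev)

# Hypothetical helper that draws a scree line with CIs in one of two styles
plot_ci <- function(type = c("band", "errorbars")) {
  type <- match.arg(type)
  plot(idx, ev, type = "n", ylim = range(lower, upper),
       xlab = "Component", ylab = "Eigenvalue", main = type)
  if (type == "band") {
    # Shaded region between the lower and upper limits
    polygon(c(idx, rev(idx)), c(lower, rev(upper)), col = "grey85", border = NA)
  } else {
    # Vertical bars with flat caps at each point
    arrows(idx, lower, idx, upper, angle = 90, code = 3, length = 0.05)
  }
  # Scree line drawn on top of the CI display
  lines(idx, ev, type = "b", pch = 19)
  invisible(type)
}

plot_ci("band")
plot_ci("errorbars")
```

Bands tend to read well when the emphasis is on the overall shape of the scree curve, while error bars make the interval around each single eigenvalue easier to compare against the simulated line.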