View source: R/principal_components.R
factor_analysis: R Documentation
The functions principal_components() and factor_analysis() can be used to perform a principal component analysis (PCA) or a factor analysis (FA). They return the loadings as a data frame, and various methods and functions are available to access/display other information (see the 'Details' section).
factor_analysis(x, ...)

## S3 method for class 'data.frame'
factor_analysis(
  x,
  n = "auto",
  rotation = "oblimin",
  factor_method = "minres",
  sort = FALSE,
  threshold = NULL,
  standardize = FALSE,
  ...
)

## S3 method for class 'matrix'
factor_analysis(
  x,
  n = "auto",
  rotation = "oblimin",
  factor_method = "minres",
  n_obs = NULL,
  sort = FALSE,
  threshold = NULL,
  standardize = FALSE,
  ...
)

principal_components(x, ...)

rotated_data(x, verbose = TRUE)

## S3 method for class 'data.frame'
principal_components(
  x,
  n = "auto",
  rotation = "none",
  sparse = FALSE,
  sort = FALSE,
  threshold = NULL,
  standardize = TRUE,
  ...
)

## S3 method for class 'parameters_efa'
predict(
  object,
  newdata = NULL,
  names = NULL,
  keep_na = TRUE,
  verbose = TRUE,
  ...
)

## S3 method for class 'parameters_efa'
print(x, digits = 2, sort = FALSE, threshold = NULL, labels = NULL, ...)

## S3 method for class 'parameters_efa'
sort(x, ...)

closest_component(x)
x: A data frame or a statistical model.

...: Arguments passed to or from other methods.

n: Number of components (or factors) to extract. If n = "auto" (the default) or n = NULL, the number of dimensions to retain is estimated automatically (see n_components() and n_factors() in the 'Details' section); n = "all" extracts all possible components.

rotation: The rotation method applied to the loadings, for example "none", "varimax", "oblimin", or "promax". Defaults to "none" for principal_components() and to "oblimin" for factor_analysis().

factor_method: The factoring method to be used; passed to the underlying factor analysis function. Defaults to "minres" (minimum residual).

sort: Sort the loadings.

threshold: A value between 0 and 1 indicates which (absolute) values from the loadings should be removed. An integer higher than 1 indicates the n strongest loadings to retain. Can also be "max", in which case only the largest loading per variable is kept (see the examples).

standardize: A logical value indicating whether the variables should be standardized (centered and scaled) to have unit variance before the analysis (in general, such scaling is advisable). Note: This defaults to TRUE for principal_components() and to FALSE for factor_analysis().

n_obs: An integer or a matrix; only used for the matrix method.

verbose: Toggle warnings.

sparse: Whether to compute sparse PCA (SPCA). Can be TRUE, or "robust" for robust sparse PCA (see the examples).

object: An object of class parameters_efa or parameters_pca, as returned by factor_analysis() or principal_components().

newdata: An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

names: Optional character vector to name columns of the returned data frame.

keep_na: Logical; if TRUE, predictions are returned for all rows of the original data, including those with missing values.

digits: Number of digits used by the print() method when displaying loadings.

labels: Optional character vector of variable labels, used by the print() method (see the examples).
n_components() and n_factors() automatically estimate the optimal number of dimensions to retain.

performance::check_factorstructure() checks the suitability of the data for factor analysis, using the sphericity (see performance::check_sphericity_bartlett()) and the KMO (see performance::check_kmo()) measures.

performance::check_itemscale() computes various measures of internal consistency applied to the (sub)scales (i.e., components) extracted from the PCA.

Running summary() returns information related to each component/factor, such as the explained variance and the eigenvalues.

Running get_scores() computes scores for each subscale.

factor_scores() extracts the factor scores from objects returned by psych::fa(), factor_analysis(), or psych::omega().

Running closest_component() will return a numeric vector with the assigned component index for each column from the original data frame.

Running rotated_data() will return the rotated data, including missing values, so it matches the original data frame.

performance::item_omega() is a convenient wrapper around psych::omega(), which provides some additional methods to work seamlessly within the easystats framework.

performance::check_normality() checks residuals from objects returned by psych::fa(), factor_analysis(), performance::item_omega(), or psych::omega() for normality.

performance::model_performance() returns fit indices for objects returned by psych::fa(), factor_analysis(), or psych::omega().

Running plot() visually displays the loadings (this requires the see package to work).
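Taken together, a typical workflow may look like the following sketch. It only uses functions mentioned above, assumes the performance package is installed, and the data set and choice of variables are purely illustrative.

library(parameters)
library(performance)

d <- mtcars[, 1:7]

# Check the suitability of the data for factor analysis
# (Bartlett's test of sphericity and the KMO measure)
check_factorstructure(d)

# Inspect how many factors the various methods suggest
n_factors(d)

# Run the factor analysis; n = "auto" (the default) retains the
# number of factors estimated by n_factors()
efa <- factor_analysis(d, n = "auto")
efa

# Fit indices, and the rotated data matching the original data frame
model_performance(efa)
head(rotated_data(efa))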
Complexity represents the number of latent components needed to account for the observed variables. Whereas a perfect simple structure solution has a complexity of 1, in that each item would only load on one factor, a solution with evenly distributed items has a complexity greater than 1 (Hofmann, 1978; Pettersson and Turkheimer, 2010).
Uniqueness represents the variance that is 'unique' to the variable and not shared with other variables. It is equal to 1 - communality (variance that is shared with other variables). A uniqueness of 0.20 suggests that 20% of that variable's variance is not shared with other variables in the overall factor model. The greater the 'uniqueness', the lower the relevance of the variable in the factor model.
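For orthogonal (uncorrelated) factors, both quantities can be computed directly from the loading matrix. The following sketch uses a made-up loading matrix, not output from this package, to illustrate the definitions: the communality of an item is the sum of its squared loadings, uniqueness is one minus the communality, and complexity follows Hofmann's (1978) index.

# Toy loading matrix: three items on two orthogonal factors
L <- rbind(
  item1 = c(0.70, 0.00), # loads on one factor only (simple structure)
  item2 = c(0.50, 0.50), # loads evenly on both factors
  item3 = c(0.60, 0.30)
)

# Communality = variance shared with the factors (orthogonal case);
# uniqueness = 1 - communality
communality <- rowSums(L^2)
uniqueness <- 1 - communality
uniqueness # item1: 1 - 0.49 = 0.51

# Hofmann's (1978) complexity: (sum of squared loadings)^2 / sum of loadings^4
complexity <- rowSums(L^2)^2 / rowSums(L^4)
complexity # item1 = 1 (simple structure), item2 = 2 (evenly distributed)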
MSA represents the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (Kaiser and Rice, 1974) for each item. It indicates whether there are enough data for each factor to give reliable results for the PCA. The value should be > 0.6, and desirable values are > 0.8 (Tabachnick and Fidell, 2013).
There is a simplified rule of thumb that may help to decide whether to run a factor analysis or a principal component analysis:

Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.

Run principal component analysis if you simply want to reduce your correlated observed variables to a smaller set of important independent composite variables.
(Source: CrossValidated)
Use get_scores()
to compute scores for the "subscales" represented by the
extracted principal components or factors. get_scores()
takes the results
from principal_components()
or factor_analysis()
and extracts the
variables for each component found by the PCA. Then, for each of these
"subscales", raw means are calculated (which equals adding up the single
items and dividing by the number of items). This results in a sum score for
each component from the PCA, which is on the same scale as the original,
single items that were used to compute the PCA. One can also use predict()
to back-predict scores for each component, to which one can provide newdata
or a vector of names
for the components.
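A small sketch of this, based purely on the description above (the exact grouping of items depends on the data at hand, and extracting item names from closest_component() assumes the returned vector is named by variable):

pca <- principal_components(mtcars[, 1:5], n = 2, rotation = "varimax")

# Mean scores over the items belonging to each component
head(get_scores(pca))

# Manual check for the first component: average the variables
# assigned to it by closest_component()
assignment <- closest_component(pca)
items <- names(assignment)[assignment == 1]
head(rowMeans(mtcars[, items, drop = FALSE]))

# predict() back-predicts component scores, optionally for new data
predict(pca, newdata = mtcars[1:3, 1:5], names = c("PC1", "PC2"))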
Use summary() to get the eigenvalues and the explained variance for each extracted component. The eigenvectors and eigenvalues represent the "core" of a PCA: the eigenvectors (the principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude. In other words, the eigenvalues explain the variance of the data along the new feature axes.
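For an unrotated PCA on standardized variables, these eigenvalues are simply the eigenvalues of the correlation matrix, which makes for a quick sanity check. This is a sketch; no claims are made about the exact column names of the summary output.

pca <- principal_components(mtcars[, 1:5], n = "all")

# Eigenvalues and explained variance per component
summary(pca)

# The same eigenvalues, obtained directly from the correlation matrix
eigen(cor(mtcars[, 1:5]))$values

# Proportion of total variance explained by each component
eigen(cor(mtcars[, 1:5]))$values / ncol(mtcars[, 1:5])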
A data frame of loadings. For factor_analysis(), this data frame is also of class parameters_efa. Objects from principal_components() are of class parameters_pca.
Kaiser, H. F., and Rice, J. (1974). Little jiffy, mark IV. Educational and Psychological Measurement, 34(1), 111-117.

Hofmann, R. (1978). Complexity and simplicity as objective indices descriptive of factor solutions. Multivariate Behavioral Research, 13(2), 247-250. doi: 10.1207/s15327906mbr1302_9

Pettersson, E., and Turkheimer, E. (2010). Item selection, evaluation, and simple structure in personality data. Journal of Research in Personality, 44(4), 407-420. doi: 10.1016/j.jrp.2010.03.002

Tabachnick, B. G., and Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson Education.
library(parameters)
# Principal Component Analysis (PCA) -------------------
principal_components(mtcars[, 1:7], n = "all", threshold = 0.2)
# Automated number of components
principal_components(mtcars[, 1:4], n = "auto")
# labels can be useful if variable names are not self-explanatory
print(
principal_components(mtcars[, 1:4], n = "auto"),
labels = c(
"Miles/(US) gallon",
"Number of cylinders",
"Displacement (cu.in.)",
"Gross horsepower"
)
)
# Sparse PCA
principal_components(mtcars[, 1:7], n = 4, sparse = TRUE)
principal_components(mtcars[, 1:7], n = 4, sparse = "robust")
# Rotated PCA
principal_components(mtcars[, 1:7],
n = 2, rotation = "oblimin",
threshold = "max", sort = TRUE
)
principal_components(mtcars[, 1:7], n = 2, threshold = 2, sort = TRUE)
pca <- principal_components(mtcars[, 1:5], n = 2, rotation = "varimax")
pca # Print loadings
summary(pca) # Print information about the factors
predict(pca, names = c("Component1", "Component2")) # Back-predict scores
# which variables from the original data belong to which extracted component?
closest_component(pca)
# Factor Analysis (FA) ------------------------
factor_analysis(mtcars[, 1:7], n = "all", threshold = 0.2, rotation = "Promax")
factor_analysis(mtcars[, 1:7], n = 2, threshold = "max", sort = TRUE)
factor_analysis(mtcars[, 1:7], n = 2, rotation = "none", threshold = 2, sort = TRUE)
efa <- factor_analysis(mtcars[, 1:5], n = 2)
summary(efa)
predict(efa, verbose = FALSE)
# Automated number of components
factor_analysis(mtcars[, 1:4], n = "auto")