CD: Comparison Data

CD R Documentation

Comparison Data

Description

Comparison data (CD) factor retention method introduced by Ruscio and Roche (2012). The implementation was adapted from the CD code by Auerswald and Moshagen (2017), available at https://osf.io/x5cz2/?view_only=d03efba1fd0f4c849a87db82e6705668

Usage

CD(
  x,
  n_factors_max = NA,
  N_pop = 10000,
  N_samples = 500,
  alpha = 0.3,
  use = c("pairwise.complete.obs", "all.obs", "complete.obs", "everything",
    "na.or.complete"),
  cor_method = c("pearson", "spearman", "kendall"),
  max_iter = 50
)

Arguments

x

data.frame or matrix. Dataframe or matrix of raw data.

n_factors_max

numeric. The maximum number of factors to test against. Larger values increase the run time of the procedure but test more candidate solutions. If left NA (default), the largest number of factors for which the model is still over-identified (df > 0) is used.

N_pop

numeric. Size of each finite population of comparison data. Default is 10000.

N_samples

numeric. Number of samples drawn from each population. Default is 500.

alpha

numeric. The alpha level used to test the significance of the improvement added by an additional factor. Default is .30.

use

character. Passed to stats::cor. Default is "pairwise.complete.obs". However, the comparison data procedure itself excludes NA values using na.omit(). To handle missing data differently (e.g., by imputation), do so before passing the data to CD().

cor_method

character. Passed to stats::cor. Default is "pearson".

max_iter

numeric. The maximum number of iterations after which the iterative principal axis factoring (PAF) procedure is halted. Default is 50.
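
As an illustration of the n_factors_max default, the sketch below finds the largest number of factors that still leaves positive degrees of freedom. It assumes the usual EFA formula df = ((p - m)^2 - (p + m)) / 2 for p variables and m factors. The helper name max_overidentified_factors is hypothetical and not part of the package, and the package's internal rule may differ in detail.

```r
# Hypothetical helper (not part of the package): largest number of factors m
# for which an EFA model with n_vars variables has df > 0, assuming
# df = ((n_vars - m)^2 - (n_vars + m)) / 2.
max_overidentified_factors <- function(n_vars) {
  m <- seq_len(n_vars)
  df <- ((n_vars - m)^2 - (n_vars + m)) / 2
  ok <- m[df > 0]
  # With very few variables no model is over-identified
  if (length(ok) == 0) 0L else max(ok)
}

# Example: candidate upper bound for a data set with 8 variables
max_overidentified_factors(8)
```

Passing an explicit, smaller n_factors_max mainly saves run time, since fewer comparison populations need to be generated.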

Details

"Parallel analysis (PA) is an effective stopping rule that compares the eigenvalues of randomly generated data with those for the actual data. PA takes into account sampling error, and at present it is widely considered the best available method. We introduce a variant of PA that goes even further by reproducing the observed correlation matrix rather than generating random data. Comparison data (CD) with known factorial structure are first generated using 1 factor, and then the number of factors is increased until the reproduction of the observed eigenvalues fails to improve significantly" (Ruscio & Roche, 2012, p. 282).
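
The stopping rule in the quoted passage can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's code: it assumes a matrix of RMSEs with one column per number of factors (like RMSE_eigenvalues in the return value) and compares adjacent columns with a one-sided Mann-Whitney U test (stats::wilcox.test) at level alpha. The function name retain_by_rmse and the demo data are made up for this example.

```r
# Hypothetical sketch of the sequential stopping rule (not the package's code).
# rmse: matrix with one row per comparison sample and one column per number
#       of factors; smaller values mean better reproduction of the observed
#       eigenvalues.
retain_by_rmse <- function(rmse, alpha = 0.30) {
  n_factors <- 1
  while (n_factors < ncol(rmse)) {
    next_col <- rmse[, n_factors + 1]
    if (all(is.na(next_col))) break  # no comparison data for more factors
    # One-sided test: are the RMSEs with one more factor lower?
    test <- stats::wilcox.test(rmse[, n_factors], next_col,
                               alternative = "greater")
    if (test$p.value >= alpha) break  # no significant improvement: stop
    n_factors <- n_factors + 1
  }
  n_factors
}

# Made-up RMSEs: a second factor clearly helps, a third one does not
rmse_demo <- cbind(
  seq(0.50, 0.60, length.out = 50),
  seq(0.20, 0.30, length.out = 50),
  seq(0.2001, 0.3001, length.out = 50)
)
retain_by_rmse(rmse_demo)
```

A lenient alpha such as .30 makes it easier for an additional factor to count as an improvement, which guards against stopping too early.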

The CD implementation here is based on the code by Ruscio and Roche (2012), but was slightly adapted to increase speed by performing the principal axis factoring with a C++-based function.

Note that if the data contain missing values, the affected rows are removed for the comparison data procedure using stats::na.omit. If missing data should be treated differently, e.g., by imputation, do this outside CD and then pass the complete data.
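
As a sketch of handling missing values before calling CD, the example below fills missing entries with column means using base R only. Mean imputation is used purely for brevity and is not a recommendation; the helper impute_col_means and the toy matrix are hypothetical.

```r
# Hypothetical helper: replace NAs in each column by that column's mean.
# Mean imputation is shown only because it is short; choose an imputation
# method that suits your data.
impute_col_means <- function(x) {
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    x[miss, j] <- mean(x[, j], na.rm = TRUE)
  }
  x
}

# Toy data with one missing value per column
raw <- matrix(c(1, 2, NA, 4,
                5, 6, 7, NA), ncol = 2)

nrow(stats::na.omit(raw))        # rows that survive listwise deletion
complete <- impute_col_means(raw)
anyNA(complete)                  # no missing values remain

# The completed data would then be passed on, e.g.:
# CD(complete)
```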

The CD function can also be called together with other factor retention criteria in the N_FACTORS function.

Value

A list of class CD containing

n_factors

The number of factors to retain according to comparison data results.

eigenvalues

A vector containing the eigenvalues of the entered data.

RMSE_eigenvalues

A matrix containing the RMSEs between the eigenvalues of the generated data and those of the entered data.

settings

A list of the settings used.
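
To make the eigenvalues and RMSE_eigenvalues entries more concrete, the following sketch computes the eigenvalues of a correlation matrix and the RMSE between two eigenvalue vectors. It assumes the RMSE is the square root of the mean squared difference across eigenvalues; eigen_rmse and the simulated data are illustrative only and not taken from the package.

```r
# Hypothetical helper: RMSE between two eigenvalue vectors of equal length
eigen_rmse <- function(ev_observed, ev_comparison) {
  sqrt(mean((ev_observed - ev_comparison)^2))
}

# Simulated stand-ins for the entered data and one comparison sample
set.seed(1)
observed <- matrix(stats::rnorm(200 * 4), ncol = 4)
comparison <- matrix(stats::rnorm(200 * 4), ncol = 4)

# Eigenvalues of the two correlation matrices
ev_observed <- eigen(stats::cor(observed), symmetric = TRUE,
                     only.values = TRUE)$values
ev_comparison <- eigen(stats::cor(comparison), symmetric = TRUE,
                       only.values = TRUE)$values

# One RMSE value; repeating this for every comparison sample and every number
# of factors would give a samples-by-factors matrix of RMSEs
rmse <- eigen_rmse(ev_observed, ev_comparison)
```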

Source

Auerswald, M., & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24(4), 468–491. https://doi.org/10.1037/met0000200

Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24, 282–292. https://doi.org/10.1037/a0025697

See Also

Other factor retention criteria: EKC, HULL, KGC, PARALLEL, SMT

N_FACTORS as a wrapper function for this and all the above-mentioned factor retention criteria.

Examples


# determine n factors of the GRiPS
CD(GRiPS_raw)

# determine n factors of the DOSPERT risk subscale
CD(DOSPERT_raw)


mdsteiner/EFAdiff documentation built on Jan. 10, 2023, 8:54 a.m.