GenData | R Documentation |
This function simulates data with nfact factors based on empirical data.
It is the data-simulation part of the CD function and the CDF function.
It improves upon GenDataPopulation by using C++ code for faster data simulation.
GenData(
response,
nfact = 1,
N.pop = 10000,
Max.Trials = 5,
lr = 1,
cor.type = "pearson",
use = "pairwise.complete.obs",
isSort = FALSE
)
response |
A required |
nfact |
The number of factors to extract in factor analysis. (default = 1) |
N.pop |
Size of finite populations for simulating. (default = 10,000) |
Max.Trials |
The maximum number of consecutive trials without obtaining a lower RMSR. (default = 5) |
lr |
The learning rate for updating the correlation matrix during iteration. (default = 1) |
cor.type |
A character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman". See cor. |
use |
An optional character string specifying a method for computing covariances in the presence of missing values. This must be one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs" (default). See cor. |
isSort |
Logical, determines whether the simulated data needs to be sorted in descending order. (default = FALSE) |
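The cor.type and use descriptions match the method and use arguments of base R's cor. As a minimal sketch, assuming GenData forwards both values to cor unchanged (this page does not confirm that), the target correlation matrix could be computed as below. The toy matrix Y is made up for illustration.

```r
## Toy response matrix (hypothetical data) with one missing value in x3
Y <- cbind(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5),
  x3 = c(6, 5, NA, 3, 2, 1)
)

cor.type <- "pearson"                # or "kendall", "spearman"
use      <- "pairwise.complete.obs"  # the documented default of GenData

## Assumption: both arguments are passed to stats::cor as-is
R.targ <- cor(Y, method = cor.type, use = use)

## For comparison: with use = "everything", pairs involving x3 become NA
R.na <- cor(Y, method = cor.type, use = "everything")
```

With pairwise deletion each correlation uses all rows that are complete for that pair of items, so one missing response does not discard the whole row.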
The core idea of GenData is to start with the empirical data's correlation matrix and iteratively approach data with nfact factors. Any value in the simulated data must come from the empirical data. The specific steps of GenData are as follows:
(1) Use the correlation matrix of the empirical data (\mathbf{Y}_{emp}) as the target, \mathbf{R}_{targ}.
(2) Simulate scores for N.pop examinees on nfact factors using a multivariate standard normal distribution:
\mathbf{S}_{(N.pop \times nfact)} \sim \mathcal{N}(0, 1)
(3) Simulate noise for N.pop examinees on I items:
\mathbf{U}_{(N.pop \times I)} \sim \mathcal{N}(0, 1)
(4) Initialize \mathbf{R}_{temp} = \mathbf{R}_{targ}, and set the minimum Root Mean Square Residual RMSR_{min} = \text{Inf}. Start the iteration process.
(5) Extract nfact factors from \mathbf{R}_{temp}, and obtain the factor loadings matrix \mathbf{L}_{shar}. Ensure that the first element of \mathbf{L}_{shar} is positive to standardize the direction.
(6) Calculate the unique factor matrix \mathbf{L}_{uniq, (I \times 1)}, where i indexes items:
L_{uniq, i} = \sqrt{1 - \sum_{j=1}^{nfact} L_{shar, i, j}^2}
(7) Calculate the simulated data \mathbf{Y}_{sim}, where i indexes examinees and j indexes items:
Y_{sim, i, j} = \mathbf{S}_{i} \mathbf{L}_{shar, j}^T + U_{i, j} L_{uniq, j}
(8) Compute the correlation matrix of the simulated data, \mathbf{R}_{simu}.
(9) Calculate the residual correlation matrix \mathbf{R}_{resi} between the target matrix \mathbf{R}_{targ} and the simulated data's correlation matrix \mathbf{R}_{simu}:
\mathbf{R}_{resi} = \mathbf{R}_{targ} - \mathbf{R}_{simu}
(10) Calculate the current RMSR:
RMSR_{cur} = \sqrt{\frac{\sum_{i < j} R_{resi, i, j}^2}{0.5 \times (I^2 - I)}}
(11) If RMSR_{cur} < RMSR_{min}, update \mathbf{R}_{temp} = \mathbf{R}_{temp} + lr \times \mathbf{R}_{resi}, set RMSR_{min} = RMSR_{cur} and \mathbf{R}_{min, resi} = \mathbf{R}_{resi}, and reset the count of consecutive trials without improvement, cou = 0.
(12) If RMSR_{cur} \geq RMSR_{min}, update \mathbf{R}_{temp} = \mathbf{R}_{temp} + 0.5 \times cou \times lr \times \mathbf{R}_{min, resi} and increment cou = cou + 1.
(13) Repeat steps (5) through (12) until cou \geq Max.Trials.
The procedure is implemented in C++ for speed.
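The iterative procedure described above can be sketched in plain R. This is an illustration under stated assumptions, not the package's C++ implementation: gen_data_sketch is a hypothetical name, principal-component loadings from eigen() stand in for whatever extraction method the package uses, the best-fitting data set is the one returned, and the sketch returns continuous scores without mapping them back to empirical values.

```r
## Hypothetical helper: plain-R sketch of the iterative procedure.
gen_data_sketch <- function(R.targ, nfact = 1, N.pop = 10000,
                            Max.Trials = 5, lr = 1) {
  I <- ncol(R.targ)
  ## Factor scores and item noise are drawn once, before iterating
  S <- matrix(rnorm(N.pop * nfact), N.pop, nfact)
  U <- matrix(rnorm(N.pop * I), N.pop, I)
  ## Initialise the working matrix and the best RMSR so far
  R.temp     <- R.targ
  RMSR.min   <- Inf
  R.min.resi <- matrix(0, I, I)
  Y.best     <- NULL
  cou        <- 0
  k          <- seq_len(nfact)
  while (cou < Max.Trials) {
    ## Loadings (assumption: principal components of R.temp)
    e      <- eigen(R.temp, symmetric = TRUE)
    scale  <- diag(sqrt(pmax(e$values[k], 0)), nfact)
    L.shar <- e$vectors[, k, drop = FALSE] %*% scale
    if (L.shar[1, 1] < 0) L.shar <- -L.shar   # fix the direction
    ## Unique loadings, guarded against communalities above 1
    L.uniq <- sqrt(pmax(0, 1 - rowSums(L.shar^2)))
    ## Simulated data: common part plus item-specific noise
    Y.sim <- S %*% t(L.shar) + sweep(U, 2, L.uniq, `*`)
    ## Residual correlations and current RMSR (off-diagonal elements)
    R.resi   <- R.targ - cor(Y.sim)
    RMSR.cur <- sqrt(sum(R.resi[upper.tri(R.resi)]^2) / (0.5 * (I^2 - I)))
    if (RMSR.cur < RMSR.min) {
      ## Improvement: move R.temp by the residual and reset the counter
      R.temp     <- R.temp + lr * R.resi
      RMSR.min   <- RMSR.cur
      R.min.resi <- R.resi
      Y.best     <- Y.sim   # assumption: keep the best data set
      cou        <- 0
    } else {
      ## No improvement: step along the best residual and count the trial
      R.temp <- R.temp + 0.5 * cou * lr * R.min.resi
      cou    <- cou + 1
    }
  }
  list(data = Y.best, RMSR = RMSR.min)
}

## Toy target: four items that all correlate 0.4 with each other
set.seed(1)
R.targ <- matrix(0.4, 4, 4)
diag(R.targ) <- 1
fit <- gen_data_sketch(R.targ, nfact = 1, N.pop = 2000)
dim(fit$data)
fit$RMSR
```

Because S and U are drawn once, each pass differs only through the updated loadings, which is why the residual matrix can be used to steer R.temp toward a better fit.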
An N.pop × I matrix containing the simulated data.
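The description states that every simulated value must come from the empirical data, but the listed steps do not show how that is achieved. One plausible mechanism, in the spirit of the comparison-data approach, is rank matching: each simulated column is replaced by resampled empirical values arranged in the same rank order. The sketch below is an assumption about that mechanism rather than the package's confirmed code; map_to_empirical and the toy data are hypothetical.

```r
## Hypothetical helper: replace each simulated column by resampled
## empirical values while preserving the simulated rank order.
map_to_empirical <- function(Y.sim, Y.emp) {
  stopifnot(ncol(Y.sim) == ncol(Y.emp))
  out <- Y.sim
  for (j in seq_len(ncol(Y.sim))) {
    ## Bootstrap as many empirical values as there are simulated rows
    pool <- sort(sample(Y.emp[, j], nrow(Y.sim), replace = TRUE))
    ## The smallest simulated score receives the smallest pooled value, etc.
    out[, j] <- pool[rank(Y.sim[, j], ties.method = "first")]
  }
  out
}

set.seed(42)
Y.emp <- matrix(sample(1:6, 300, replace = TRUE), ncol = 3)  # toy item responses
Y.sim <- matrix(rnorm(1000 * 3), ncol = 3)                   # toy continuous scores
Y.new <- map_to_empirical(Y.sim, Y.emp)
```

After this mapping each column contains only values observed for that item, while the ordering of examinees within the column is unchanged.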
Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24, 282–292. http://dx.doi.org/10.1037/a0025697.
library(EFAfactors)
set.seed(123)
## Use the data.bfi dataset as an example.
data(data.bfi)
response <- as.matrix(data.bfi[, 1:25]) ## load the responses to the 25 items
response <- na.omit(response) ## remove respondents with NA/missing values
## Reverse-score the reverse-keyed items (6-point scale: x becomes 7 - x)
response[, c(1, 9, 10, 11, 12, 22, 25)] <- 6 - response[, c(1, 9, 10, 11, 12, 22, 25)] + 1
data.simulated <- GenData(response, nfact = 1, N.pop = 10000)
head(data.simulated)