generate_data: generate_data

View source: R/generate_data.R

generate_data    R Documentation

generate_data

Description

This function generates data under a multiple membership random effects model with correlated clusters. The correlation between clusters is given by the variance-covariance matrix specified in gen_z_varcov.

Usage

generate_data(
  .n_sch = 5,
  .n_stu = 5,
  .u_resid_var = 0.2,
  .clust_cov = 0.8,
  .wt_vec = c(0.5, 0.5),
  .pct_mobile = 0,
  .mean_x = 5,
  .var_x = 4,
  .mean_r = 0,
  .var_r = 2,
  .gamma_z = 0,
  .gamma_x = c(10, sqrt(17)/2),
  .mm_format = c("compact", "wide"),
  .progress_bar = NULL,
  ...
)

Arguments

.n_sch

Numeric scalar. Gives the total number of schools in the dataset. The variance-covariance matrix for predictor z will have dimensions .n_sch x .n_sch.

.n_stu

Numeric scalar. The number of students attending each school. Note: this is the number of students per school, not the total number of students in the dataset.

.u_resid_var

Numeric scalar. Gives the residual variance of u0j (i.e., the variance unexplained after controlling for the school-level predictor, z).

.clust_cov

Numeric vector. The first element gives the variance of the school-level predictor, z, and is the same for every school. If present, the second element gives the covariance of z between schools k and k + 1. The values given in .clust_cov apply to all schools, so the resulting matrix follows a Toeplitz-like pattern. Any off-diagonal values (i.e., covariances) not specified default to 0. The main diagonal is the variance explained by the predictor.
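
As a rough illustration of the pattern described above, the following sketch builds such a matrix with base R's toeplitz(). It is not the package's gen_z_varcov implementation; the object names (n_sch, clust_cov, z_varcov) and the covariance value 0.3 are assumptions chosen only for this example.

```r
# Sketch only: pad the supplied values with zeros and expand them into a
# Toeplitz matrix (all names and values here are illustrative assumptions).
n_sch <- 5
clust_cov <- c(0.8, 0.3)  # variance of z, then covariance of schools k and k + 1

# Covariances that were not specified default to 0.
first_row <- c(clust_cov, rep(0, n_sch - length(clust_cov)))

# toeplitz() is part of the stats package, which is attached by default.
z_varcov <- toeplitz(first_row)
z_varcov
```

Every diagonal entry equals the first element of clust_cov, the entries next to the diagonal equal the second, and all remaining entries are 0.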

.wt_vec

Numeric vector with length equal to the maximum number of schools attended by any student in the data (in this simulation, the maximum is 2). The values in .wt_vec weight the effect of each attended school on a student. For this study, all mobile students must have the same weights. If different weighting patterns are desired, the code will need to be updated.
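
To make the weighting concrete, here is a minimal sketch of how equal weights might combine the effects of two schools for one mobile student. The school effects in u and the schools attended are made-up values, and the calculation is a generic multiple membership weighted sum, not code taken from the package.

```r
# Sketch only: hypothetical school-level effects for five schools.
u <- c(-0.4, 0.1, 0.3, 0.0, 0.6)

# Weights for the first and second school attended (the documented default).
wt_vec <- c(0.5, 0.5)

# A hypothetical mobile student who attended schools 2 and 5.
schools_attended <- c(2, 5)

# The student's school-level contribution is the weighted sum of the
# effects of the schools attended.
mm_effect <- sum(wt_vec * u[schools_attended])
mm_effect
```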

.mean_x

Numeric scalar. The mean of the predictor, x.

.var_x

Numeric scalar. The variance of the predictor, x.

.mean_r

Numeric scalar. The mean of the person-level residual, r.

.var_r

Numeric scalar. The variance of the person-level residual, r.

.gamma_z

Numeric scalar. The school-level effect of the z_predictors on the random intercept.

.gamma_x

Numeric vector with length p (where p is the number of model coefficients, including the intercept).

.mm_format

String. Options are "compact" or "wide". The compact format requires two types of variables, one for the organization IDs and one for the weights; the number of variables of each type equals the maximum number of organizational memberships for any individual in the dataset. The wide format requires only one type of variable, which gives each person's weight for each organization. MLwiN documentation provides conflicting information about which format runMLwiN requires. The Stata documentation states that both R and Stata require the wide format, whereas the documentation for the R2MLwiN package demonstrates multiple membership analysis with the compact format. Here, we use the compact format as presented in the R2MLwiN package publication in the Journal of Statistical Software.
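
The difference between the two layouts can be sketched with a toy dataset. The column names (sch_id_1, wt_1, wt_sch_1, and so on) are hypothetical and are not necessarily the names used by generate_data; coding a non-mobile student's second school as ID 0 with weight 0 is likewise an assumption made for this example.

```r
# Sketch only: compact format, with one ID column and one weight column
# per possible membership (here, at most two schools per student).
compact <- data.frame(
  stu_id   = 1:3,
  sch_id_1 = c(1, 2, 3),
  sch_id_2 = c(0, 3, 1),   # 0 = no second school (assumed coding)
  wt_1     = c(1, 0.5, 0.5),
  wt_2     = c(0, 0.5, 0.5)
)

# Wide format: one weight column per school, one row per student.
n_sch <- 3
wide <- matrix(
  0,
  nrow = nrow(compact), ncol = n_sch,
  dimnames = list(NULL, paste0("wt_sch_", seq_len(n_sch)))
)

for (k in 1:2) {
  ids  <- compact[[paste0("sch_id_", k)]]
  wts  <- compact[[paste0("wt_", k)]]
  keep <- wts > 0
  # Two-column (row, column) index picks one cell per student.
  idx  <- cbind(which(keep), ids[keep])
  wide[idx] <- wide[idx] + wts[keep]
}

wide
```

In both layouts each student's weights sum to 1; the compact form grows with the maximum number of memberships, while the wide form grows with the number of schools.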

.progress_bar

Internal argument passed from run_sim. If .progress_bar = TRUE in the external run_sim function, a progress bar is displayed while this code executes. In this internal function, the argument defaults to NULL.

...

Other parameters passed to assign_mobility.

Value

This function returns data generated under the correlated cluster multiple membership model, with correlated clusters defined by the correlation between schools on the school-level predictor z. The result is a tibble with .n_sch * .n_stu rows and 18 + .n_sch columns. With the defaults (.n_sch = 5, .n_stu = 5), that is 25 rows and 23 columns.

Examples

## Not run: 

# with the following defaults, we adjust x1beta to get the
# desired icc:

## icc = 0.05
# x1beta <- sqrt(17)/2

## icc = 0.15
# x1beta <- sqrt(11/3)/2

## icc = 0.30
x1beta <- 1/(2*sqrt(3))

generate_data(
  .n_sch = 5,
  .n_stu = 5,
  .u_resid_var = 0.2,
  .clust_cov = c(.8, 1),
  .wt_vec = c(0.5, 0.5),
  .pct_mobile = 0,
  .mean_x = 5,
  .var_x = 4,
  .mean_r = 0,
  .var_r = 2,
  .gamma_z = 0,
  .gamma_x = c(10, x1beta)
)


## End(Not run)
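
The x1beta values listed in the example comments can be checked with a short calculation. The sketch below assumes the ICC is the school-level variance divided by the total variance, and that the school-level variance is 1 (for instance, 0.8 + 0.2 from the defaults above). That assumption is inferred from the arithmetic and is not stated in this documentation; icc_from_beta is a hypothetical helper, not a package function.

```r
# Sketch only: ICC implied by a given slope for x, under the assumed
# variance decomposition var_sch + x1beta^2 * var_x + var_r.
icc_from_beta <- function(x1beta, var_x = 4, var_r = 2, var_sch = 1) {
  var_sch / (var_sch + x1beta^2 * var_x + var_r)
}

icc_from_beta(sqrt(17) / 2)       # icc = 0.05
icc_from_beta(sqrt(11 / 3) / 2)   # icc = 0.15
icc_from_beta(1 / (2 * sqrt(3)))  # icc = 0.30
```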

tessaleejohnson/corclus documentation built on Oct. 11, 2022, 3:46 a.m.