generate_multilevel_data_model: Generate individual data given a dataframe of site level...

View source: R/gen_multilevel_data.R

generate_individual_data R Documentation

Generate individual data given a dataframe of site level characteristics.

Description

Generate data from a two-level model according to the given specifications.

The model has a site-level covariate W and an individual-level covariate X.
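As a rough illustration of what such a two-level model looks like, the base R sketch below simulates sites with a covariate W, correlated random intercepts and slopes, and individual outcomes. It is not the package's implementation: the parameter names mirror the documented arguments, but the parameter values, the column names (sid, Z, Y), and the exact model form (a standard random-intercept, random-slope model without the X covariate) are assumptions made for this example.

```r
# Sketch of a two-level data-generating process in base R.
# NOT the blkvar code: model form, values, and column names are assumptions.
set.seed(1)
J <- 30; n.bar <- 10; p <- 0.5
gamma.00 <- 0;  gamma.01 <- 0.2   # control mean and W coefficient
gamma.10 <- 0.2; gamma.11 <- 0.1  # average impact and W coefficient
tau.00 <- 0.3; tau.01 <- 0; tau.11 <- 0.1
sigma2.e <- 0.7; sigma2.W <- 1

# Site level: covariate W and correlated random effects (u0, u1).
W <- rnorm(J, sd = sqrt(sigma2.W))
Sigma <- matrix(c(tau.00, tau.01, tau.01, tau.11), nrow = 2)
U <- matrix(rnorm(2 * J), ncol = 2) %*% chol(Sigma)  # rows have covariance Sigma
beta.0 <- gamma.00 + gamma.01 * W + U[, 1]  # site control means
beta.1 <- gamma.10 + gamma.11 * W + U[, 2]  # site treatment impacts

# Individual level: equal site sizes, randomization within site.
sid <- rep(seq_len(J), each = n.bar)
Z <- rbinom(length(sid), 1, p)
Y <- beta.0[sid] + beta.1[sid] * Z + rnorm(length(sid), sd = sqrt(sigma2.e))
dat <- data.frame(sid = sid, Z = Z, Y = Y)
```

Here tau.00, tau.01, and tau.11 form the covariance matrix of the site random effects, and sigma2.e is used as a variance (hence the square root when drawing residuals).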

Usage

generate_individual_data(
  sdat,
  p = 0.5,
  sigma2.e = 1,
  sigma2.X = 1,
  beta.X = NULL,
  variable.p = FALSE,
  cluster.rand = FALSE,
  sigma2.mean.X = 0,
  num.X = 0 + (!is.null(beta.X)),
  proptx.impact.correlate = FALSE,
  verbose = FALSE
)

generate_multilevel_data_model(
  n.bar = 10,
  J = 30,
  p = 0.5,
  gamma.00,
  gamma.01,
  gamma.10,
  gamma.11,
  tau.00,
  tau.01,
  tau.11,
  sigma2.e,
  sigma2.W = 1,
  beta.X = NULL,
  sigma2.mean.X = 0,
  num.X = 0 + (!is.null(beta.X)),
  num.W = 1,
  variable.n = TRUE,
  variable.p = FALSE,
  site.sizes = NULL,
  cluster.rand = FALSE,
  return.sites = FALSE,
  finite.model = FALSE,
  size.impact.correlate = 0,
  proptx.impact.correlate = 0,
  correlate.strength = 0.75,
  size.ratio = 1/3,
  min.size = 4,
  verbose = FALSE
)

generate_multilevel_data(
  n.bar = 10,
  J = 30,
  p = 0.5,
  tau.11.star = 0.3,
  rho2.0W = 0.1,
  rho2.1W = 0.5,
  ICC = 0.7,
  R2.X = NULL,
  varY0 = 1,
  gamma.00 = 0,
  gamma.10 = 0.2,
  num.X = 0 + (!is.null(R2.X)),
  num.W = 1,
  variable.n = TRUE,
  variable.p = FALSE,
  site.sizes = NULL,
  cluster.rand = FALSE,
  return.sites = FALSE,
  finite.model = FALSE,
  size.impact.correlate = 0,
  proptx.impact.correlate = 0,
  correlate.strength = 0.75,
  size.ratio = 1/3,
  verbose = FALSE,
  zero.corr = FALSE,
  ...
)

generate_multilevel_data_no_cov(
  n.bar = 10,
  J = 30,
  p = 0.5,
  tau.11.star = 0.3,
  ICC = 0.7,
  gamma.00 = 0,
  gamma.10 = 0.2,
  verbose = FALSE,
  variable.n = TRUE,
  control.sd.Y1 = TRUE,
  ...
)

Arguments

sdat

Data frame of site-level characteristics from which to build the full data. Must include a column named 'n' giving the site size.

p

Proportion of units treated. Default: 0.5

sigma2.e

Residual (individual-level) variance.

beta.X

Coefficient for the individual-level X covariate. NULL (the default) means no covariate.

variable.p

Should the proportion of units treated in each site vary? TRUE/FALSE. Default: FALSE

cluster.rand

TRUE means cluster-randomized. FALSE means randomized within site.

sigma2.mean.X

How much the individual-level X covariate means vary across sites.

proptx.impact.correlate

Takes values of -1, 0, or 1: Is the proportion of units treated negatively correlated, uncorrelated, or positively correlated with site impacts?

verbose

Print progress messages while generating data? Default: FALSE

n.bar

Average site size. Default: 10

J

Number of sites. Default: 30

gamma.00

The mean control outcome, Default: 0

gamma.01

Coefficient of W in the model for the site control mean.

gamma.10

The ATE, Default: 0.2

gamma.11

Coefficient of W in the model for the site treatment impact.

tau.00

Variance of site control means

tau.01

Covariance of treatment impact and mean site outcome under control

tau.11

Treatment impact variance

sigma2.W

The variation of the site-level covariate.

variable.n

Allow n to vary around n.bar, Default: TRUE

site.sizes

(Optional) vector of manually specified site sizes. If not specified, use n.bar and variable.n to generate site sizes.

return.sites

Return site-level data rather than individual-level data. Default: FALSE

finite.model

If TRUE use a canonical set of random site effects. When TRUE this method will save the multivariate normal draw and reuse it in subsequent calls to generate_multilevel_data_model until a call with a different J is made. Recommended to use FALSE.

size.impact.correlate

Takes values of -1, 0, or 1: Are site impacts negatively correlated, uncorrelated, or positively correlated with site size?

correlate.strength

A value in [0, 1] describing how strongly the ranking of site impacts is correlated with the proportion treated and with site size, when those correlations are requested.

size.ratio

The degree to which the site sizes should vary, if they should vary.

min.size

Smallest site size. Default of 4 to allow for 2 units in tx and co in smallest sites.

tau.11.star

Total amount of cross-site treatment impact variation. Default: 0.3

rho2.0W

Explanatory power (like an R2 measure) of W for control outcomes. Default: 0.1

rho2.1W

Explanatory power (like an R2 measure) of W for average treatment impact. Default: 0.5

ICC

The intraclass correlation coefficient (ICC). Default: 0.7

varY0

The variance of the control-side potential outcomes in the superpopulation DGP.

zero.corr

TRUE means treatment impact and mean site outcome are uncorrelated. FALSE means they are negatively correlated so that the variance of the treatment group is 1. Default: FALSE

...

Further parameters passed to generate_multilevel_data_model()

control.sd.Y1

Set the correlation of the random intercept and random slope so as to control the standard deviation of the treated outcomes? Default: TRUE

Value

Data frame of individual-level data (unless return.sites=TRUE, in which case only site-level data are returned). The data frame has a treatment column, an outcome column, covariates, and block IDs.
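A minimal usage sketch, assuming the blkvar package is installed and exports generate_multilevel_data(): it generates one individual-level dataset and one site-level dataset using only arguments listed under Usage. The argument values are arbitrary, and the guard variable have_blkvar is introduced here for illustration; the column names of the result are not shown because they are not stated on this page.

```r
# Usage sketch; runs only if the blkvar package is available.
have_blkvar <- requireNamespace("blkvar", quietly = TRUE)

if (have_blkvar) {
  set.seed(101)

  # Individual-level data: 15 sites averaging 20 units each.
  dat <- blkvar::generate_multilevel_data(n.bar = 20, J = 15)
  str(dat)

  # Site-level data only, via return.sites = TRUE.
  sites <- blkvar::generate_multilevel_data(n.bar = 20, J = 15,
                                            return.sites = TRUE)
  str(sites)
}
```

Inspecting the result with str() is a quick way to see which columns hold the treatment indicator, outcome, covariates, and block IDs.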

Functions

  • generate_individual_data(): The part of data generation that creates individual-level covariates. Takes a site-level dataset and returns an individual-level dataset.

  • generate_multilevel_data(): Wrapper for generate_multilevel_data_model that rescales parameters to make standardization easier.

  • generate_multilevel_data_no_cov(): Simplified version of generate_multilevel_data() with no W covariate.
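To give a sense of what rescaling standardized inputs into model parameters can look like, the sketch below works through one plausible mapping using the documented defaults. This mapping is an assumption for illustration, not a verified description of what generate_multilevel_data() does: it treats ICC as the share of varY0 lying between sites, and rho2.0W and rho2.1W as the shares of between-site control variance and impact variance explained by W. The helper name between.var is hypothetical.

```r
# Hypothetical rescaling from standardized inputs to model parameters.
# An assumed mapping for illustration, NOT the package's verified code.
ICC <- 0.7; rho2.0W <- 0.1; rho2.1W <- 0.5
tau.11.star <- 0.3; varY0 <- 1; sigma2.W <- 1

between.var <- ICC * varY0        # control-outcome variance between sites
sigma2.e <- (1 - ICC) * varY0     # control-outcome variance within sites

# Split between-site control variance into a W part and a residual part.
gamma.01 <- sqrt(rho2.0W * between.var / sigma2.W)
tau.00 <- (1 - rho2.0W) * between.var

# Split cross-site impact variance the same way.
gamma.11 <- sqrt(rho2.1W * tau.11.star / sigma2.W)
tau.11 <- (1 - rho2.1W) * tau.11.star

params <- c(gamma.01 = gamma.01, gamma.11 = gamma.11,
            tau.00 = tau.00, tau.11 = tau.11, sigma2.e = sigma2.e)
print(round(params, 3))
```

Under this mapping the pieces add back up: gamma.01^2 * sigma2.W + tau.00 + sigma2.e equals varY0, and gamma.11^2 * sigma2.W + tau.11 equals tau.11.star.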


lmiratrix/blkvar documentation built on Nov. 18, 2024, 1:27 p.m.