draw_normal_icc: Draw normal data with fixed intra-cluster correlation. In DeclareDesign/fabricatr: Imagine Your Data Before You Collect It

Description

Data are generated so that the correlation between clusters is 0 and the intra-cluster correlation equals `ICC` in expectation. The data-generating process used in this function is described at the following URL: https://stats.stackexchange.com/questions/263451/create-synthetic-data-with-a-given-intraclass-correlation-coefficient-icc
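The idea behind this kind of data-generating process can be sketched in base R. This is a minimal illustration under a random-intercept assumption (each cluster gets one shared normal shift, and each observation gets independent normal noise); it is not fabricatr's actual implementation. The helper name `simulate_icc_sketch` is hypothetical, and it handles only a scalar `mean` and `sd`.

```r
# Hypothetical helper, NOT part of fabricatr: simulates clustered normal
# data with a target ICC using only base R.
simulate_icc_sketch <- function(clusters, ICC, sd = 1, mean = 0) {
  stopifnot(ICC > 0, ICC < 1)
  cluster_ids <- as.factor(clusters)
  n_clusters <- nlevels(cluster_ids)

  # Between-cluster SD implied by ICC = sd_between^2 / (sd_between^2 + sd^2)
  sd_between <- sqrt(ICC * sd^2 / (1 - ICC))

  # One shared shift per cluster, independent across clusters
  cluster_shift <- rnorm(n_clusters, mean = 0, sd = sd_between)

  # Independent within-cluster noise for every observation
  noise <- rnorm(length(cluster_ids), mean = 0, sd = sd)

  mean + cluster_shift[as.integer(cluster_ids)] + noise
}

set.seed(42)
clusters <- rep(1:50, each = 20)
y <- simulate_icc_sketch(clusters, ICC = 0.5)
length(y)
```

Because the cluster shifts are drawn independently, observations in different clusters are uncorrelated, while observations in the same cluster share a shift and are therefore correlated.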

Usage

```r
draw_normal_icc(mean = 0, N = NULL, clusters, sd = NULL,
  sd_between = NULL, total_sd = NULL, ICC = NULL)
```

Arguments

`mean` A number or vector of numbers, one mean per cluster. If none is provided, will default to 0.

`N` (Optional) A number indicating the number of observations to be generated. Must be equal to `length(clusters)` if provided.

`clusters` A vector of factors or items that can be coerced to clusters; the length will determine the length of the generated data.

`sd` A number or vector of numbers, indicating the standard deviation of each cluster's error terms, that is, the standard deviation within a cluster (default 1).

`sd_between` A number or vector of numbers, indicating the standard deviation between clusters.

`total_sd` A number indicating the total sd of the resulting variable. May only be specified if `ICC` is specified and `sd` and `sd_between` are not.

`ICC` A number indicating the desired ICC.

Details

The typical use for this function is for a user to provide an `ICC` and, optionally, a set of within-cluster standard deviations, `sd`. If the user does not provide `sd`, the default value is 1. These arguments imply a fixed between-cluster standard deviation.
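To see why `ICC` and `sd` pin down the between-cluster standard deviation, assume the usual variance-ratio definition of the ICC, writing $\sigma_w$ for `sd` and $\sigma_b$ for `sd_between` (these symbols are introduced here for illustration):

$$
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}
\quad\Longrightarrow\quad
\sigma_b = \sigma_w \sqrt{\frac{\mathrm{ICC}}{1 - \mathrm{ICC}}}
$$

For example, with the default $\sigma_w = 1$ and $\mathrm{ICC} = 0.5$, this gives $\sigma_b = \sqrt{0.5 / 0.5} = 1$.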

An alternate mode for the function is to provide between-cluster standard deviations, `sd_between`, and an `ICC`. These arguments imply a fixed within-cluster standard deviation.
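As a rough check of this second mode, the implied within-cluster standard deviation can be computed by hand. The sketch below assumes the variance-ratio definition ICC = sd_between^2 / (sd_between^2 + sd^2); the helper `implied_within_sd` is hypothetical and not a fabricatr function.

```r
# Hypothetical helper, NOT part of fabricatr: solves
# ICC = sd_between^2 / (sd_between^2 + sd^2) for the within-cluster sd.
implied_within_sd <- function(sd_between, ICC) {
  stopifnot(ICC > 0, ICC < 1)
  sd_between * sqrt((1 - ICC) / ICC)
}

# With sd_between = 4 and ICC = 0.2: 4 * sqrt(0.8 / 0.2) = 4 * 2
within_sd <- implied_within_sd(sd_between = 4, ICC = 0.2)
within_sd  # approximately 8
```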

If users provide all three of `ICC`, `sd_between`, and `sd`, the function will warn the user and use the provided standard deviations for generating the data.

Value

A vector of numbers corresponding to the observations from the supplied cluster IDs.

Examples

```r
# Divide observations into clusters
clusters = rep(1:5, 10)

# Default: unit variance within each cluster
draw_normal_icc(clusters = clusters, ICC = 0.5)

# Alternatively, you can specify characteristics:
draw_normal_icc(mean = 10, clusters = clusters, sd = 3, ICC = 0.3)

# Can specify between-cluster standard deviation instead:
draw_normal_icc(clusters = clusters, sd_between = 4, ICC = 0.2)

# Can specify total SD instead:
total_sd_draw = draw_normal_icc(clusters = clusters, ICC = 0.5, total_sd = 3)
sd(total_sd_draw)

# Verify that ICC generated is accurate
corr_draw = draw_normal_icc(clusters = clusters, ICC = 0.4)
summary(lm(corr_draw ~ as.factor(clusters)))$r.squared
```

DeclareDesign/fabricatr documentation built on May 6, 2019, 1:57 p.m.