View source: R/draw_normal_icc.R
draw_normal_icc | R Documentation |
Data are generated so that the inter-cluster correlation is 0 and the intra-cluster correlation equals ICC in expectation. The data generating process used in this function is described at the following URL: https://stats.stackexchange.com/questions/263451/create-synthetic-data-with-a-given-intraclass-correlation-coefficient-icc
draw_normal_icc(
mean = 0,
N = NULL,
clusters,
sd = NULL,
sd_between = NULL,
total_sd = NULL,
ICC = NULL
)
mean |
A number or vector of numbers, one mean per cluster. Defaults to 0 if not provided. |
N |
(Optional) A number indicating the number of observations to be generated. Must be equal to length(clusters) if provided. |
clusters |
A vector of factors or items that can be coerced to clusters; the length will determine the length of the generated data. |
sd |
A number or vector of numbers giving the within-cluster standard deviation, i.e. the standard deviation of each cluster's error terms (default 1). |
sd_between |
A number or vector of numbers, indicating the standard deviation between clusters. |
total_sd |
A number indicating the total sd of the resulting variable.
May only be specified if ICC is specified and |
ICC |
A number indicating the desired ICC. |
The typical use for this function is for a user to provide an ICC and, optionally, a set of within-cluster standard deviations, sd. If the user does not provide sd, the default value is 1. These arguments imply a fixed between-cluster standard deviation.
An alternate mode for the function is to provide between-cluster standard deviations, sd_between, and an ICC. These arguments imply a fixed within-cluster standard deviation.
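To see how this mode fixes the within-cluster standard deviation, the sketch below inverts the variance-ratio definition of the ICC, assuming ICC = sd_between^2 / (sd_between^2 + sd^2). The name implied_sd_within is hypothetical and only illustrates the arithmetic; it is not taken from the package source.

```r
# Hypothetical helper (not a package function): given sd_between and ICC,
# solve ICC = sd_between^2 / (sd_between^2 + sd^2) for the within-cluster sd.
implied_sd_within <- function(sd_between, ICC) {
  stopifnot(ICC > 0, ICC < 1, all(sd_between > 0))
  sd_between * sqrt((1 - ICC) / ICC)
}

sd_within <- implied_sd_within(sd_between = 4, ICC = 0.2)

# Sanity check: recombining the two components should give back the ICC.
4^2 / (4^2 + sd_within^2)
```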
If users provide all three of ICC, sd_between, and sd, the function will warn the user and use the provided standard deviations for generating the data.
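For intuition about how the two standard deviations enter the generated data, here is a minimal base R sketch of a data generating process consistent with the description: one normal draw per cluster plus independent normal noise per observation. It is an assumption about the general approach, not the package's implementation, and sketch_normal_icc is a hypothetical name.

```r
# Hypothetical sketch (not the package implementation): cluster-level effects
# plus individual-level noise, both normal and independent of each other.
sketch_normal_icc <- function(clusters, mean = 0, sd = 1, ICC = 0.5) {
  stopifnot(ICC > 0, ICC < 1)
  cluster_ids <- as.factor(clusters)
  n_clusters <- nlevels(cluster_ids)

  # Between-cluster sd implied by the ICC and the within-cluster sd.
  sd_between <- sd * sqrt(ICC / (1 - ICC))

  # One shared shift per cluster, one independent error per observation.
  cluster_effect <- rnorm(n_clusters, mean = 0, sd = sd_between)
  individual_error <- rnorm(length(cluster_ids), mean = 0, sd = sd)

  mean + cluster_effect[as.integer(cluster_ids)] + individual_error
}

set.seed(1)
y <- sketch_normal_icc(clusters = rep(1:5, 10), ICC = 0.5)
length(y)
```

Because the cluster effects are drawn independently, observations in different clusters are uncorrelated, while observations in the same cluster share one draw.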
A vector of numbers corresponding to the observations from the supplied cluster IDs.
# Divide observations into clusters
clusters = rep(1:5, 10)
# Default: unit variance within each cluster
draw_normal_icc(clusters = clusters, ICC = 0.5)
# Alternatively, you can specify characteristics:
draw_normal_icc(mean = 10, clusters = clusters, sd = 3, ICC = 0.3)
# Can specify between-cluster standard deviation instead:
draw_normal_icc(clusters = clusters, sd_between = 4, ICC = 0.2)
# Can specify total SD instead:
total_sd_draw = draw_normal_icc(clusters = clusters, ICC = 0.5, total_sd = 3)
sd(total_sd_draw)
# Verify that ICC generated is accurate
corr_draw = draw_normal_icc(clusters = clusters, ICC = 0.4)
summary(lm(corr_draw ~ as.factor(clusters)))$r.squared
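The total_sd example can be read through the variance decomposition total_sd^2 = sd^2 + sd_between^2, which holds when cluster effects and individual errors are independent. The sketch below shows the split this implies for a given ICC. It assumes that decomposition only; whether the function computes the components this way or rescales its output is not stated on this page, and split_total_sd is a hypothetical name.

```r
# Hypothetical helper (not a package function): split a total SD into
# within- and between-cluster components for a target ICC.
split_total_sd <- function(total_sd, ICC) {
  stopifnot(ICC > 0, ICC < 1, total_sd > 0)
  list(
    sd = total_sd * sqrt(1 - ICC),      # within-cluster component
    sd_between = total_sd * sqrt(ICC)   # between-cluster component
  )
}

parts <- split_total_sd(total_sd = 3, ICC = 0.5)

# Recombining the components gives back the requested total SD and ICC.
sqrt(parts$sd^2 + parts$sd_between^2)
parts$sd_between^2 / (parts$sd^2 + parts$sd_between^2)
```

With only 5 clusters of 10 observations, sample quantities such as sd(total_sd_draw) or the R-squared check will vary noticeably around these targets from draw to draw.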