knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
ggplot2::theme_set(ggplot2::theme_bw())
set.seed(8675309)
library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)
The rnorm_multi() function makes multiple normally distributed vectors with specified parameters and relationships.
For example, the following creates a sample that has 100 observations of 3 variables, drawn from a population where A has a mean of 0 and SD of 1, while B and C have means of 20 and SDs of 5. A correlates with B and C with r = 0.5, and B and C correlate with r = 0.25.
dat <- rnorm_multi(n = 100, mu = c(0, 20, 20), sd = c(1, 5, 5), r = c(0.5, 0.5, 0.25), varnames = c("A", "B", "C"), empirical = FALSE)
get_params(dat) %>% knitr::kable()
Table: Sample stats
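To make the roles of mu, sd, and r concrete, here is a minimal base-R sketch of the same population. It assumes the standard construction of a covariance matrix from SDs and correlations and imposes it with a Cholesky factor. This is an illustration only, not necessarily how rnorm_multi() works internally, and the names sds, cmat, sigma, z, and sim are made up for this example.

```r
set.seed(8675309)

n   <- 100
mu  <- c(A = 0, B = 20, C = 20)
sds <- c(1, 5, 5)

# Correlation matrix: r(A,B) = 0.5, r(A,C) = 0.5, r(B,C) = 0.25
cmat <- matrix(c(1,   0.5,  0.5,
                 0.5, 1,    0.25,
                 0.5, 0.25, 1), nrow = 3)

# Covariance matrix: sigma[i, j] = cmat[i, j] * sds[i] * sds[j]
sigma <- cmat * outer(sds, sds)

# Draw independent standard normals, then impose the covariance structure
z   <- matrix(rnorm(n * 3), nrow = n)
sim <- sweep(z %*% chol(sigma), 2, mu, "+")
colnames(sim) <- names(mu)

round(colMeans(sim), 2)  # sample means, close to (but not exactly) mu
round(cor(sim), 2)       # sample correlations, close to (but not exactly) cmat
```

Because the values are sampled from the population, the sample statistics vary around the requested parameters, just as they do with empirical = FALSE.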
You can specify the correlations in one of four ways:
If you want all the pairs to have the same correlation, just specify a single number.
bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5])
get_params(bvn) %>% knitr::kable()
Table: Sample stats from a single rho
If you already have a correlation matrix, such as the output of cor(), you can use it to specify the simulated data.
cmat <- cor(iris[,1:4])
bvn <- rnorm_multi(100, 4, 0, 1, cmat, varnames = colnames(cmat))
get_params(bvn) %>% knitr::kable()
Table: Sample stats from a correlation matrix
You can specify your correlation matrix by hand as a vars*vars length vector, which will include the correlations of 1 down the diagonal.
cmat <- c(1, .3, .5,
          .3, 1, 0,
          .5, 0, 1)
bvn <- rnorm_multi(100, 3, 0, 1, cmat, varnames = c("first", "second", "third"))
get_params(bvn) %>% knitr::kable()
Table: Sample stats from a vars*vars vector
You can specify your correlation matrix by hand as a vars*(vars-1)/2 length vector, skipping the diagonal and lower left duplicate values.
rho1_2 <- .3
rho1_3 <- .5
rho1_4 <- .5
rho2_3 <- .2
rho2_4 <- 0
rho3_4 <- -.3
cmat <- c(rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4)
bvn <- rnorm_multi(100, 4, 0, 1, cmat, varnames = letters[1:4])
get_params(bvn) %>% knitr::kable()
Table: Sample stats from a (vars*(vars-1)/2) vector
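The ordering of the vars*(vars-1)/2 vector is easier to see when it is expanded into the full matrix it stands for. The base-R sketch below assumes the ordering implied by the variable names above (rho1_2, rho1_3, rho1_4, rho2_3, ...), that is, the upper right triangle read row by row. The names tri and cmat are illustrative.

```r
vars <- 4
tri  <- c(.3, .5, .5, .2, 0, -.3)  # rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4
stopifnot(length(tri) == vars * (vars - 1) / 2)

cmat <- matrix(0, vars, vars)
# lower.tri() fills column by column: (2,1), (3,1), (4,1), (3,2), (4,2), (4,3),
# which is the same order as the upper right triangle read row by row
cmat[lower.tri(cmat)] <- tri
cmat <- cmat + t(cmat)  # mirror the values into the other triangle
diag(cmat) <- 1         # each variable correlates 1 with itself
dimnames(cmat) <- list(letters[1:vars], letters[1:vars])
cmat
```

Flattening this matrix with as.vector(cmat) gives the equivalent vars*vars form described earlier.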
If you want your samples to have the exact correlations, means, and SDs you entered, set empirical to TRUE.
bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5], empirical = TRUE)
get_params(bvn) %>% knitr::kable()
Table: Sample stats with empirical = TRUE
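One way to see what "exact" means here is to force a sample to match its target moments by hand. The base-R sketch below centers and whitens a random sample so that its sample covariance is the identity, then rescales it to the requested covariance. It is a sketch of the general idea under that assumption, not a description of how faux implements empirical = TRUE, and all names are illustrative.

```r
set.seed(1)

n     <- 100
mu    <- c(0, 20)
sds   <- c(1, 5)
cmat  <- matrix(c(1, .3, .3, 1), nrow = 2)
sigma <- cmat * outer(sds, sds)

# Center the raw sample so the column means are exactly zero
z <- scale(matrix(rnorm(n * 2), nrow = n), center = TRUE, scale = FALSE)
# Whiten: after this step the sample covariance is the identity matrix
z <- z %*% solve(chol(cov(z)))
# Impose the target covariance and shift to the target means
x <- sweep(z %*% chol(sigma), 2, mu, "+")

colMeans(x)       # matches mu up to floating-point error
apply(x, 2, sd)   # matches sds up to floating-point error
cor(x)            # matches cmat up to floating-point error
```

The sample no longer varies around the parameters, which is useful for worked examples but means it is not a plain random draw from the population.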
Use rnorm_pre() to create a vector with a specified correlation to one or more pre-existing variables. The following code creates a new column called B with a mean of 10, an SD of 2, and a correlation of r = 0.5 to the A column.
dat <- rnorm_multi(varnames = "A") %>% mutate(B = rnorm_pre(A, mu = 10, sd = 2, r = 0.5))
get_params(dat) %>% knitr::kable(digits = 3)
Set empirical = TRUE to return a vector with the exact specified parameters.
dat$C <- rnorm_pre(dat$A, mu = 10, sd = 2, r = 0.5, empirical = TRUE)
get_params(dat) %>% knitr::kable(digits = 3)
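The idea of building a new vector with a fixed correlation to an existing one can be sketched in base R. The example below assumes a common construction: mix the standardized existing vector with noise that has been made orthogonal to it. It is not necessarily the algorithm rnorm_pre() uses, and the names x, noise, e, and y are illustrative.

```r
set.seed(1)

x <- rnorm(50)        # the pre-existing variable
r <- 0.5              # target correlation between x and the new vector
noise <- rnorm(50)

# Residuals of noise regressed on x are uncorrelated with x and have mean zero
e <- resid(lm(noise ~ x))

zx <- (x - mean(x)) / sd(x)  # standardized x
ze <- e / sd(e)              # standardized orthogonal noise

# Unit-variance mixture whose correlation with x is exactly r
z <- r * zx + sqrt(1 - r^2) * ze

# Rescale to the target mean (10) and SD (2)
y <- 10 + 2 * z

c(mean = mean(y), sd = sd(y), r = cor(x, y))
```

Because the noise is made exactly orthogonal to x, the result matches the targets exactly, which corresponds to the behaviour described for empirical = TRUE rather than a random draw.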
You can also specify correlations to more than one vector by setting the first argument to a data frame containing only the continuous columns and r to the correlation with each column.
dat$D <- rnorm_pre(dat, r = c(.1, .2, .3), empirical = TRUE)
get_params(dat) %>% knitr::kable(digits = 3)
Not all correlation patterns are possible, so you'll get an error message if the correlations you ask for are impossible.
dat$E <- rnorm_pre(dat, r = .9)
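A set of correlations is only possible if the full correlation matrix is positive definite, that is, if all of its eigenvalues are positive. The base-R sketch below checks this directly. The helper looks_pos_def() is a hypothetical function written for this example, not part of faux, and the tolerance is an arbitrary choice.

```r
# Hypothetical helper: TRUE if every eigenvalue of a symmetric matrix is positive
looks_pos_def <- function(cmat, tol = 1e-8) {
  ev <- eigen(cmat, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# A possible pattern (the A, B, C correlations from the first example)
ok <- matrix(c(1,  .5,  .5,
               .5, 1,   .25,
               .5, .25, 1), nrow = 3)

# An impossible pattern: two variables that both correlate .9 with a third
# cannot correlate -.9 with each other
bad <- matrix(c(1,   .9,  .9,
                .9,  1,  -.9,
                .9, -.9,  1), nrow = 3)

looks_pos_def(ok)
looks_pos_def(bad)
```

The first call returns TRUE and the second returns FALSE, since the second matrix has a negative determinant and therefore a negative eigenvalue.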