Simulate Correlated Variables
In faux: Simulation for Factorial Designs

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
ggplot2::theme_set(ggplot2::theme_bw())
set.seed(8675309)

library(ggplot2)
library(dplyr)
library(tidyr)
library(faux)

The rnorm_multi() function makes multiple normally distributed vectors with specified parameters and relationships.

Quick example

For example, the following creates a sample that has 100 observations of 3 variables, drawn from a population where A has a mean of 0 and SD of 1, while B and C have means of 20 and SDs of 5. A correlates with B and C with r = 0.5, and B and C correlate with r = 0.25.

dat <- rnorm_multi(n = 100, 
                  mu = c(0, 20, 20),
                  sd = c(1, 5, 5),
                  r = c(0.5, 0.5, 0.25), 
                  varnames = c("A", "B", "C"),
                  empirical = FALSE)

r get_params(dat) %>% knitr::kable() Table: Sample stats

Specify correlations {#spec_r}

You can specify the correlations in one of four ways:

A single r for all pairs
A vars by vars matrix
A vars*vars length vector
A vars*(vars-1)/2 length vector

One Number

If you want all the pairs to have the same correlation, just specify a single number.

bvn <- rnorm_multi(100, 5, 0, 1, .3, varnames = letters[1:5])

r get_params(bvn) %>% knitr::kable() Table: Sample stats from a single rho

Matrix

If you already have a correlation matrix, such as the output of cor(), you can specify the simulated data with that.

cmat <- cor(iris[,1:4])
bvn <- rnorm_multi(100, 4, 0, 1, cmat, 
                  varnames = colnames(cmat))

r get_params(bvn) %>% knitr::kable() Table: Sample stats from a correlation matrix

Vector (vars*vars)

You can specify your correlation matrix by hand as a vars*vars length vector, which will include the correlations of 1 down the diagonal.

cmat <- c(1, .3, .5,
          .3, 1, 0,
          .5, 0, 1)
bvn <- rnorm_multi(100, 3, 0, 1, cmat, 
                  varnames = c("first", "second", "third"))

r get_params(bvn) %>% knitr::kable() Table: Sample stats from a vars*vars vector

Vector (vars*(vars-1)/2)

You can specify your correlation matrix by hand as a vars*(vars-1)/2 length vector, skipping the diagonal and lower left duplicate values.

rho1_2 <- .3
rho1_3 <- .5
rho1_4 <- .5
rho2_3 <- .2
rho2_4 <- 0
rho3_4 <- -.3
cmat <- c(rho1_2, rho1_3, rho1_4, rho2_3, rho2_4, rho3_4)
bvn <- rnorm_multi(100, 4, 0, 1, cmat, 
                  varnames = letters[1:4])

r get_params(bvn) %>% knitr::kable() Table: Sample stats from a (vars*(vars-1)/2) vector

empirical

If you want your samples to have the exact correlations, means, and SDs you entered, set empirical to TRUE.

bvn <- rnorm_multi(100, 5, 0, 1, .3, 
                  varnames = letters[1:5], 
                  empirical = T)

r get_params(bvn) %>% knitr::kable() Table: Sample stats with empirical = TRUE

Pre-existing variables

Us rnorm_pre() to create a vector with a specified correlation to one or more pre-existing variables. The following code creates a new column called B with a mean of 10, SD of 2 and a correlation of r = 0.5 to the A column.

dat <- rnorm_multi(varnames = "A") %>%
  mutate(B = rnorm_pre(A, mu = 10, sd = 2, r = 0.5))

get_params(dat) %>% knitr::kable(digits = 3)

Set empirical = TRUE to return a vector with the exact specified parameters.

dat$C <- rnorm_pre(dat$A, mu = 10, sd = 2, r = 0.5, empirical = TRUE)

get_params(dat) %>% knitr::kable(digits = 3)

You can also specify correlations to more than one vector by setting the first argument to a data frame containing only the continuous columns and r to the correlation with each column.

dat$D <- rnorm_pre(dat, r = c(.1, .2, .3), empirical = TRUE)

get_params(dat) %>% knitr::kable(digits = 3)

Not all correlation patterns are possible, so you'll get an error message if the correlations you ask for are impossible.

dat$E <- rnorm_pre(dat, r = .9)

Any scripts or data that you put into this service are public.

faux documentation built on April 20, 2023, 9:13 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

faux
Simulation for Factorial Designs

Simulate Correlated Variables
In faux: Simulation for Factorial Designs

Quick example

Specify correlations {#spec_r}

One Number

Matrix

Vector (vars*vars)

Vector (vars*(vars-1)/2)

empirical

Pre-existing variables

Try the faux package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

faux Simulation for Factorial Designs

Simulate Correlated Variables In faux: Simulation for Factorial Designs

Quick example

Specify correlations {#spec_r}

One Number

Matrix

Vector (vars*vars)

Vector (vars*(vars-1)/2)

empirical

Pre-existing variables

Try the faux package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

faux
Simulation for Factorial Designs

Simulate Correlated Variables
In faux: Simulation for Factorial Designs